janelu9 closed this 10 months ago
It flattens the batch_size dim into the seq_len dim.
Take a look at flash_attn_varlen_func. This is what xformers calls I think.
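A rough sketch of what that flattening looks like, in case it helps. This is plain Python on token lists, not the actual flash-attn or xformers code; `pack_batch` and the toy data are made up for illustration. As far as I know, flash_attn_varlen_func expects q/k/v packed as (total_tokens, nheads, headdim) tensors plus int32 cumulative sequence lengths (cu_seqlens_q, cu_seqlens_k) and the max sequence lengths, so the bookkeeping is roughly this:

```python
from itertools import accumulate


def pack_batch(padded, lengths):
    """Flatten a padded batch into one packed sequence (hypothetical helper).

    padded:  list of rows, each padded to the same length
    lengths: true (unpadded) length of each row
    """
    # Drop the padding and concatenate all rows along the sequence dim.
    packed = [tok for row, n in zip(padded, lengths) for tok in row[:n]]
    # cu_seqlens[i] is where sequence i starts in the packed list;
    # the last entry is the total number of tokens.
    cu_seqlens = [0] + list(accumulate(lengths))
    return packed, cu_seqlens, max(lengths)


padded = [[1, 2, 3, 0], [4, 5, 0, 0], [6, 7, 8, 9]]  # 0 is padding
lengths = [3, 2, 4]
packed, cu_seqlens, max_seqlen = pack_batch(padded, lengths)
```

Sequence i then lives at packed[cu_seqlens[i]:cu_seqlens[i + 1]], which is how the kernel can keep attention from crossing sequence boundaries without a padded batch dim.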
thanks, I think so too.