bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

assert Flash Attention doesn't get arbitrary mask #53

Closed mayank31398 closed 1 year ago

mayank31398 commented 1 year ago

Since FlashAttention only works with no mask or a causal mask, it's better to throw an error here.
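
A minimal sketch of the kind of guard being discussed, assuming an `AttnMaskType` enum along the lines of the one Megatron-LM defines; the helper name and its exact placement are illustrative, not the actual change from this issue:

```python
import enum


class AttnMaskType(enum.Enum):
    # Mirrors the mask-type enum Megatron-LM uses (assumed here for a
    # self-contained example).
    padding = 1
    causal = 2


def check_flash_attn_mask(attn_mask_type, use_flash_attn):
    """Fail fast if FlashAttention would receive an unsupported mask.

    FlashAttention only supports no mask or a causal mask, so any other
    (arbitrary/padding) mask should raise instead of silently producing
    wrong attention outputs.
    """
    if use_flash_attn and attn_mask_type != AttnMaskType.causal:
        raise AssertionError(
            f"FlashAttention only supports a causal (or no) attention mask, "
            f"got {attn_mask_type}"
        )


# Example: this would raise, because a padding mask is not supported.
# check_flash_attn_mask(AttnMaskType.padding, use_flash_attn=True)
```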

janEbert commented 1 year ago

You also mentioned --reset-position-ids being a problem; does that also need to be handled?

mayank31398 commented 1 year ago

I don't think that needs to be handled. It should work with any position IDs.