So, after checking that your monkey patch does not use the `attention_mask` parameter in `forward`, I learned that there is a class named `LowerTriangularMaskWithTensorBias` that we could pass the `attention_mask` into.
I also enabled `use_cache` because the script still uses `past_key_value` and even concatenates it to the current key and value tensors.
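
Roughly, here is a minimal sketch of what I mean. The function name `attention_with_cache` and the tensor shapes are my own assumptions for illustration, not the actual patch; the idea is just to wrap the additive `attention_mask` in `LowerTriangularMaskWithTensorBias` before calling `memory_efficient_attention`, and to keep the `past_key_value` concatenation when `use_cache` is on:

```python
import torch
import xformers.ops as xops
from xformers.ops.fmha.attn_bias import LowerTriangularMaskWithTensorBias


def attention_with_cache(query, key, value, attention_mask=None,
                         past_key_value=None, use_cache=True):
    # query/key/value: (batch, seq_len, num_heads, head_dim), HF LLaMA-style layout (assumption)
    if past_key_value is not None:
        # Concatenate the cached keys/values to the current ones along the sequence dim.
        past_key, past_value = past_key_value
        key = torch.cat([past_key, key], dim=1)
        value = torch.cat([past_value, value], dim=1)
    present = (key, value) if use_cache else None

    if attention_mask is not None:
        # attention_mask is assumed to already be an additive bias of shape
        # (batch, 1, q_len, kv_len); broadcast it over heads and wrap it so the
        # causal mask is still applied on top of the bias.
        num_heads = query.shape[2]
        bias = attention_mask.expand(-1, num_heads, -1, -1)
        attn_bias = LowerTriangularMaskWithTensorBias(bias)
    else:
        attn_bias = xops.LowerTriangularMask()

    out = xops.memory_efficient_attention(query, key, value, attn_bias=attn_bias)
    return out, present
```

Depending on the xFormers version, the bias tensor may also need to be made contiguous or padded to satisfy the kernel's stride requirements, so treat the above only as a starting point.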