cognitivecomputations / laserRMT

This is our own implementation of 'Layer Selective Rank Reduction'.
Apache License 2.0

use_flash_attn=False in calculate_model_perplexity #6

Closed — l4b4r4b4b4 closed this issue 8 months ago

l4b4r4b4b4 commented 8 months ago

Are the dummy flags for flash_attn and CUDA graphs legacy, or could they still be useful? ;)

fernando-neto-ai commented 8 months ago

Feel free to clean them up; they were just legacy. I'd be glad to count on you for that.
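
For reference, here is a minimal sketch of what `calculate_model_perplexity` might look like once the dummy `use_flash_attn` and CUDA-graph flags are dropped. This is an assumption, not the actual laserRMT code: it assumes a Hugging Face `transformers` causal LM and uses the standard sliding-window perplexity recipe, so the real signature and internals in this repo may differ.

```python
# Hypothetical cleaned-up sketch; the real laserRMT function may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def calculate_model_perplexity(model_name: str, text: str,
                               stride: int = 512,
                               max_length: int = 1024) -> float:
    """Compute sliding-window perplexity of `model_name` over `text`."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    encodings = tokenizer(text, return_tensors="pt")
    seq_len = encodings.input_ids.size(1)

    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        target_len = end - prev_end  # tokens newly scored in this window
        input_ids = encodings.input_ids[:, begin:end]
        target_ids = input_ids.clone()
        target_ids[:, :-target_len] = -100  # mask the overlap from the loss

        with torch.no_grad():
            loss = model(input_ids, labels=target_ids).loss  # mean NLL
        nlls.append(loss * target_len)  # back to a summed NLL

        prev_end = end
        if end == seq_len:
            break

    # Perplexity = exp(total NLL / number of scored tokens)
    return torch.exp(torch.stack(nlls).sum() / prev_end).item()
```

With the legacy flags gone, callers would simply invoke `calculate_model_perplexity("some-model", text)` with no attention-backend or CUDA-graph arguments; backend selection is left to `transformers` itself.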