cognitivecomputations / laserRMT

This is our own implementation of 'Layer Selective Rank Reduction'
Apache License 2.0

use_flash_attn=False in calculate_model_perplexity #6

Closed: l4b4r4b4b4 closed this issue 9 months ago

l4b4r4b4b4 commented 9 months ago

Are the dummy flags for `flash_attn` and the CUDA graph legacy, or could they still be useful? ;)
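
For context, a "dummy" flag here means a parameter that the function accepts but never reads. A minimal sketch of that pattern in a perplexity helper, assuming a Hugging Face causal LM; only the name `calculate_model_perplexity` and the `use_flash_attn` flag come from this thread, everything else (including `use_cuda_graph`) is illustrative:

```python
import torch

def calculate_model_perplexity(model, tokenizer, text,
                               use_flash_attn=False,   # legacy: never read below
                               use_cuda_graph=False):  # legacy: never read below
    """Compute the perplexity of `text`; the two trailing flags are dead parameters."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # For a causal LM, passing labels yields the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
    # Perplexity is exp of the mean cross-entropy.
    return torch.exp(out.loss).item()
```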

fernando-neto-ai commented 9 months ago

Feel free to clean them up. They're just legacy. I'd be glad to count on you for that.
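
A hypothetical cleanup along those lines would simply drop the dead parameters from the signature; the function name is from the thread, the rest is a sketch:

```python
# Hypothetical cleaned-up signature with the legacy flags removed.
# Any call site still passing use_flash_attn=... or use_cuda_graph=...
# would then raise a TypeError and need updating as well.
def calculate_model_perplexity(model, tokenizer, text):
    ...
```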