Open Hzfinfdu opened 3 weeks ago
The main bottleneck of SAE training lies in activation generation, which gets painful when we try to work with larger models.
We should try to accelerate TransformerLens (TL) inference, especially the attention forward pass. What are some possible options? FlashAttention-2, vLLM, or something similar?
Since we usually do not cache Q, K, and V, the attention forward pass can be replaced with a faster fused alternative.
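As a minimal sketch of the idea (assuming PyTorch >= 2.0; this is not TL's actual internals, just an illustration): the eager softmax-attention that TL computes can be swapped for `torch.nn.functional.scaled_dot_product_attention`, which dispatches to fused backends such as FlashAttention-2 on supported GPUs, and the two agree numerically.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, S, D = 2, 4, 16, 8  # batch, heads, seq_len, head_dim
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# Eager attention, roughly what TL computes step by step
# (and why it is slow: the full S x S score matrix is materialized):
scores = q @ k.transpose(-2, -1) / math.sqrt(D)
naive = scores.softmax(dim=-1) @ v

# Fused kernel: no Q/K/V cache is needed for a single forward pass,
# so this is a drop-in replacement for generating activations.
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(naive, fused, atol=1e-5))
```

The catch for SAE training is that fused kernels never materialize the attention pattern, so any hook that reads `pattern` or `attn_scores` would stop working; hooks on the attention *output* are unaffected.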