Dear Author,
Thank you for open-sourcing such a great piece of work. Could you elaborate on how much speed and memory efficiency FlashAttention brings to PTv3? Additionally, you mentioned that "FlashAttention force disables RPE and forces the accuracy reduced to fp16". Does reducing the attention precision from fp32 to fp16 have a significant negative impact on accuracy?
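For context on the precision part of my question, here is a minimal NumPy sketch (my own toy check, not PTv3 code) comparing plain softmax attention computed in fp32 versus fp16 on random inputs; the observed deviation is small on synthetic data, but I am unsure whether this holds at the scale and depth of PTv3 training:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
shape = (1, 128, 64)  # (batch, tokens, head dim) — arbitrary toy sizes
q = rng.standard_normal(shape).astype(np.float32)
k = rng.standard_normal(shape).astype(np.float32)
v = rng.standard_normal(shape).astype(np.float32)

out_fp32 = attention(q, k, v)
# Same computation with inputs cast down to fp16, result cast back for comparison.
out_fp16 = attention(q.astype(np.float16),
                     k.astype(np.float16),
                     v.astype(np.float16)).astype(np.float32)

err = np.abs(out_fp32 - out_fp16).max()
print(f"max abs deviation fp16 vs fp32: {err:.2e}")
```

Of course this toy check says nothing about accumulated error over many layers or about gradients during training, which is why I am asking about your empirical experience.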
Thank you!