Closed vince62s closed 1 month ago
I think it should be `self_attn_backend`, and it should not be a model setting but a training setting on top of an inference setting.
Values should be: `"flash2"`, `"pytorch"`. The `"pytorch"` value would cover the sdpa_kernels used here: https://github.com/eole-nlp/eole/blob/main/eole/modules/multi_headed_attn.py#L637
We could test whether flash2 is installed at training/inference start and adjust the backend if necessary.
That way we could remove the `flash2` setting from the MHA (redundant with `self_attn_backend`).
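The startup check could be a small helper along these lines; this is just a sketch (the function name `resolve_self_attn_backend` and the warning message are hypothetical, not existing eole API), assuming flash2 support comes from the `flash_attn` package:

```python
import importlib.util
import warnings


def resolve_self_attn_backend(requested: str) -> str:
    """Return the usable self-attention backend.

    Falls back from "flash2" to "pytorch" (SDPA kernels) when the
    flash_attn package is not importable in the current environment.
    """
    if requested not in ("flash2", "pytorch"):
        raise ValueError(f"unknown self_attn_backend: {requested!r}")
    if requested == "flash2" and importlib.util.find_spec("flash_attn") is None:
        # flash-attn is not installed; silently degrading would hide a
        # perf difference, so emit a warning and switch backends.
        warnings.warn("flash-attn not installed, falling back to 'pytorch' backend")
        return "pytorch"
    return requested
```

Called once at training/inference start, the resolved value would then be passed down to MHA instead of a separate per-module `flash2` flag.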