Open AlpinDale opened 2 months ago
There are no limitations on the sigmoid attention side, but we don't currently have any plans to implement these two functions, unfortunately.
Thanks for the response. I'll try and study the codebase. Would you accept PRs to support those features?
We don't want to delay your experiments, so please go ahead and fork the project.
Hello, and thanks for the great work. Is there any reason why `flash_attn_varlen_func` and `flash_attn_with_kvcache` are not supported? If it's not an inherent limitation of sigmoid attention, are there any plans to add them? I'd love to help out if so; I can't use this in my project without those two functions :)
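For context, here is a minimal sketch of how these two functions are called in the upstream Dao-AILab `flash-attn` package, i.e. the interface I'm hoping a sigmoid-attention port could eventually support. The tensor shapes, the `causal=True` setting, and the sequence lengths below are illustrative assumptions, not code from this repository.

```python
# Minimal usage sketch of the two requested flash-attn entry points (assumed
# shapes and settings for illustration; requires a CUDA GPU and flash-attn).
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache

device, dtype = "cuda", torch.float16
nheads, headdim = 8, 64

# --- flash_attn_varlen_func: packed variable-length sequences ---
# Two sequences of lengths 3 and 5 packed into one (total_tokens, nheads, headdim) tensor,
# described by cumulative sequence lengths [0, 3, 8].
seqlens = torch.tensor([3, 5], dtype=torch.int32, device=device)
cu_seqlens = F.pad(torch.cumsum(seqlens, 0, dtype=torch.int32), (1, 0))
total = int(seqlens.sum())
q = torch.randn(total, nheads, headdim, device=device, dtype=dtype)
k = torch.randn(total, nheads, headdim, device=device, dtype=dtype)
v = torch.randn(total, nheads, headdim, device=device, dtype=dtype)
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,
)

# --- flash_attn_with_kvcache: single-token decode against a preallocated KV cache ---
batch, cache_len = 2, 128
k_cache = torch.zeros(batch, cache_len, nheads, headdim, device=device, dtype=dtype)
v_cache = torch.zeros(batch, cache_len, nheads, headdim, device=device, dtype=dtype)
q_new = torch.randn(batch, 1, nheads, headdim, device=device, dtype=dtype)
k_new = torch.randn(batch, 1, nheads, headdim, device=device, dtype=dtype)
v_new = torch.randn(batch, 1, nheads, headdim, device=device, dtype=dtype)
# Number of tokens already stored in the cache for each batch element.
cache_seqlens = torch.tensor([10, 42], dtype=torch.int32, device=device)
out_decode = flash_attn_with_kvcache(
    q_new, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, causal=True,
)
```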