66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Inquiry About CUTLASS Version in "standalone_src" #5

Closed HuangliangDai closed 5 months ago

HuangliangDai commented 5 months ago

Hi 66RING,

I am currently working with the standalone scripts and using the latest version of CUTLASS. However, I am encountering an issue related to "MNK permutation" during compilation. I suspect this might be caused by the version of CUTLASS I am using.

Could you please let me know which version of CUTLASS you used for the standalone scripts? Your guidance would be highly appreciated.

66RING commented 5 months ago

@HuangliangDai CUTLASS v3.3 should work. For reference, torch is 1.14 and CUDA is 12.4.
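Since the standalone scripts are sensitive to the CUTLASS release, one way to reproduce a working setup is to pin the checkout to a known-good tag. This is only a sketch: the clone location and the `CUTLASS_DIR` variable name are assumptions for illustration, not something defined by this repo.

```shell
# Pin CUTLASS to the v3.3.0 tag; later releases changed parts of the
# CuTe/GEMM API surface that the standalone scripts compile against.
git clone --depth 1 --branch v3.3.0 https://github.com/NVIDIA/cutlass.git

# Point the standalone build at the pinned headers
# (CUTLASS_DIR is a hypothetical name; use whatever the build expects).
export CUTLASS_DIR="$PWD/cutlass/include"
```

CUTLASS is header-only, so pinning the tag and pointing the include path at it is usually all a standalone `nvcc` build needs; no separate library build step is required.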

HuangliangDai commented 5 months ago

Really appreciate it. CUTLASS 3.5 works well with the Python binding version but is incompatible with the standalone scripts. I'll try CUTLASS 3.3 instead. Thanks a million!

66RING commented 5 months ago

> Really appreciate it. CUTLASS 3.5 works well with the Python binding version but is incompatible with the standalone scripts. I'll try CUTLASS 3.3 instead. Thanks a million!

@HuangliangDai I just updated the CUTLASS API, so the standalone scripts should now work on CUTLASS v3.4.0 as well.

HuangliangDai commented 5 months ago

@66RING Everything works well now. Very impressive work, thank you.