complexfilter opened this issue
Please try compiling with CUDA 12.3
I believe my CUDA version is 12.4 (nvidia-smi output):
Fri Nov 15 16:50:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:86:00.0 Off | 0 |
| N/A 26C P0 80W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Not sure if CUDA 12.4 was the issue.
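What matters is the version of nvcc, not the CUDA driver. The "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, not the toolkit you compile with. You can install the CUDA toolkit (which includes nvcc) separately from the driver, at any version the driver supports.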
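A small sketch for checking which toolkit is actually used for compilation, as opposed to the driver version reported by nvidia-smi. It assumes only that `nvcc` is on `PATH` and that `nvcc --version` prints a line containing `release <major>.<minor>`; the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the toolkit release (e.g. "12.3") from `nvcc --version` output."""
    # nvcc prints a line of the form:
    #   "Cuda compilation tools, release <major>.<minor>, V<full version>"
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Return the release of the nvcc found on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release is None:
        print("nvcc not found on PATH; the CUDA toolkit may not be installed")
    else:
        # This is the version that matters when building flash-attn from source,
        # not the "CUDA Version" shown in the nvidia-smi header.
        print(f"nvcc (CUDA toolkit) release: {release}")
```

If this prints 12.4 while the suggestion is to build with 12.3, a 12.3 toolkit can be installed alongside and selected by putting its `bin` directory first on `PATH`.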
I found that v2.6.3's `flash_attn_varlen_func` runs faster than v2.7.0.post2's `flash_attn_varlen_func` on H100 (see the linked code).
Result from using v2.6.3 on H100:
Result from using v2.7.0.post2 on H100:
The runtime is 150 ms with v2.6.3 vs. 221 ms with v2.7.0.post2, i.e. v2.7.0.post2 is about 47% slower.
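A minimal timing-harness sketch for reproducing a comparison like this one. It is not the reporter's benchmark: the helper names and the workload sizes (8 sequences of 2048 tokens, 16 heads, head dim 128, bf16, causal) are hypothetical. The GPU part assumes PyTorch and flash-attn are installed and calls `flash_attn_varlen_func(q, k, v, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, causal=...)`. Warmup iterations are run first so one-time setup costs are excluded, and the device is synchronized before the clock stops because CUDA kernels launch asynchronously.

```python
import statistics
import time
from typing import Callable, List, Optional


def benchmark_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100,
                 sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # exclude one-time setup (allocation, caching) from the measurement
    if sync is not None:
        sync()
    times: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)  # median is less sensitive to outliers than mean


def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Fractional slowdown of candidate relative to baseline (0.5 means 50% slower)."""
    return (candidate_ms - baseline_ms) / baseline_ms


if __name__ == "__main__":
    try:
        import torch
        from flash_attn import flash_attn_varlen_func
    except ImportError:
        print("torch/flash_attn not installed; skipping GPU benchmark")
    else:
        if torch.cuda.is_available():
            # Hypothetical workload, not the one from the issue.
            batch, seqlen, nheads, headdim = 8, 2048, 16, 128
            total = batch * seqlen
            q, k, v = (torch.randn(total, nheads, headdim, device="cuda", dtype=torch.bfloat16)
                       for _ in range(3))
            # Cumulative sequence lengths: [0, 2048, 4096, ..., total], int32 as required.
            cu_seqlens = torch.arange(0, total + 1, seqlen, device="cuda", dtype=torch.int32)
            ms = benchmark_ms(
                lambda: flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
                                               seqlen, seqlen, causal=True),
                sync=torch.cuda.synchronize,
            )
            print(f"flash-attn forward: {ms:.3f} ms")
        else:
            print("no CUDA device available; skipping GPU benchmark")
```

To compare releases, run the same script in two environments, one with each flash-attn version installed, and feed the two medians to `relative_slowdown`.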