baidu-research / warp-ctc

Fast parallel CTC.
Apache License 2.0

Support latest cuda toolkits and fix a shfl.up.sync bug #178

Closed zlsh80826 closed 1 year ago

zlsh80826 commented 1 year ago
  1. Add sm_90 support for nvcc >= 11.8
  2. Remove sm_35 support for nvcc >= 12.0
  3. The fourth argument of shfl.up.sync is membermask (doc), which is typically 0xFFFFFFFF, not the value to be added. Passing the value there is incorrect usage: it triggers illegal-instruction errors on Hopper and later architectures, even though Volta and Ampere raise no error.
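
To illustrate item 3, here is a minimal sketch of a warp-level inclusive scan that passes a full-warp member mask to the shuffle. It uses the CUDA C++ intrinsic `__shfl_up_sync`, where the mask is the first argument, rather than the inline PTX form, where membermask is the last operand. The names `kFullWarpMask`, `warp_inclusive_scan`, `scan_kernel`, and `run_warp_scan` are hypothetical and are not taken from warp-ctc or moderngpu; error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Full-warp member mask: all 32 lanes take part in the shuffle.
constexpr unsigned kFullWarpMask = 0xFFFFFFFFu;

// Inclusive prefix sum across one warp (illustrative helper only).
__device__ int warp_inclusive_scan(int x) {
    const int lane = threadIdx.x & 31;
    for (int offset = 1; offset < 32; offset <<= 1) {
        // The mask argument names the participating lanes; it is never a data value.
        int y = __shfl_up_sync(kFullWarpMask, x, offset);
        // Lanes below `offset` have no source lane and receive their own x back,
        // so they must skip the add.
        if (lane >= offset) x += y;
    }
    return x;
}

__global__ void scan_kernel(const int* in, int* out) {
    out[threadIdx.x] = warp_inclusive_scan(in[threadIdx.x]);
}

// Host wrapper: scans exactly 32 ints on the GPU using a single warp.
std::vector<int> run_warp_scan(const std::vector<int>& host_in) {
    std::vector<int> host_out(32, 0);
    int* dev_in = nullptr;
    int* dev_out = nullptr;
    cudaMalloc(&dev_in, 32 * sizeof(int));
    cudaMalloc(&dev_out, 32 * sizeof(int));
    cudaMemcpy(dev_in, host_in.data(), 32 * sizeof(int), cudaMemcpyHostToDevice);
    scan_kernel<<<1, 32>>>(dev_in, dev_out);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host_out.data(), dev_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return host_out;
}
```

Calling `run_warp_scan` with a 32-element input from a host `main` and comparing the result against a CPU prefix sum is a quick way to check the shuffle on a given architecture.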

The moderngpu code was introduced in https://github.com/baidu-research/warp-ctc/pull/148, and moderngpu has received updates since that pull request (commits). I recommend scheduling an engineer to update the moderngpu code in case there are other flaws that have not been caught.