ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

__shfl_sync transformer inference workaround for ROCm #59

Closed rraminen closed 1 year ago

rraminen commented 1 year ago

Error:

/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:195:48: error: use of undeclared identifier '__shfl_sync'
            auto q_rot_tmp = lane < half_dim ? __shfl_sync(mask[lane], q_rot, lane + half_dim)
                                               ^

jithunnair-amd commented 1 year ago

Merging this PR, although there's a concern that the workaround might not be functionally correct. Will analyze correctness and update in a follow-up PR if required.
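
A host-side reference is one possible way to approach the correctness analysis mentioned above. The sketch below is not the workaround from this PR. It models, in plain C++, the lane exchange implied by the failing line: a lane below half_dim reads the value held by lane + half_dim. The else branch (lane - half_dim), the function name rotate_half_reference, and the use of an array to stand in for the per-lane registers are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of the cross-lane exchange.
// q_rot[i] stands for the value that lane i holds in its q_rot register.
// Assumption: lanes in [half_dim, 2 * half_dim) read from lane - half_dim;
// only the "lane + half_dim" branch is visible in the error message.
template <std::size_t N>
std::array<float, N> rotate_half_reference(const std::array<float, N>& q_rot,
                                           std::size_t half_dim)
{
    // Both halves of the rotary span must fit inside the modeled lanes.
    assert(2 * half_dim <= N);

    // Lanes outside the rotary span keep their own value in this model.
    std::array<float, N> out = q_rot;
    for (std::size_t lane = 0; lane < N; ++lane) {
        if (lane < half_dim) {
            // Lower half: take the partner value from the upper half.
            out[lane] = q_rot[lane + half_dim];
        } else if (lane < 2 * half_dim) {
            // Upper half: take the partner value from the lower half.
            out[lane] = q_rot[lane - half_dim];
        }
    }
    return out;
}
```

If the per-lane values produced by the ROCm build were copied back to the host, they could be compared element by element against this reference for a few values of half_dim. The mask argument of __shfl_sync is deliberately not modeled here, so this check says nothing about partially active warps or wavefronts.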