ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Workarounds to enable DeepSpeed transformer inference on ROCm #58

Closed by rraminen 1 year ago

rraminen commented 1 year ago

These changes are required to resolve the following compile errors when running the BLOOM workload on ROCm:

  1. /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:270:12: error: use of undeclared identifier '__double2half'; did you mean '__double2hiint'?
    return __double2half(val);
           ^~~~~~~~~~~~~
           __double2hiint
    /opt/rocm-5.4.0/include/hip/amd_detail/amd_device_functions.h:440:30: note: '__double2hiint' declared here
    __device__ static inline int __double2hiint(double x) {
                             ^
  2. /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.hip:195:48: error: use of undeclared identifier '__shfl_sync'
    auto q_rot_tmp = lane < half_dim ? __shfl_sync(mask[lane], q_rot, lane + half_dim)
  3. /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/reduction_utils_hip.h:278:43: error: excess elements in struct initializer
    constexpr __half2_raw zero = {0x0000, 0x0000};
                                          ^~
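The diff itself is not reproduced in this thread, so the following is only a sketch of the kind of workaround each error calls for, not the code that was merged. The helper names (`double_to_half_compat`, `shfl_compat`, `zero_half2_compat`) are hypothetical. The sketch assumes that the ROCm headers in the log provide `__float2half`, the mask-less `__shfl`, and the `_Float16_2` vector type, and that `__half2_raw` on HIP wraps a union whose first member is `_Float16_2`.

```cpp
// Sketch only: needs hipcc and a ROCm install. Helper names are hypothetical.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

// Error 1: no __double2half in these headers. Narrow to float first, then use
// __float2half (note this rounds twice: double -> float -> half).
__device__ inline __half double_to_half_compat(double val)
{
    return __float2half(static_cast<float>(val));
}

// Error 2: no __shfl_sync in these headers. HIP's __shfl takes no mask, so the
// mask is ignored on AMD; this assumes every lane of the wavefront is active.
__device__ inline float shfl_compat(unsigned mask, float val, int src_lane)
{
#ifdef __HIP_PLATFORM_AMD__
    (void)mask;
    return __shfl(val, src_lane);
#else
    return __shfl_sync(mask, val, src_lane);
#endif
}

// Error 3: on HIP, __half2_raw is assumed to wrap a union whose first member is
// the 2-wide vector type _Float16_2, so both halves go in one nested initializer.
__device__ inline __half2 zero_half2_compat()
{
#ifdef __HIP_PLATFORM_AMD__
    constexpr __half2_raw zero = {_Float16_2{0x0000, 0x0000}};
#else
    constexpr __half2_raw zero = {0x0000, 0x0000};
#endif
    return __half2(zero);
}
```

With helpers like these, the failing call in `apply_rotary_pos_emb.hip` would read `shfl_compat(mask[lane], q_rot, lane + half_dim)`. Keeping the CUDA path under `#else` leaves NVIDIA builds unchanged.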

rraminen commented 1 year ago

Use the transformer_inference branch (https://github.com/ROCmSoftwarePlatform/DeepSpeed/tree/transformer_inference), which contains the changes described in this PR.

Closing this PR because the following two PRs have been merged:

  1. https://github.com/ROCmSoftwarePlatform/DeepSpeed/pull/59
  2. https://github.com/ROCmSoftwarePlatform/DeepSpeed/pull/60