ROCm / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] DeepSpeed errors when running BLOOM #67

Open jataylo opened 1 year ago

jataylo commented 1 year ago

Describe the bug

I am having trouble getting the BLOOM model to run with DeepSpeed using top-of-tree (TOT) upstream PyTorch.

The first slew of errors observed is resolved by @rraminen's workaround in the transformer_inference branch.

This occurs with both ROCm 5.4.2 and 5.5.

Log snippet: deepspeed_error.txt

/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
    4 | #error C++17 or later compatible compiler is required to use ATen.
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/core/ivalue_inl.h: In lambda function:
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/include/ATen/core/ivalue_inl.h:1061:30: error: ‘is_convertible_v’ is not a member of ‘std’; did you mean ‘is_convertible’?
1061 |         if constexpr (::std::is_convertible_v<typename c10::invoke_result_t<T &&, Future&>, IValueWithStorages>) {
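Both errors point the same way: ATen.h raises this `#error` when `__cplusplus` is below 201703L, and `std::is_convertible_v` only exists from C++17 onward, so the failing translation unit appears to be compiled with an older `-std` setting. As a rough way to check this, here is a minimal sketch that inspects a list of compiler flags. The helper names are hypothetical and not part of DeepSpeed or PyTorch, and how the flags are obtained from the extension build is left as an assumption.

```python
import re

# Hypothetical helpers for illustration only; not a DeepSpeed or PyTorch API.
_STD_FLAG = re.compile(r"^-std=(?:c|gnu)\+\+(\w+)$")

# -std suffixes (including draft names) mapped to the year of the standard.
_STD_YEAR = {
    "98": 1998, "03": 2003,
    "0x": 2011, "11": 2011,
    "1y": 2014, "14": 2014,
    "1z": 2017, "17": 2017,
    "2a": 2020, "20": 2020,
    "2b": 2023, "23": 2023,
}


def effective_cxx_std(flags):
    """Return the standard year of the last -std=c++NN / -std=gnu++NN flag.

    GCC and Clang honour the last -std option on the command line, so later
    flags override earlier ones. Returns None when no such flag is present,
    in which case the compiler default applies and is unknown here.
    """
    year = None
    for flag in flags:
        match = _STD_FLAG.match(flag)
        if match and match.group(1) in _STD_YEAR:
            year = _STD_YEAR[match.group(1)]
    return year


def meets_cxx17(flags):
    """True only if the flags explicitly select C++17 or newer."""
    year = effective_cxx_std(flags)
    return year is not None and year >= 2017


if __name__ == "__main__":
    # Example flag list (an assumption, not taken from the build log).
    example = ["-O3", "-std=c++14", "-fPIC"]
    print(effective_cxx_std(example), meets_cxx17(example))
```

If a check like this reports a pre-C++17 standard for the flags used to build the extension, raising that flag to `-std=c++17` would be the first thing to try.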

To Reproduce

Docker image: rocm/pytorch-private:BLOOM_DeepSpeed_tranformer_inference_enabled_tot_issue

Steps to reproduce the behavior:

  1. Build upstream PyTorch and the transformer_inference ROCm DeepSpeed branch
  2. git clone https://github.com/huggingface/transformers-bloom-inference
  3. deepspeed --num_gpus 1 transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloom-560m

ds_report output

DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 2.1.0a0+gitfde024b
deepspeed install path ........... ['/opt/conda/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.9.3+44c0bbfe, 44c0bbfe, transformer_inference
torch cuda version ............... None
torch hip version ................ 5.5.30201-c1741e9b
nvcc version ..................... None
deepspeed wheel compiled w. ...... torch 2.0, hip 5.5

jataylo commented 1 year ago

cc: @jithunnair-amd @dllehr-amd