ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

deepspeed installation error #40

Closed DeniskinBeast closed 3 years ago

DeniskinBeast commented 3 years ago

DeepSpeed does not install correctly: the cpu_adam and sparse_attention extensions are not built. Installing version 0.3.7 with the given parameters reports no errors, but the extensions are not installed.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/nikita/.local/lib/python3.8/site-packages/torch']
torch version .................... 1.6.0+cu101
torch cuda version ............... 10.1
nvcc version ..................... 10.1
deepspeed install path ........... ['/home/nikita/.local/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.6, cuda 10.1
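For readers who want to check a report like the one above programmatically, here is a minimal Python sketch that extracts the op table and lists the ops that were requested but not installed. It assumes each row has the shape `name .... [YES|NO] .... [OKAY|NO]`, as in the output above; the function names `parse_op_report` and `missing_ops` are made up for this example and are not part of DeepSpeed.

```python
import re

# Matches rows such as "cpu_adam ............... [NO] ....... [OKAY]".
# Assumption: installed is [YES]/[NO] and compatible is [OKAY]/[NO].
ROW = re.compile(r"^(\w+)\s*\.+\s*\[(YES|NO)\]\s*\.+\s*\[(OKAY|NO)\]\s*$")


def parse_op_report(text):
    """Return {op_name: (installed, compatible)} for every op row in text."""
    ops = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, installed, compatible = match.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
    return ops


def missing_ops(text, wanted):
    """Return the ops from `wanted` that the report does not show as installed."""
    ops = parse_op_report(text)
    return [name for name in wanted if not ops.get(name, (False, False))[0]]


SAMPLE = """\
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
"""

if __name__ == "__main__":
    print(missing_ops(SAMPLE, ["cpu_adam", "sparse_attn"]))
```

Rows that do not fit the pattern, such as the ninja line and the table header, are skipped.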

Installing version 0.3.11, or running the install script from the deepspeed repository with the same parameters, fails with an error. Log from running the install script:

No hostfile exists at /job/hostfile, installing locally
Building deepspeed wheel
DS_BUILD_OPS=0
Install Ops={'cpu_adam': 1, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': 1, 'transformer': False, 'stochastic_transformer': False, 'utils': False}
version=0.3.11+29fa4b2, git_hash=29fa4b2, git_branch=master
install_requires=['torch>=1.2', 'torchvision>=0.4.0', 'tqdm', 'tensorboardX==1.8', 'ninja', 'numpy', 'triton==0.2.3']
compatible_ops={'cpu_adam': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'utils': True}
ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7fcb7be90820>, <setuptools.extension.Extension('deepspeed.ops.sparse_attention.sparse_attn_op') at 0x7fcb7be909d0>]
running bdist_wheel
running build
running build_py
copying deepspeed/git_version_info_installed.py -> build/lib.linux-x86_64-3.8/deepspeed
running egg_info
writing deepspeed.egg-info/PKG-INFO
writing dependency_links to deepspeed.egg-info/dependency_links.txt
writing requirements to deepspeed.egg-info/requires.txt
writing top-level names to deepspeed.egg-info/top_level.txt
reading manifest file 'deepspeed.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '.cc' under directory 'deepspeed'
warning: no files found matching '.tr' under directory 'csrc'
warning: no files found matching '*.cc' under directory 'csrc'
writing manifest file 'deepspeed.egg-info/SOURCES.txt'
running build_ext
building 'deepspeed.ops.adam.cpu_adam_op' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Icsrc/includes -I/usr/local/cuda/include -I/home/nikita/.local/lib/python3.8/site-packages/torch/include -I/home/nikita/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/nikita/.local/lib/python3.8/site-packages/torch/include/TH -I/home/nikita/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.8 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o -O3 -std=c++14 -L/usr/local/cuda/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
x86_64-linux-gnu-gcc: error: : No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Error on line 155
Fail to install deepspeed

Here is the output of cpufeature.print_features(), in case it helps:

=== CPU FEATURES ===
VendorId : AuthenticAMD
num_virtual_cores : 16
num_physical_cores : 8
num_threads_per_core : 2
num_cpus : 0
cache_line_size : 64
cache_L1_size : 0
cache_L2_size : 0
cache_L3_size : 0
OS_x64 : True
OS_AVX : True
OS_AVX512 : False
MMX : True
x64 : True
ABM : True
RDRAND : True
BMI1 : True
BMI2 : True
ADX : True
PREFETCHWT1 : False
MPX : False
SSE : True
SSE2 : True
SSE3 : True
SSSE3 : True
SSE4.1 : True
SSE4.2 : True
SSE4.a : True
AES : True
SHA : True
AVX : True
XOP : False
FMA3 : True
FMA4 : False
AVX2 : True
AVX512f : False
AVX512pf : False
AVX512er : False
AVX512cd : False
AVX512vl : False
AVX512bw : False
AVX512dq : False
AVX512ifma : False
AVX512vbmi : False

which x86_64-linux-gnu-gcc
/usr/bin/x86_64-linux-gnu-gcc

gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Thanks in advance for your help!

mgrankin commented 3 years ago

This bug is related to compilation on AMD processors: https://github.com/microsoft/DeepSpeed/issues/788

The fix is already in the deepspeed repository, but it has not made it into a release yet. Workaround: install deepspeed from source.
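One possible reading of the log: gcc reports a missing file with an empty name (`error: : No such file or directory`), which is what it prints when an empty string is passed as an argument and treated as an input file. The Python sketch below shows how a build script could end up with such an argument when a CPU feature check matches nothing, and how filtering out empty flags avoids it. The helper names `simd_flag` and `build_args` and the `-DEXAMPLE_*` defines are hypothetical, and this is not DeepSpeed's actual code; whether it matches the upstream fix is not confirmed here.

```python
def simd_flag(features):
    """Hypothetical helper: choose a SIMD define from a CPU feature dict.

    The define names are placeholders, not DeepSpeed's real flags.
    """
    if features.get("AVX512f"):
        return "-DEXAMPLE_AVX512"
    if features.get("AVX2"):
        return "-DEXAMPLE_AVX256"
    return ""  # nothing matched: an empty string, not a usable flag


def build_args(base_args, extra_flags):
    """Combine compiler arguments, dropping empty strings.

    Without the filter, '' would reach the compiler as an argument and be
    treated as a file name.
    """
    return list(base_args) + [flag for flag in extra_flags if flag]


if __name__ == "__main__":
    base = ["-O3", "-std=c++14", "-march=native"]
    unknown_cpu = {}  # a feature dict in which no check matches
    print(build_args(base, [simd_flag(unknown_cpu), "-fopenmp"]))
```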

mgrankin commented 3 years ago

git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
# Export the fix commit as a patch file
git format-patch -1 1903a1380ef6d5a45f77c59002edf1b7120d0d05 --stdout > ~/file.patch
cat ~/file.patch
# Check out the v0.3.7 release and apply the patch on top of it
git checkout tags/v0.3.7
git am < ~/file.patch
rm ~/file.patch
# Build and install from the patched source with cpu_adam and sparse_attn enabled
DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install -v --disable-pip-version-check --no-cache-dir ./
ds_report

AshKaeN commented 3 years ago

Hello. I get an error in Colab when running the command !DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7. The message is: "ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output. /bin/bash: ds_report: command not found". The method above did not help. Can anyone tell me what is wrong?
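The `ds_report: command not found` part of that message is probably just a consequence of the failed install: if `pip install` stops at `egg_info`, neither the package nor its command-line script gets installed. A minimal Python sketch for checking this in a notebook cell follows; the function name `deepspeed_install_status` is made up for this example, and it uses only the standard library.

```python
import importlib.util
import shutil


def deepspeed_install_status():
    """Report whether the deepspeed package and the ds_report script are present."""
    return {
        # True if Python can locate an installed top-level 'deepspeed' package.
        "package_importable": importlib.util.find_spec("deepspeed") is not None,
        # True if a 'ds_report' executable is found on PATH.
        "ds_report_on_path": shutil.which("ds_report") is not None,
    }


if __name__ == "__main__":
    for check, ok in deepspeed_install_status().items():
        print(f"{check}: {ok}")
```

If both checks are false, the real problem is the `egg_info` failure, and the full pip log (for example from running pip with `-v`) is the place to look.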