ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

ruGPT3XL_generation example does not work #60

Closed. qo4on closed this issue 2 years ago.

qo4on commented 3 years ago
!DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7

Collecting deepspeed==0.3.7
  Downloading https://files.pythonhosted.org/packages/1f/f6/4de24b5790621e9eb787b7e4d90a57075ebbb85e81100a0dc8c50fdba8ba/deepspeed-0.3.7.tar.gz (258kB)
     |████████████████████████████████| 266kB 7.5MB/s 
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I tried it in Colab. Any ideas how to fix this?
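
(For reference, a quick cell to check which torch and CUDA versions the Colab runtime currently has; the DeepSpeed build depends on both. This is just a sanity-check sketch, not part of the original notebook:)

import torch
print("torch:", torch.__version__)             # torch wheel installed in the runtime
print("built with CUDA:", torch.version.cuda)  # CUDA version that wheel was compiled against
!nvcc --version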

The Generate_text_with_RuGPTs_HF notebook does not work either:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

ImportError                               Traceback (most recent call last)
<ipython-input-5-4bb89d36a3dc> in <module>()
----> 1 from transformers import GPT2LMHeadModel, GPT2Tokenizer

2 frames
/usr/local/lib/python3.7/dist-packages/transformers/__init__.py in <module>()
    624 
    625     # Trainer
--> 626     from .trainer import Trainer
    627     from .trainer_pt_utils import torch_distributed_zero_first
    628 else:

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in <module>()
     67     TrainerState,
     68 )
---> 69 from .trainer_pt_utils import (
     70     DistributedTensorGatherer,
     71     SequentialDistributedSampler,

/usr/local/lib/python3.7/dist-packages/transformers/trainer_pt_utils.py in <module>()
     38     SAVE_STATE_WARNING = ""
     39 else:
---> 40     from torch.optim.lr_scheduler import SAVE_STATE_WARNING
     41 
     42 logger = logging.get_logger(__name__)

ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler' (/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py)
Ulitochka commented 3 years ago

Hi.

In Colab the default preinstalled versions are used, which leads to the CUDA/torch mismatch below. See this code:

!git clone https://github.com/microsoft/DeepSpeed.git
%cd DeepSpeed/
!DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install -v --disable-pip-version-check --no-cache-dir ./

        f"Installed CUDA version {sys_cuda_version} does not match the "
    Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 10.1, unable to compile cuda/cpp extensions without a matching cuda version.
    DS_BUILD_OPS=0

We can install the required package versions:
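
(One way to do that is to reinstall a torch build matching the system CUDA 11.0 before building DeepSpeed; the exact wheel below is a guess based on the environment report further down, which shows torch 1.7.0+cu110:)

!pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html

After that the DeepSpeed op report and environment info look like this: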

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.7.0+cu110
torch cuda version ............... 11.0
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0
king-menin commented 3 years ago

The triton version was installed incorrectly. You need to remove DeepSpeed and then try this:

!rm -rf /tmp/DeepSpeed

!pip install triton==0.2.3

cd /tmp && git clone https://github.com/microsoft/DeepSpeed.git && cd DeepSpeed/ && git checkout ff58fa7e5a4f637a21d11daad0192683fe50ed15 && pip uninstall -y typing && pip install cpufeature && DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 /tmp/DeepSpeed/install.sh -n && pip install typing

pip install transformers==3.5.1
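
(After these steps, a quick verification cell; this assumes the install went through and uses the same sparse attention import that the generation notebook checks:)

import deepspeed
import deepspeed.ops.sparse_attention.sparse_attn_op  # should import without errors
print(deepspeed.__version__)                           # expect 0.3.7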

Artyrm commented 3 years ago

@Ulitochka But then this line !DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7 reinstalls torch:

Found existing installation: torch 1.7.0+cu110
    Uninstalling torch-1.7.0+cu110:
      Successfully uninstalled torch-1.7.0+cu110
Successfully installed deepspeed-0.3.7 ninja-1.10.2 tensorboardX-1.8 torch-1.9.0

As a result:

DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.7/dist-packages/torch']
torch version .................... 1.9.0+cu102
torch cuda version ............... 10.2
nvcc version ..................... 11.0
deepspeed install path ........... ['/usr/local/lib/python3.7/dist-packages/deepspeed']
deepspeed info ................... 0.3.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.7, cuda 11.0

And the line import deepspeed.ops.sparse_attention.sparse_attn_op fails:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-21-2d9098395ca5> in <module>()
      1 # And this cell should be run without errors
----> 2 import deepspeed.ops.sparse_attention.sparse_attn_op

ModuleNotFoundError: No module named 'deepspeed.ops.sparse_attention.sparse_attn_op'
Artyrm commented 3 years ago

In fact, the torch uninstallation is not a big deal. You can pass the --no-deps flag to the DeepSpeed install and, if needed, install some of the dependencies separately, roughly as sketched below.
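
(Roughly like this; the extra packages are taken from the pip output above, which pulled in ninja and tensorboardX:)

!pip install ninja tensorboardX
!DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install --no-deps deepspeed==0.3.7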

ITV1 commented 3 years ago

Regarding the "ImportError: cannot import name 'SAVE_STATE_WARNING'": maybe this will be useful for someone. As I understand it, the error appears when the installed torch is newer than the one this transformers version expects (SAVE_STATE_WARNING is no longer exported by torch.optim.lr_scheduler), and it can be worked around by editing the trainer_pt_utils.py file:

sudo nano /usr/local/lib/python3.7/dist-packages/transformers/trainer_pt_utils.py

and changing the:

    SAVE_STATE_WARNING = ""
else:
    from torch.optim.lr_scheduler import SAVE_STATE_WARNING

logger = logging.get_logger(__name__)

to:

    SAVE_STATE_WARNING = ""
else:
    try:
        from torch.optim.lr_scheduler import SAVE_STATE_WARNING
    except ImportError:
        SAVE_STATE_WARNING = ""

logger = logging.get_logger(__name__)
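
(An alternative that avoids editing the installed package, assuming the problem is only this torch/transformers version mismatch, is to pin the versions used elsewhere in this thread:)

!pip install torch==1.7.0+cu110 transformers==3.5.1 -f https://download.pytorch.org/whl/torch_stable.html
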
king-menin commented 2 years ago

Added fixes for the latest Colab updates for ruGPT3XL.