OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #47

Closed. Xiaolong-RRL closed this issue 9 months ago

Xiaolong-RRL commented 9 months ago

Dear author:

Thanks for your interesting work.

When I run the 3D model training with 'sh scripts/train_lamm3d.sh' after installation, the following error occurs:

Using /data/x/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /data/x/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
Loading extension module cpu_adam...
Traceback (most recent call last):
  File "/data/x/code/lamm/src/train.py", line 255, in <module>
    main(**cfg)
  File "/data/x/code/lamm/src/train.py", line 226, in main
    agent = load_model(args)
  File "/data/x/code/lamm/src/model/__init__.py", line 9, in load_model
    agent = globals()[agent_name](model, args)
  File "/data/x/code/lamm/src/model/agent.py", line 22, in __init__
    self.ds_engine, self.optimizer, _, _ = deepspeed.initialize(
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/__init__.py", line 165, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1174, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1230, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 454, in load
    return self.jit_load(verbose)
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 497, in jit_load
    op_module = load(name=self.name,
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /data/x/.cache/torch_extensions/py310_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7ff049ef4ee0>
Traceback (most recent call last):
  File "/data/x/miniconda3/envs/lamm/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
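
A note on reading the log above: the final AttributeError is a follow-on symptom, and the ImportError a few lines earlier is the actual failure. Python still calls `__del__` on an object whose `__init__` raised, so any attribute that was never assigned is missing at that point. The sketch below reproduces that pattern with a toy class. `ToyOptimizer`, `GuardedToyOptimizer`, and `construct` are hypothetical names for illustration and are not DeepSpeed code.

```python
class ToyOptimizer:
    """Toy stand-in for an optimizer whose constructor loads a native extension."""

    def __init__(self, fail: bool):
        if fail:
            # Simulates the extension failing to load before the attribute is set.
            raise ImportError("cpu_adam.so: cannot open shared object file")
        self.native = object()

    def __del__(self):
        # Still runs when __init__ raised, so self.native may not exist.
        # An AttributeError raised here is reported as "Exception ignored in ...".
        self.native


class GuardedToyOptimizer(ToyOptimizer):
    """Same toy class, but the finalizer tolerates a half-built object."""

    def __del__(self):
        native = getattr(self, "native", None)
        if native is not None:
            pass  # a real finalizer would release the native resource here


def construct(cls, fail: bool) -> str:
    """Build an instance and report the root cause if construction fails."""
    try:
        cls(fail)
    except ImportError as exc:
        return f"root cause: {exc}"
    return "ok"


if __name__ == "__main__":
    # The unguarded class also prints an "Exception ignored" message to stderr.
    print(construct(ToyOptimizer, fail=True))
    print(construct(GuardedToyOptimizer, fail=True))
```

In both cases the caller only sees the ImportError, which is the error worth debugging.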

It seems like CPU_Adam did not compile successfully. The following is my conda env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   2.0.0                    pypi_0    pypi
accelerate                0.23.0                   pypi_0    pypi
asttokens                 2.4.0                    pypi_0    pypi
av                        10.0.0                   pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
bigmodelvis               0.0.1                    pypi_0    pypi
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2023.08.22           h06a4308_0  
cachetools                5.3.1                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.3.0                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cmake                     3.27.7                   pypi_0    pypi
cython                    3.0.4                    pypi_0    pypi
data                      0.4                      pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
decord                    0.6.0                    pypi_0    pypi
deepspeed                 0.9.3                    pypi_0    pypi
einops                    0.7.0                    pypi_0    pypi
exceptiongroup            1.1.3                    pypi_0    pypi
executing                 2.0.0                    pypi_0    pypi
filelock                  3.12.4                   pypi_0    pypi
fsspec                    2023.9.2                 pypi_0    pypi
ftfy                      6.1.1                    pypi_0    pypi
funcsigs                  1.0.2                    pypi_0    pypi
fvcore                    0.1.5.post20221221          pypi_0    pypi
google-auth               2.23.3                   pypi_0    pypi
google-auth-oauthlib      1.1.0                    pypi_0    pypi
grpcio                    1.59.0                   pypi_0    pypi
hjson                     3.1.0                    pypi_0    pypi
huggingface-hub           0.17.3                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
iopath                    0.1.10                   pypi_0    pypi
ipdb                      0.13.13                  pypi_0    pypi
ipython                   8.16.1                   pypi_0    pypi
jedi                      0.19.1                   pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
markdown                  3.5                      pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.2                      pypi_0    pypi
ninja                     1.11.1                   pypi_0    pypi
nltk                      3.8.1                    pypi_0    pypi
numpy                     1.26.1                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.0.11               h7f8727e_2  
packaging                 23.2                     pypi_0    pypi
parameterized             0.9.0                    pypi_0    pypi
parso                     0.8.3                    pypi_0    pypi
peft                      0.3.0                    pypi_0    pypi
pexpect                   4.8.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.3            py310h06a4308_0  
plumbum                   1.8.2                    pypi_0    pypi
plyfile                   1.0.1                    pypi_0    pypi
pointnet2                 0.0.0                    pypi_0    pypi
portalocker               2.8.2                    pypi_0    pypi
prompt-toolkit            3.0.39                   pypi_0    pypi
protobuf                  4.23.4                   pypi_0    pypi
psutil                    5.9.6                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pydantic                  1.10.13                  pypi_0    pypi
pygments                  2.16.1                   pypi_0    pypi
python                    3.10.13              h955ad1f_0  
pytorchvideo              0.1.5                    pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
pyzmq                     25.1.1                   pypi_0    pypi
readline                  8.2                  h5eee18b_0  
regex                     2022.10.31               pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rich                      13.6.0                   pypi_0    pypi
rpyc                      5.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
safetensors               0.4.0                    pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                65.5.1                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
stack-data                0.6.3                    pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.15.0                   pypi_0    pypi
tensorboard-data-server   0.7.1                    pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
timm                      0.6.7                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
tokenizers                0.14.1                   pypi_0    pypi
tomli                     2.0.1                    pypi_0    pypi
torch                     1.13.1+cu117             pypi_0    pypi
torchaudio                0.13.1+cu117             pypi_0    pypi
torchvision               0.14.1+cu117             pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.11.2                   pypi_0    pypi
transformers              4.34.1                   pypi_0    pypi
trimesh                   4.0.0                    pypi_0    pypi
triton                    2.0.0.dev20221202          pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
tzdata                    2023c                h04d1e81_0  
urllib3                   2.0.7                    pypi_0    pypi
uvloop                    0.18.0                   pypi_0    pypi
wcwidth                   0.2.8                    pypi_0    pypi
werkzeug                  3.0.0                    pypi_0    pypi
wheel                     0.41.2          py310h06a4308_0  
xz                        5.4.2                h5eee18b_0  
yacs                      0.1.8                    pypi_0    pypi
zlib                      1.2.13               h5eee18b_0 

I wonder whether you have encountered similar problems, and if so, how you solved them?

Best! Xiaolong

Xiaolong-RRL commented 9 months ago

The error was solved by following the linked issue. (By the way, the error occurred on a 4090.)
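
The issue referred to here is not reproduced in this thread, so the exact fix is not recorded. As a hedged starting point for anyone hitting the same ImportError, the sketch below checks whether the JIT build directory from the log actually contains `cpu_adam.so`, and can remove a build directory that has no compiled library so the next run rebuilds it. Whether that resolves this particular case is an assumption, not something the thread confirms. The function names are hypothetical, and the extension root has to be copied from your own log output.

```python
import shutil
from pathlib import Path


def extension_status(ext_root: str, name: str = "cpu_adam") -> dict:
    """Report what is present in the JIT build directory of one extension."""
    build_dir = Path(ext_root).expanduser() / name
    shared_object = build_dir / f"{name}.so"
    leftover = sorted(p.name for p in build_dir.iterdir()) if build_dir.is_dir() else []
    return {
        "build_dir_exists": build_dir.is_dir(),
        "shared_object_exists": shared_object.is_file(),
        "files": leftover,
    }


def clear_stale_build(ext_root: str, name: str = "cpu_adam") -> bool:
    """Remove the build directory only if it exists without a compiled .so.

    Returns True when something was removed.
    """
    status = extension_status(ext_root, name)
    if status["build_dir_exists"] and not status["shared_object_exists"]:
        shutil.rmtree(Path(ext_root).expanduser() / name)
        return True
    return False


if __name__ == "__main__":
    # Assumption: default PyTorch extensions root for Python 3.10 / CUDA 11.7,
    # matching the "py310_cu117" directory name seen in the log.
    root = "~/.cache/torch_extensions/py310_cu117"
    print(extension_status(root))
```

If the directory holds a `build.ninja` but no `.so`, the compile step itself most likely failed, and the compiler output from a fresh run is the next thing to inspect.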

bdytx5 commented 2 weeks ago

if u need a quick fix, disable optimizer CPU offload

isjakewong commented 1 week ago

> if u need a quick fix, disable optimizer CPU offload

How can we disable it?

bdytx5 commented 1 week ago

it should just be left out of the config json i think, or possibly switch to stage 2. I forget which one

bdytx5 commented 1 week ago

e.g. cpu offloading for the optimizer is simply not specified in the config
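
A minimal sketch of what "not specified" could look like in practice, assuming the training script reads a DeepSpeed config with a `zero_optimization` section. The helper name `disable_optimizer_offload` and the example values are hypothetical, and LAMM's actual config file may be laid out differently. The sketch removes the `offload_optimizer` entry and leaves everything else untouched.

```python
import copy
import json


def disable_optimizer_offload(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config with optimizer CPU offload removed."""
    cfg = copy.deepcopy(ds_config)  # leave the caller's dict untouched
    zero = cfg.get("zero_optimization", {})
    zero.pop("offload_optimizer", None)  # no-op if offload was never specified
    return cfg


# Hypothetical example config; the values are for illustration only.
example = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}

patched = disable_optimizer_offload(example)

if __name__ == "__main__":
    print(json.dumps(patched, indent=2))
```

Without the offload, optimizer state stays in GPU memory, so this workaround trades the CPU Adam build problem for higher GPU memory use.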