Aradhye2002 / EcoDepth

[CVPR'2024] Official implementation of the paper "ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation"
https://ecodepth-iitd.github.io/
MIT License

ImportError preventing training #8

Closed DaDudeIan closed 7 months ago

DaDudeIan commented 7 months ago

During the execution of the train_nyu.sh script, an ImportError is encountered when trying to import VectorQuantizer2 from taming.modules.vqvae.quantize. This error prevents the training process from starting as the model initialization fails.

(ecodepth) cv19f24@node13:~/EcoDepth/depth$ bash train_nyu.sh
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
| distributed init (rank 0): env://, gpu 0

 Max_depth = 10.0 meters for nyudepthv2!

<wandb logs>

model will be saved after every 200 steps
val will be done after every 200 steps
This experiment name is :  04151402_nyu_BS-16_lr-one_cycle_training_nyu
log_dir in main log_dir/04151402_nyu_BS-16_lr-one_cycle_training_nyu
Traceback (most recent call last):
  File "/home/cv19f24/EcoDepth/depth/train.py", line 540, in <module>
    main()
  File "/home/cv19f24/EcoDepth/depth/train.py", line 462, in main
    model = EcoDepth(args=args)
            ^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/models/model.py", line 166, in __init__
    self.encoder = EcoDepthEncoder(out_dim=channels_in, dataset='nyu', args = args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/models/model.py", line 56, in __init__
    sd_model = instantiate_from_config(self.config.model)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 85, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 93, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/cv19f24/EcoDepth/depth/ldm/models/diffusion/ddpm.py", line 25, in <module>
    from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
  File "/home/cv19f24/EcoDepth/depth/ldm/models/autoencoder.py", line 6, in <module>
    from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
ImportError: cannot import name 'VectorQuantizer2' from 'taming.modules.vqvae.quantize' (/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/taming/modules/vqvae/quantize.py)
<wandb logs>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2380373) of binary: /home/cv19f24/.conda-2024.02/envs/ecodepth/bin/python
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-15_14:02:25
  host      : node13
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2380373)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

I followed the installation instructions to the letter, so I don't know what else to try. Thank you.

frankkim1108 commented 7 months ago

Hi @DaDudeIan, I had the same error. This solved the problem for me:

pip install taming-transformers-rom1504

DaDudeIan commented 7 months ago

Thank you, @frankkim1108!

But I found another fix: https://github.com/CompVis/stable-diffusion/issues/72

Doing this fixed it for me 😊
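
For anyone who lands here with the same traceback, a quick way to confirm that a fix took effect is to check the import before launching train_nyu.sh again. The snippet below is only a minimal sketch: the helper name `has_attribute` is made up for illustration and is not part of EcoDepth or taming-transformers. It assumes you run it inside the same `ecodepth` environment that the training script uses.

```python
import importlib


def has_attribute(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical helper for illustration; not part of EcoDepth or taming.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its own dependencies) is not installed.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Same module and name that ldm/models/autoencoder.py tries to import.
    ok = has_attribute("taming.modules.vqvae.quantize", "VectorQuantizer2")
    print("VectorQuantizer2 importable:", ok)
```

If this prints `False`, the `taming` package on the path still lacks `VectorQuantizer2`, and the training script should be expected to fail at the same import.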