PinataFarms / DAD-3DHeads

Official repo for DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image (CVPR 2022).

Request for Assistance with Error in Executing train.py - ValueError: Unexpected keyword arguments: `compute_on_step` #46

Closed: khamneeda closed this issue 2 months ago

khamneeda commented 2 months ago

Dear @NeelayS @KupynOrest @burnmyletters @t-martyniuk,

Hello all. Thank you for your wonderful work! I'm here to kindly ask you for assistance in training this model.

I created a conda virtual environment and strictly followed your instructions, except for additionally installing setuptools==59.5.0 (explained below). But while training the model, I got the following error:

Error executing job with overrides: []
Traceback (most recent call last):
  File "train.py", line 44, in run_experiment
    train(config)
  File "train.py", line 22, in train
    dad3d_net = FlameLightningModel(model=model, config=config, train=train_dataset, val=val_dataset)
  File "/home/kim/myface/dad/model/model_training/train/flame_lightning_model.py", line 72, in __init__
    self.iou_metric = SoftIoUMetric(compute_on_step=True)
  File "/home/kim/myface/dad/model/model_training/metrics/iou.py", line 44, in __init__
    super().__init__(
  File "/home/kim/anaconda3/envs/dadhead/lib/python3.8/site-packages/torchmetrics/metric.py", line 146, in __init__
    raise ValueError(f"Unexpected keyword arguments: {', '.join(kwargs_)}")
ValueError: Unexpected keyword arguments: `compute_on_step`

This is the same as another user's issue: https://github.com/PinataFarms/DAD-3DHeads/issues/39
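
As far as I can tell, the rejection also reproduces outside the repo with any bare torchmetrics Metric subclass (a minimal sketch; Dummy is only an illustrative name, not from the codebase):

# Minimal reproduction of the behaviour above, independent of DAD-3DHeads:
# on my torchmetrics 1.2.1, Metric rejects unknown keyword arguments at construction time.
import torchmetrics

class Dummy(torchmetrics.Metric):
    def update(self, *args, **kwargs):  # required abstract method, does nothing here
        pass

    def compute(self):  # required abstract method
        return None

Dummy(compute_on_step=True)  # raises ValueError: Unexpected keyword arguments: `compute_on_step`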

So far I have placed the dataset in the DAD-3DHeads/dataset/ directory and installed the packages from requirements.txt. I did not change any settings in the .yaml files.

model_training/config/train.yaml

# @package _global_

hydra:
  run:
    dir: ./experiments/train/${now:%Y-%m-%d-%H-%M-%S}

defaults:
  - backend: 1gpu
  - dataset: dad_3d_heads
  - constants: flame_constants
  - model: resnet_regression
  - loss: train_loss
  - optimizer: adam
  - scheduler: plateau_min
  - train_stage: flame_landmarks
  - property_overrides: flame_landmarks.academic
  - utility_overrides: local

For context: I installed setuptools==59.5.0 because, before hitting the error above, running train.py failed with the error below. This is the error that pin fixed (a quick sanity check is sketched after the traceback).

Traceback (most recent call last):
  File "train.py", line 8, in <module>
    from model_training.train.trainer import DAD3DTrainer
  File "/home/kim/myface/dad/DAD-3DHeads/model_training/train/trainer.py", line 5, in <module>
    from pytorch_lightning import Trainer
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 26, in <module>
    from pytorch_lightning.callbacks.pruning import ModelPruning
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/callbacks/pruning.py", line 31, in <module>
    from pytorch_lightning.core.lightning import LightningModule
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/core/__init__.py", line 16, in <module>
    from pytorch_lightning.core.lightning import LightningModule
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 40, in <module>
    from pytorch_lightning.loggers import LightningLoggerBase
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/loggers/__init__.py", line 18, in <module>
    from pytorch_lightning.loggers.tensorboard import TensorBoardLogger
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/pytorch_lightning/loggers/tensorboard.py", line 26, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/kim/anaconda3/envs/dadhead2/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'
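
After pinning setuptools, I verified the environment with a quick check like the one below (just illustrative, not from the repo):

# Quick sanity check: confirm the pinned setuptools version and that the import
# chain which failed above now succeeds.
import setuptools
print(setuptools.__version__)  # expected: 59.5.0 after the pin

# This is the import that raised AttributeError with a newer setuptools + torch 1.9.
from torch.utils.tensorboard import SummaryWriter
print(SummaryWriter)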

My Python version is 3.8.19, and these are the packages installed in my environment:

absl-py==2.1.0
aiohttp==3.9.5
aiosignal==1.3.1
albumentations==1.0.0
antlr4-python3-runtime==4.8
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.1.0
blinker==1.8.2
cachetools==5.3.3
certifi==2024.7.4
charset-normalizer==3.2.0
chumpy==0.70
click==8.1.7
coloredlogs==15.0.1
docker==6.1.3
fff==0.0.1
fire==0.6.0
Flask==3.0.3
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.1
furl==2.1.3
gast==0.4.0
gcovr==6.0
gitdb==4.0.10
GitPython==3.1.37
giturlparse.py==0.0.5
google-auth==2.29.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
grpcio==1.64.0
gunicorn==22.0.0
h5py==3.11.0
humanfriendly==10.0
hydra-core==1.1.0
idna==3.7
imageio==2.34.2
importlib_metadata==7.1.0
importlib_resources==6.4.0
isodate==0.6.1
itsdangerous==2.2.0
Jinja2==3.1.2
joblib==1.4.2
keras==2.13.1
lazy_loader==0.4
libclang==18.1.1
lightning-utilities==0.11.3.post0
lxml==4.9.3
Markdown==3.6
MarkupSafe==2.1.3
mtcnn==0.1.1
multidict==6.0.5
networkx==3.1
numpy==1.22.0
oauthlib==3.2.2
omegaconf==2.1.2
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
opt-einsum==3.3.0
orderedmultidict==1.0.1
packaging==23.1
pillow==10.4.0
platformdirs==3.10.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyDeprecate==0.3.2
Pygments==2.16.1
pytorch-lightning==1.6.0
pytorch-ranger==0.1.1
pytorch-toolbelt==0.5.0
pytorchcv==0.0.65
pytz==2024.1
PyWavelets==1.4.1
PyYAML==6.0.1
requests==2.32.3
requests-file==1.5.1
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rsa==4.9
scikit-image==0.21.0
scikit-learn==1.3.2
scipy==1.10.1
six==1.16.0
smmap==5.0.1
smplx==0.1.26
tensorboard==2.13.0
tensorboard-data-server==0.7.2
tensorflow==2.13.1
tensorflow-estimator==2.13.0
tensorflow-io-gcs-filesystem==0.34.0
termcolor==2.4.0
threadpoolctl==3.5.0
tifffile==2023.7.10
timm==0.4.5
torch==1.9.0
torch-optimizer==0.1.0
torchgeometry==0.1.2
torchmetrics==1.2.1
torchvision==0.10.0
tqdm==4.66.4
typing_extensions==4.5.0
urllib3==2.0.5
websocket-client==1.6.3
Werkzeug==3.0.3
wget==3.2
wrapt==1.16.0
yarl==1.9.4
zeep==4.2.1
zipp==3.19.0

Is there something else I should have done before training the model?

Thank you in advance!

khamneeda commented 2 months ago

Fixed the problem :) I resolved this by downgrading to torchmetrics==0.11.4. If you have the same problem, check this: https://github.com/NVIDIA/NeMo/issues/6984
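
In case it helps anyone who wants to stay on a newer torchmetrics instead of downgrading: as far as I can tell the base Metric class there simply rejects the compute_on_step keyword, so a version guard like the sketch below (hypothetical, not part of this repo; iou_metric_kwargs is my own name) should also let SoftIoUMetric be constructed:

# Hypothetical helper, not part of DAD-3DHeads: build the kwargs that
# flame_lightning_model.py passes to SoftIoUMetric, dropping compute_on_step
# on torchmetrics releases that no longer accept it.
from packaging import version
import torchmetrics

def iou_metric_kwargs() -> dict:
    # Assumption: the argument is still tolerated (with a deprecation warning)
    # up to the 0.11 line, which matches 0.11.4 working for me.
    if version.parse(torchmetrics.__version__) < version.parse("1.0.0"):
        return {"compute_on_step": True}
    return {}

# e.g. in FlameLightningModel.__init__:
# self.iou_metric = SoftIoUMetric(**iou_metric_kwargs())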