facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Possible Bug: `AttributeError: 'CheckNanModelOutputHook' object has no attribute '_checkpoint_model'` #490

Status: Closed (Pedrexus closed this issue 2 years ago)

Pedrexus commented 2 years ago

Instructions To Reproduce the 🐛 Bug:

I intermittently get the error `AttributeError: 'CheckNanModelOutputHook' object has no attribute '_checkpoint_model'`. Looking at the code, this class does not appear to define such a method.

  1. what changes you made (git diff) or what code you wrote

    • none
  2. what you observed (including full logs):

    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/opt/conda/lib/python3.8/site-packages/vissl/utils/distributed_launcher.py", line 192, in _distributed_worker
        run_engine(
      File "/opt/conda/lib/python3.8/site-packages/vissl/engines/engine_registry.py", line 86, in run_engine
        engine.run_engine(
      File "/opt/conda/lib/python3.8/site-packages/vissl/engines/train.py", line 39, in run_engine
        train_main(
      File "/opt/conda/lib/python3.8/site-packages/vissl/engines/train.py", line 130, in train_main
        trainer.train()
      File "/opt/conda/lib/python3.8/site-packages/vissl/trainer/trainer_main.py", line 212, in train
        raise e
      File "/opt/conda/lib/python3.8/site-packages/vissl/trainer/trainer_main.py", line 194, in train
        task = train_step_fn(task)
      File "/opt/conda/lib/python3.8/site-packages/vissl/trainer/train_steps/standard_train_step.py", line 155, in standard_train_step
        task.run_hooks(SSLClassyHookFunctions.on_forward.name)
      File "/opt/conda/lib/python3.8/site-packages/vissl/trainer/train_task.py", line 667, in run_hooks
        getattr(hook, hook_function_name, ClassyHook._noop)(self)
      File "/opt/conda/lib/python3.8/site-packages/vissl/hooks/state_update_hooks.py", line 153, in on_forward
        self._checkpoint_model(
    AttributeError: 'CheckNanModelOutputHook' object has no attribute '_checkpoint_model'
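
The traceback shows `on_forward` calling `self._checkpoint_model`, and the error only appears "from time to time". A plausible explanation, stated here as an assumption rather than something confirmed from the VISSL source, is that the call sits on a branch that only runs when the model output contains a non-finite value. The sketch below reproduces that pattern with a hypothetical stand-in class (`SketchCheckNanHook` is invented for illustration and is not VISSL's implementation):

```python
import math


class SketchCheckNanHook:
    """Hypothetical stand-in for a NaN-checking hook; not VISSL's code."""

    def on_forward(self, model_output):
        # Python resolves attributes at call time, so the missing helper is
        # only noticed when this branch actually executes.
        if any(not math.isfinite(value) for value in model_output):
            self._checkpoint_model(model_output)  # never defined on this class
        return "ok"


hook = SketchCheckNanHook()

# Finite output: the broken call is never reached, so nothing fails.
print(hook.on_forward([0.1, 0.2]))

# Non-finite output: the rarely taken branch runs and raises AttributeError.
try:
    hook.on_forward([0.1, float("nan")])
except AttributeError as err:
    print(f"AttributeError: {err}")
```

Under this assumption, most training iterations pass untouched, and the failure surfaces only on the iteration where a NaN or infinity first appears.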

Expected behavior:

QuentinDuval commented 2 years ago

Hi @Pedrexus,

First of all, thank you for raising the bug!

I made a fix in https://github.com/facebookresearch/vissl/commit/41c4682b0b3eb9458ceb3f3e142fe82fe21d409e that should solve the issue (it appears to be a regression introduced by a refactoring).

Could you try it and tell me whether it fixes the issue for you too?

Thank you, Quentin
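
The contents of the linked commit are not reproduced in this thread. As a general illustration of how this kind of post-refactoring breakage is often repaired, the sketch below moves the checkpointing logic into a module-level helper that any hook can call, and exercises the non-finite branch explicitly so a stale reference cannot go unnoticed. All names here (`save_debug_checkpoint`, `SketchCheckNanHook`, the `store` list) are hypothetical and are not taken from VISSL:

```python
import math


def save_debug_checkpoint(state, store):
    """Hypothetical shared helper: records a copy of the training state."""
    store.append(dict(state))


class SketchCheckNanHook:
    """Hypothetical hook that delegates to the shared helper."""

    def __init__(self, store):
        self.store = store

    def on_forward(self, state):
        # Return True when a non-finite loss triggered a debug checkpoint.
        if not math.isfinite(state["loss"]):
            save_debug_checkpoint(state, self.store)
            return True
        return False


# Smoke test that forces the rarely taken branch instead of waiting for a
# real NaN to show up during training.
store = []
hook = SketchCheckNanHook(store)
print(hook.on_forward({"loss": 0.5}))           # healthy step
print(hook.on_forward({"loss": float("inf")}))  # non-finite step
print(len(store))                               # number of saved states
```

Forcing the error path in a small test like this is one way to catch a missing method before a long training run reaches it.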

Pedrexus commented 2 years ago

Hello @QuentinDuval,

the bug seems to be fixed for me!

Thanks a lot!