Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.
https://lightning-bolts.readthedocs.io
Apache License 2.0
1.68k stars 320 forks

tests: failing due to `argparse.ArgumentError: argument --fast_dev_run: expected one argument` #471

Closed akihironitta closed 3 years ago

akihironitta commented 3 years ago

🐛 Bug

The current tests on the master branch are failing due to:

argparse.ArgumentError: argument --fast_dev_run: expected one argument

Failing runs

For example, due to the above error on https://github.com/PyTorchLightning/pytorch-lightning-bolts/commit/a72766c461a498ab67b7c2efe7e0d49224290837:

- https://github.com/PyTorchLightning/pytorch-lightning-bolts/runs/1586140031
- https://github.com/PyTorchLightning/pytorch-lightning-bolts/runs/1586140045

Failing tests

```
tests/models/test_scripts.py::test_cli_run_basic_ae[--dataset cifar10 --data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --batch_size 2 --fast_dev_run]
tests/models/test_scripts.py::test_cli_run_basic_vae[--dataset cifar10 --data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --batch_size 2 --fast_dev_run]
tests/models/rl/test_scripts.py::test_cli_run_rl_dqn[--env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run --warm_start_size 10 --n_steps 2 --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_double_dqn[--env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run --warm_start_size 10 --n_steps 2 --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_dueling_dqn[--env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run --warm_start_size 10 --n_steps 2 --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_noisy_dqn[--env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run --warm_start_size 10 --n_steps 2 --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_per_dqn[--env PongNoFrameskip-v4 --max_steps 10 --fast_dev_run --warm_start_size 10 --n_steps 2 --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_reinforce[--env CartPole-v0 --max_steps 10 --fast_dev_run --batch_size 10]
tests/models/rl/test_scripts.py::test_cli_run_rl_vanilla_policy_gradient[--env CartPole-v0 --max_steps 10 --fast_dev_run --batch_size 10]
tests/models/self_supervised/test_scripts.py::test_cli_run_self_supervised_amdim[--data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --max_steps 3 --fast_dev_run --batch_size 2]
tests/models/self_supervised/test_scripts.py::test_cli_run_self_supervised_moco[--data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --max_steps 3 --fast_dev_run --batch_size 2]
tests/models/self_supervised/test_scripts.py::test_cli_run_self_supervised_byol[--data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --max_steps 3 --fast_dev_run --batch_size 2 --online_ft]
```

To Reproduce

It looks like the tests on {macOS, Ubuntu} x {Python 3.6} are failing. Investigating further...
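
For reference, the error message itself can be reproduced with plain argparse, independent of the test suite. This is only a minimal sketch: the parser below is hypothetical (it is not the Trainer's or any model's actual parser) and assumes `--fast_dev_run` ends up declared as an option that takes one value. `RaisingParser` is a helper introduced here just to capture the message instead of exiting.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """Hypothetical helper: raise instead of exiting so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


# Assumption: the option is declared with a type, so argparse expects exactly one value.
parser = RaisingParser()
parser.add_argument("--fast_dev_run", type=int, default=0)
parser.add_argument("--batch_size", type=int, default=32)

try:
    # Bare flag followed by another option, as in the failing test parametrizations.
    parser.parse_args(["--fast_dev_run", "--batch_size", "2"])
    message = None
except ValueError as err:
    message = str(err)

print(message)  # argument --fast_dev_run: expected one argument
```

Whether the real parser declares the option this way on Python 3.6 is exactly what still needs to be confirmed.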

akihironitta commented 3 years ago

I first guessed that all tests with --fast_dev_run were failing, but that's not true: the following tests, which also pass --fast_dev_run, passed:

tests/models/self_supervised/test_scripts.py::test_cli_run_self_supervised_simclr[--data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --gpus 0 --fp32 --max_epochs 1 --max_steps 3 --fast_dev_run --batch_size 2 --online_ft] PASSED [ 83%]
tests/models/self_supervised/test_scripts.py::test_cli_run_self_supervised_swav[--dataset cifar10 --data_dir /home/runner/work/pytorch-lightning-bolts/pytorch-lightning-bolts/datasets --max_epochs 1 --max_steps 3 --fast_dev_run --batch_size 2 --gpus 0 --arch resnet18 --hidden_mlp 512 --fp32 --sinkhorn_iterations 1 --nmb_prototypes 2] PASSED [ 83%]

Borda commented 3 years ago

I suspect the problem is that this --fast_dev_run is not the one taken from Trainer but one that was coded/added separately...

akihironitta commented 3 years ago

You're right. The models passing the tests define their own --fast_dev_run. I'm not sure yet why the tests for the remaining models fail only on Python 3.6, though.

Do you think it's reasonable to add --fast_dev_run to the other models for now? (although I think --fast_dev_run should be included not in models but in trainer...)

akihironitta commented 3 years ago

@Borda @ananyahjha93 I looked through the argparse changelog (3.6 -> 3.7) and the docs to work out why this --fast_dev_run is not the one taken from Trainer but one added separately, as @Borda mentioned, but I couldn't find a way to solve this problem.

An easy workaround for this error would be to add the --fast_dev_run argument to the failing models: https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/471#issuecomment-750794733
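
A rough sketch of what that workaround could look like, assuming the model's own parser declares the option as a boolean flag. The parser here is hypothetical, and I have not checked that the passing models define it exactly this way.

```python
import argparse

# Hypothetical model-level parser; option names mirror the test arguments above.
model_parser = argparse.ArgumentParser()
model_parser.add_argument("--batch_size", type=int, default=32)
# A store_true action consumes no value, so the bare flag parses on its own.
model_parser.add_argument("--fast_dev_run", action="store_true")

args = model_parser.parse_args(["--fast_dev_run", "--batch_size", "2"])
print(args.fast_dev_run, args.batch_size)  # True 2
```

One trade-off with this form: the flag no longer accepts an explicit value, so `--fast_dev_run 1` would be rejected with an unrecognized-arguments error.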

MaveriQ commented 3 years ago

I resolved this by passing --fast_dev_run 1 as a command-line argument.
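
To illustrate why an explicit value helps, here is a small sketch with plain argparse. Both parsers are hypothetical and only assume that the option expects a value. The second declaration uses `nargs="?"` with `const`, a standard argparse way to accept both the bare flag and an explicit value, shown as one possible alternative rather than what the library does.

```python
import argparse

# Hypothetical strict declaration: exactly one value is required after the flag.
strict = argparse.ArgumentParser()
strict.add_argument("--fast_dev_run", type=int, default=0)
strict.add_argument("--batch_size", type=int, default=32)

# Supplying a value satisfies the "expected one argument" requirement.
strict_args = strict.parse_args(["--fast_dev_run", "1", "--batch_size", "2"])

# Hypothetical tolerant declaration: nargs="?" makes the value optional,
# and const is used when the flag appears without one.
tolerant = argparse.ArgumentParser()
tolerant.add_argument("--fast_dev_run", nargs="?", type=int, const=1, default=0)
tolerant.add_argument("--batch_size", type=int, default=32)

with_value = tolerant.parse_args(["--fast_dev_run", "1", "--batch_size", "2"])
bare_flag = tolerant.parse_args(["--fast_dev_run", "--batch_size", "2"])
omitted = tolerant.parse_args(["--batch_size", "2"])

print(strict_args.fast_dev_run, with_value.fast_dev_run,
      bare_flag.fast_dev_run, omitted.fast_dev_run)  # 1 1 1 0
```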