cvlab-columbia / globetrotter

Code for the Globetrotter project

AssertionError: For the visual loss we need augmented images #5

Open manxiaoyu opened 3 months ago

manxiaoyu commented 3 months ago

Traceback (most recent call last):
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 261, in <module>
    main()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 134, in main
    args = get_args()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 106, in get_args
    assert not (args.lambda_visual_loss > 0 and not args.augment_image) or args.test, \
AssertionError: For the visual loss we need augmented images

What can I do to fix this?

surisdi commented 3 months ago

The parameter augment_image is set to False. You have to set it to True if you want to train with the visual loss (that is, when the lambda associated with the visual loss is > 0).
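To make the condition concrete, here is a minimal sketch of the check from the traceback. The assertion itself is copied from the error output; the parser around it is an assumption (in particular the `float` type and the defaults), and `build_parser` and `check_visual_loss` are hypothetical names, not functions from the repository.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the repository's parser; types and defaults are assumed.
    parser = argparse.ArgumentParser()
    parser.add_argument("--lambda_visual_loss", type=float, default=0.0)
    parser.add_argument("--augment_image", action="store_true")
    parser.add_argument("--test", action="store_true")
    return parser


def check_visual_loss(args):
    # Same condition as the assertion shown in the traceback.
    assert not (args.lambda_visual_loss > 0 and not args.augment_image) or args.test, \
        "For the visual loss we need augmented images"


parser = build_parser()

# Visual loss enabled together with augmentation: the check passes.
ok_args = parser.parse_args(["--lambda_visual_loss", "1.0", "--augment_image"])
check_visual_loss(ok_args)

# Visual loss enabled without augmentation: the check fails.
bad_args = parser.parse_args(["--lambda_visual_loss", "1.0"])
try:
    check_visual_loss(bad_args)
    failed = False
except AssertionError as err:
    failed = True
    print(err)
```

Because the flag uses `action='store_true'`, it is switched on by passing `--augment_image` on its own, with no value after it.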

manxiaoyu commented 3 months ago

I have tried adding a default value to the argument in main.py like this:

parser.add_argument('--augment_image', action='store_true', default=True, help='Dataset returns two augmented images')

However, I still encounter AssertionErrors. For example:

Traceback (most recent call last):
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 261, in <module>
    main()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 134, in main
    args = get_args()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 112, in get_args
    assert args.name is not None and len(args.name) > 0
AssertionError

and

Traceback (most recent call last):
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 261, in <module>
    main()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 134, in main
    args = get_args()
  File "/media/visionx/monica/newproject/globetrotter/main.py", line 124, in get_args
    assert args.lambda_orthogonality_loss == 0
AssertionError

Is this because training on a single GPU is not supported, or is there a problem with how I am running it?

surisdi commented 3 months ago

Hi,

These are different error messages, but all of them come from parameters that do not have the expected values. Specifically, for the first error you have to set a name for the experiment, and for the second error you have to set the parameter lambda_orthogonality_loss to 0 (it should only be >0 when we are trying to reproduce the Sigurdsson et al. baseline).

I recommend you start with some of the commands provided in the repository, for example this one here, as suggested in the README. You will notice that all the parameters are correctly set there. Just make sure to change the paths to your data.
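The three assertions hit so far fire one at a time, so each fix only reveals the next failure. The sketch below collects the same conditions into one report. `find_config_problems` is a hypothetical helper, not part of the repository, and it assumes the non-baseline case, where lambda_orthogonality_loss must be 0.

```python
from types import SimpleNamespace


def find_config_problems(args):
    """Return a list of human-readable problems instead of asserting one by one."""
    problems = []
    # Visual loss needs augmented images unless we are only testing.
    if args.lambda_visual_loss > 0 and not args.augment_image and not args.test:
        problems.append("pass --augment_image when lambda_visual_loss > 0")
    # The experiment needs a non-empty name.
    if args.name is None or len(args.name) == 0:
        problems.append("set a non-empty experiment name")
    # Assumption: we are not reproducing the Sigurdsson et al. baseline.
    if args.lambda_orthogonality_loss != 0:
        problems.append("set lambda_orthogonality_loss to 0")
    return problems


# Illustrative values only; SimpleNamespace stands in for the parsed arguments.
broken = SimpleNamespace(
    lambda_visual_loss=1.0,
    augment_image=False,
    test=False,
    name="",
    lambda_orthogonality_loss=0.5,
)
for problem in find_config_problems(broken):
    print(problem)

fixed = SimpleNamespace(
    lambda_visual_loss=1.0,
    augment_image=True,
    test=False,
    name="train_globetrotter",
    lambda_orthogonality_loss=0,
)
print(find_config_problems(fixed))
```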

manxiaoyu commented 3 months ago

Sure, I tried this. First, I have no idea what runs_dir is: I don't know what it is used for or what content it stores. Second, when I set runs_dir to an empty folder, I got the following error:

main.py: error: unrecognized arguments: --local-rank=0 True--local_rank=-1--resume_name train_globetrotter
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 324313) of binary: /home/visionx/anaconda3/envs/globetrotter/bin/python
Traceback (most recent call last):
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/visionx/anaconda3/envs/globetrotter/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../../main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-18_15:40:18
  host      : visionx
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 324313)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

surisdi commented 3 months ago

runs_dir is the directory where the output information about the runs (the execution) is stored. You can set it to any empty directory.

The error above has to do with you not passing the parameters correctly. Leave spaces between parameters.
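The "unrecognized arguments" line contains two separate symptoms: tokens glued together without spaces (True--local_rank=-1--resume_name) and a hyphenated --local-rank=0, which is the spelling that launchers in PyTorch 2.0 and later pass to the script. The sketch below reproduces both with a small stand-in parser. It is an assumption that main.py defines these options in this way; only the option names come from the error output.

```python
import argparse

# Hypothetical stand-in for the relevant part of main.py's parser.
parser = argparse.ArgumentParser()
# Accept both spellings; argparse derives the attribute name "local_rank"
# from the first long option string.
parser.add_argument("--local_rank", "--local-rank", type=int, default=-1)
parser.add_argument("--resume_name", type=str, default=None)

# Properly separated tokens are all consumed.
good, good_leftover = parser.parse_known_args(
    ["--local-rank=0", "--resume_name", "train_globetrotter"]
)
print(good.local_rank, good.resume_name, good_leftover)

# Tokens glued together, as in the error message, are not recognized:
# the first token does not start with "-", so it is treated as a stray
# positional value and --resume_name is never seen as an option.
bad, bad_leftover = parser.parse_known_args(
    ["True--local_rank=-1--resume_name", "train_globetrotter"]
)
print(bad.local_rank, bad.resume_name, bad_leftover)
```

`parse_known_args` is used here only so the leftover tokens can be inspected; with `parse_args` the second call would exit with the same "unrecognized arguments" error seen above.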

manxiaoyu commented 3 months ago

You're so kind, thank you for your help! While talking with you, I found the source of the error: my torch version was not correct. I have cuda==11.8, and it was not easy for me to find a suitable combination of CUDA and torch, but luckily it works now. These package versions are vital:

Python 3.8.19
tokenizers               0.8.1rc2
torch                    1.13.0
torchaudio               0.13.0
torchvision              0.14.0
tqdm                     4.66.2
transformers             3.3.0
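For anyone comparing their own environment against this list, here is a small sketch using the standard-library `importlib.metadata` (available from Python 3.8). The version numbers are the ones reported above; `installed_version` and `compare_versions` are hypothetical helper names, and ignoring a local build suffix such as "+cu117" is an assumption about how strict the comparison should be.

```python
from importlib import metadata

# Versions reported as working in the comment above.
REPORTED = {
    "tokenizers": "0.8.1rc2",
    "torch": "1.13.0",
    "torchaudio": "0.13.0",
    "torchvision": "0.14.0",
    "tqdm": "4.66.2",
    "transformers": "3.3.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Map each mismatching package to a (reported, installed) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu117" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in compare_versions(REPORTED).items():
        print(f"{package}: reported {wanted}, installed {found}")
```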