Open mansooreh1 opened 4 months ago
Thank you for your attention!
This script assumes you have 8 GPUs available, which might not be the case in your environment (such as Google Colab). You can try adjusting the --nproc_per_node argument to match the number of GPUs available to you, for example setting --nproc_per_node=1 if you’re using a single GPU.
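As a rough illustration of that adjustment, here is a minimal Python sketch that clamps the process count to the GPUs actually present before building the launch command. The helper names `available_gpus` and `build_launch_command` are hypothetical and not part of this repository. `torch.cuda.device_count()` is the only PyTorch call used, and the sketch assumes a fallback to a single process when PyTorch cannot be imported or no GPU is visible.

```python
def available_gpus():
    """Return the number of visible CUDA devices, or 0 if PyTorch is missing."""
    try:
        import torch  # imported lazily so the sketch also loads without PyTorch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def build_launch_command(num_gpus, script_args, requested=8):
    """Build the launcher argv with --nproc_per_node clamped to num_gpus.

    `requested` mirrors the 8 processes hard-coded in the shell script;
    at least one process is always launched (an assumption of this sketch).
    """
    nproc = max(1, min(requested, num_gpus))
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc}",
        "main.py",
        *script_args,
    ]
```

For example, `build_launch_command(available_gpus(), ["--dataset_file", "hico"])` would produce a command with `--nproc_per_node=1` on a single-GPU Colab runtime.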
Please let me know if this helps!
Hello, when I train on HICO-DET with your code and run sh configs/sov-stg-swin-l_scratch.sh, I get the following error:
python -m torch.distributed.launch --nproc_per_node=8 main.py --dataset_file hico --hoi_path data/hico_det --num_obj_classes 80 --num_verb_classes 117 --batch_size 2 --swin_pretrained params/swin_large_patch4_window12_384_22k.pth --output_dir logs/hico_sov-stg-swin-l_scratch_00001 --use_wandb --wandb_project hico --wandb_name sov-stg-swin-l_scratch_00001 --use_checkpoint -c slconfig/sov-stg-swin_l.py

/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
  warnings.warn(
[2024-05-19 06:43:14,672] torch.distributed.run: [WARNING]
[2024-05-19 06:43:14,672] torch.distributed.run: [WARNING]
[2024-05-19 06:43:14,672] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-05-19 06:43:14,672] torch.distributed.run: [WARNING]

usage: SOV-STG script [-h] --config_file CONFIG_FILE [--teacher_config_file TEACHER_CONFIG_FILE]
                      [--options OPTIONS [OPTIONS ...]] [--dataset_file {hico,vcoco}] [--hoi_path HOI_PATH]
                      [--diff_path DIFF_PATH] [--num_verb_classes NUM_VERB_CLASSES]
                      [--num_hoi_classes NUM_HOI_CLASSES] [--num_obj_classes NUM_OBJ_CLASSES]
                      [--subject_category_id SUBJECT_CATEGORY_ID] [--output_dir OUTPUT_DIR] [--note NOTE]
                      [--resume RESUME] [--pretrain_model_path PRETRAIN_MODEL_PATH]
                      [--swin_pretrained SWIN_PRETRAINED] [--finetune_ignore FINETUNE_IGNORE [FINETUNE_IGNORE ...]]
                      [--start_epoch N] [--eval] [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS] [--test]
                      [--debug] [--find_unused_params] [--use_checkpoint] [--save_results] [--save_log]
                      [--wo_log_unscaled] [--use_wandb] [--wandb_project WANDB_PROJECT] [--wandb_name WANDB_NAME]
                      [--world_size WORLD_SIZE] [--dist_url DIST_URL] [--rank RANK] [--local_rank LOCAL_RANK] [--amp]
SOV-STG script: error: unrecognized arguments: --local-rank=6
SOV-STG script: error: unrecognized arguments: --local-rank=2
SOV-STG script: error: unrecognized arguments: --local-rank=3
SOV-STG script: error: unrecognized arguments: --local-rank=4
SOV-STG script: error: unrecognized arguments: --local-rank=1
SOV-STG script: error: unrecognized arguments: --local-rank=7
SOV-STG script: error: unrecognized arguments: --local-rank=5
SOV-STG script: error: unrecognized arguments: --local-rank=0

[2024-05-19 06:43:49,790] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23989 closing signal SIGTERM
[2024-05-19 06:43:49,790] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23990 closing signal SIGTERM
[2024-05-19 06:43:49,790] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23991 closing signal SIGTERM
[2024-05-19 06:43:49,790] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23992 closing signal SIGTERM
[2024-05-19 06:43:49,790] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23993 closing signal SIGTERM
[2024-05-19 06:43:49,791] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23994 closing signal SIGTERM
[2024-05-19 06:43:49,791] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 23996 closing signal SIGTERM
[2024-05-19 06:43:50,056] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 2) local_rank: 6 (pid: 23995) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 198, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 194, in main
    launch(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launch.py", line 179, in launch
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
main.py FAILED
Failures:
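
Every worker in the log above exits with `unrecognized arguments: --local-rank=N`, while the usage text lists only `--local_rank` (underscore), and the FutureWarning recommends reading `os.environ['LOCAL_RANK']`. One possible way to handle this, sketched here under assumptions, is to let argparse accept both spellings and fall back to the environment variable. The function name `parse_local_rank` and the standalone parser are hypothetical; the script's real parser is not shown in this thread, so this is not the repository's actual code.

```python
import argparse
import os


def parse_local_rank(argv=None, env=None):
    """Resolve the local rank from CLI arguments or the LOCAL_RANK variable.

    Accepts both --local_rank and --local-rank, since launchers differ in
    which spelling they pass. `env` defaults to os.environ and is a parameter
    only to make the sketch easy to exercise.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser("local-rank sketch (hypothetical)")
    # Both option strings map to the same destination attribute.
    parser.add_argument("--local_rank", "--local-rank",
                        dest="local_rank", type=int, default=None)
    # parse_known_args leaves the script's other flags untouched.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is None:
        # torchrun-style launchers export LOCAL_RANK for each worker.
        return int(env.get("LOCAL_RANK", 0))
    return args.local_rank
```

In an existing parser, the equivalent change would probably be limited to adding the second option string to the `--local_rank` argument. Launching with `torchrun` instead of `python -m torch.distributed.launch`, as the warning suggests, relies on the environment-variable path.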