quantumiracle opened this issue 2 months ago
Just to clarify, may I know what you are trying to achieve here? Is this for distributing the evaluation across multiple GPUs, or for multi-process, multi-GPU evaluation? The current code structure does not work even without vbench.
Hi @NattapolChan,
My evaluation process gets stuck at "start evaluation" when running with multiple GPUs:
vbench evaluate --ngpus=8 --dimension 'motion_smoothness' --videos_path data/demofusion_controlnet.mp4 --mode=custom_input

WARNING:main:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2024-09-28 12:04:28,518 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2024-09-28 12:04:28,520 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 5
2024-09-28 12:04:28,531 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 2
2024-09-28 12:04:28,616 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 3
2024-09-28 12:04:29,361 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 7
2024-09-28 12:04:29,387 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 4
2024-09-28 12:04:29,410 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 6
2024-09-28 12:04:29,412 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2024-09-28 12:04:29,412 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,413 - torch.distributed.distributed_c10d - INFO - Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,413 - torch.distributed.distributed_c10d - INFO - Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,414 - torch.distributed.distributed_c10d - INFO - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,414 - torch.distributed.distributed_c10d - INFO - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,418 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,418 - torch.distributed.distributed_c10d - INFO - Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
2024-09-28 12:04:29,421 - torch.distributed.distributed_c10d - INFO - Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 8 nodes.
args: Namespace(output_path='./evaluation_results/', full_json_dir='/data/hli358/envs/vbench/lib/python3.10/site-packages/vbench/cli/../VBench_full_info.json', videos_path='data/demofusion_controlnet.mp4', dimension=['motion_smoothness'], load_ckpt_from_local=None, read_frame=None, mode='custom_input', prompt='None', prompt_file=None, category=None, imaging_quality_preprocessing_mode='longer')
start evaluation
Can you try running these:
vbench evaluate --ngpus=1 --dimension 'motion_smoothness' --videos_path data/demofusion_controlnet.mp4 --mode=custom_input
vbench evaluate --ngpus=8 --dimension 'temporal_flickering' --videos_path data/demofusion_controlnet.mp4 --mode=custom_input
The --ngpus flag distributes the videos across multiple GPUs, so if there is only one video, only one GPU will be used for inference; the rest will not be allocated any videos.
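For reference, here is a rough, hypothetical sketch (not the actual VBench code; evaluate_sharded and score_fn are made-up names) of how per-rank video distribution of this kind typically works, and why every rank still has to reach the collective calls even when it receives no videos:

```python
import torch.distributed as dist

def evaluate_sharded(video_paths, score_fn):
    # Assumes the script is launched by torchrun / the vbench CLI, which set
    # the RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT environment variables.
    if not dist.is_initialized():
        dist.init_process_group(backend="gloo")
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Round-robin shard: with a single video and 8 ranks, only rank 0 gets work.
    my_videos = video_paths[rank::world_size]
    my_scores = [score_fn(v) for v in my_videos]  # may be an empty list

    # Every rank must still enter this collective, even with an empty shard;
    # if some ranks never reach it, the ranks that did block forever, which
    # looks exactly like a hang right after "start evaluation".
    gathered = [None] * world_size
    dist.all_gather_object(gathered, my_scores)
    return [s for per_rank in gathered for s in per_rank]
```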
I created a 'duck' folder containing 5 videos. The command still gets stuck at "start evaluation":

WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2024-09-30 14:21:29,921 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2024-09-30 14:21:29,932 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2024-09-30 14:21:29,932 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2024-09-30 14:21:29,932 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
args: Namespace(output_path='./evaluation_results/', full_json_dir='/data/hli358/han/VBench/vbench/VBench_full_info.json', videos_path='duck/', dimension=['subject_consistency', 'background_consistency', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality'], load_ckpt_from_local=None, read_frame=None, mode='custom_input', prompt='None', prompt_file=None, category=None, imaging_quality_preprocessing_mode='longer')
start evaluation
I have tried quite a few configurations, but I cannot reproduce the hang at "start evaluation". The command I used is the same, and my test folder also contains 5 videos. Here is the log:
2024-10-02 02:10:27,722 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2024-10-02 02:10:27,722 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2024-10-02 02:10:27,722 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2024-10-02 02:10:27,722 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
args: Namespace(output_path='./evaluation_results/', full_json_dir='/scratch/users/nattapol/VBench/vbench/VBench_full_info.json', videos_path='../video_mp4/', dimension=['subject_consistency', 'background_consistency', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality'], load_ckpt_from_local=None, read_frame=None, mode='custom_input', prompt='None', prompt_file=None, category=None, imaging_quality_preprocessing_mode='longer')
start evaluation
Evaluation meta data saved to ./evaluation_results/results_2024-10-02-02:10:27_full_info.json
cur_full_info_path: ./evaluation_results/results_2024-10-02-02:10:27_full_info.json
Using cache found in /scratch/users/nattapol/.cache/torch/hub/facebookresearch_dino_main
Using cache found in /scratch/users/nattapol/.cache/torch/hub/facebookresearch_dino_main
2024-10-02 02:10:44,195 - vbench.subject_consistency - INFO - Initialize DINO success
2024-10-02 02:10:44,207 - vbench.subject_consistency - INFO - Initialize DINO success
100%|██████████| 2/2 [00:32<00:00, 16.27s/it]
cur_full_info_path: ./evaluation_results/results_2024-10-02-02:10:27_full_info.json
100%|██████████| 2/2 [00:13<00:00, 6.94s/it]
...
Could you try with only the temporal_flickering dimension, just to make sure the hang is not happening during initialization?
I tried torchrun --nproc_per_node=2 --standalone evaluate.py --dimension 'temporal_flickering' --videos_path duck/ --mode=custom_input, and it also gets stuck at "start evaluation". Here is the log:
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
2024-10-02 12:01:39,292 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 1
2024-10-02 12:01:39,306 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2024-10-02 12:01:39,306 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2024-10-02 12:01:39,312 - torch.distributed.distributed_c10d - INFO - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
args: Namespace(output_path='./evaluation_results/', full_json_dir='/data/hli358/han/VBench/vbench/VBench_full_info.json', videos_path='duck/', dimension=['temporal_flickering'], load_ckpt_from_local=None, read_frame=None, mode='custom_input', prompt='None', prompt_file=None, category=None, imaging_quality_preprocessing_mode='longer')
start evaluation
Hi,
I want to use VBench with torch.distributed for multi-process evaluation; however, I found that only the first process finishes successfully, while all of the remaining processes never do. Here is the code snippet:
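The original snippet is not reproduced here; below is only a minimal, hypothetical sketch of the kind of setup being described (one process per GPU, each constructing a VBench object on its own device), to make the scenario concrete. The VBench(...) constructor and evaluate(...) arguments follow the project README and should be treated as assumptions for your installed version; it is not the poster's code.

```python
import torch
import torch.distributed as dist
from vbench import VBench

def main():
    # Launched via: torchrun --nproc_per_node=<ngpus> this_script.py
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank}")

    # Constructor and arguments as shown in the VBench README (assumed here).
    vb = VBench(device, "VBench_full_info.json", "./evaluation_results")
    vb.evaluate(
        videos_path="duck/",              # illustrative path, not the poster's
        name=f"results_rank{rank}",
        dimension_list=["temporal_flickering"],
        mode="custom_input",
    )

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```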