PeizeSun / TransTrack

Multiple Object Tracking with Transformer
MIT License

DistributedVideoSampler IndexError #62

Closed: imhgchoi closed this issue 2 years ago

imhgchoi commented 2 years ago

Hi, I'm trying to train with "sh track_exps/crowdhuman_mot_trainhalf.sh" on MOT20, starting from the pretrained model "crowdhuman_final.pth". My GPU environment is 8 RTX 3090s, but I keep getting the error below. Has anyone else run into this? Thanks.

Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0
Done (t=2.44s)
creating index...
Traceback (most recent call last):
  File "main_track.py", line 390, in <module>
    main(args)
  File "main_track.py", line 195, in main
    sampler_val = DistributedVideoSampler(dataset_val, start_id=args.start_id, shuffle=False)
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in __init__
    split_flags = [c[0] for c in chunks]
  File "/home/hyeongkyu/projects/TransTrack/datasets/sampler_video_distributed.py", line 41, in <listcomp>
    split_flags = [c[0] for c in chunks]
IndexError: index 0 is out of bounds for axis 0 with size 0

PeizeSun commented 2 years ago

Hi~ You are using 8 GPUs, but MOT20 has fewer than 8 videos, so some GPUs receive no input. Try reducing the number of GPUs so that it does not exceed the number of videos.
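
For readers hitting the same error, here is a minimal sketch of the failure mode described above. It is not the TransTrack sampler: `split_into_chunks`, `usable_world_size`, and the video names are hypothetical, and the count of four videos is chosen only for illustration. The idea is that splitting the videos into one chunk per process leaves some chunks empty when there are more processes than videos, and indexing `c[0]` on an empty chunk raises `IndexError`.

```python
def split_into_chunks(items, num_chunks):
    """Split items into num_chunks contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def usable_world_size(num_videos, requested_gpus):
    """Largest process count that still gives every process at least one video."""
    return max(1, min(num_videos, requested_gpus))


# Hypothetical video list: four videos, eight processes.
videos = ["video_a", "video_b", "video_c", "video_d"]
chunks = split_into_chunks(videos, 8)

# Ranks whose chunk is empty have nothing to sample from.
empty_ranks = [rank for rank, c in enumerate(chunks) if not c]
print("ranks without a video:", empty_ranks)

try:
    # Same pattern as the failing line in the traceback.
    split_flags = [c[0] for c in chunks]
except IndexError as err:
    print("IndexError:", err)

print("suggested process count:", usable_world_size(len(videos), 8))
```

With plain lists the message reads "list index out of range"; the wording in the traceback above ("out of bounds for axis 0 with size 0") is what NumPy prints for the same mistake on an empty array.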

imhgchoi commented 2 years ago

Wow, that was fast :+1: I see, I'll try it out. Thank you so much :)

imhgchoi commented 2 years ago

Training works fine with 4 GPUs, but now I'm having trouble with evaluation. The evaluation log stops after 100 steps, and then the process appears to freeze without going any further.

Test:  [  0/829]  eta: 0:10:56    time: 0.7915  data: 0.4873  max mem: 8853
Test:  [ 10/829]  eta: 0:04:03    time: 0.2967  data: 0.0471  max mem: 8853
Test:  [ 20/829]  eta: 0:03:37    time: 0.2429  data: 0.0032  max mem: 8853
Test:  [ 30/829]  eta: 0:03:26    time: 0.2367  data: 0.0033  max mem: 8853
Test:  [ 40/829]  eta: 0:03:19    time: 0.2364  data: 0.0034  max mem: 8853
Test:  [ 50/829]  eta: 0:03:13    time: 0.2334  data: 0.0035  max mem: 8853
Test:  [ 60/829]  eta: 0:03:08    time: 0.2286  data: 0.0034  max mem: 8853
Test:  [ 70/829]  eta: 0:03:03    time: 0.2241  data: 0.0032  max mem: 8853
Test:  [ 80/829]  eta: 0:02:59    time: 0.2208  data: 0.0033  max mem: 8853
Test:  [ 90/829]  eta: 0:02:55    time: 0.2253  data: 0.0034  max mem: 8853
Test:  [100/829]  eta: 0:02:52    time: 0.2261  data: 0.0033  max mem: 8853
Test: Total time: 0:00:25 (0.0305 s / it)

Has anyone else encountered this? Thanks again.

PeizeSun commented 2 years ago

This is a display bug: the number of images differs from GPU to GPU, so the progress counter does not match what each GPU actually processes. The program is still running.
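
A rough sketch of how such a mismatch can arise, under assumptions that are not confirmed by the repository: each process handles whole videos of different lengths, while the progress logger reports an even share of all frames as its total. The function name and the frame counts are invented for illustration; only the total of 829 echoes the log above.

```python
import math


def reported_vs_actual(frames_per_rank):
    """Return (rank, total shown by the logger, frames actually processed) per rank.

    Assumes the logger's total is an even share of all frames, which is a guess
    at the behaviour, not the repository's documented logic.
    """
    nominal = math.ceil(sum(frames_per_rank) / len(frames_per_rank))
    return [(rank, nominal, actual) for rank, actual in enumerate(frames_per_rank)]


# Hypothetical frame counts for four videos, one video per process.
frames_per_rank = [105, 900, 400, 1911]

rows = reported_vs_actual(frames_per_rank)
for rank, nominal, actual in rows:
    print(f"rank {rank}: logger total {nominal}, frames processed {actual}")
```

In this scenario the process that prints the log finishes its short video early, so its counter stops far below the advertised total while the other processes keep working.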

imhgchoi commented 2 years ago

Whoops, my bad. It seems the aggregation step just takes a long time. Evaluation works perfectly fine.

Thanks a lot, Mr. Sun.