Closed Senwang98 closed 2 years ago
Hello, have you solved this problem? I got the same issue.
Sorry, I have no idea....
So the problem stems from this line:
inference_loader = DataLoader(inference_dataset, batch_size=args.effective_inference_batch_size, shuffle=False, **inf_gpuargs)
Where:
args.effective_batch_size = args.batch_size * args.number_gpus
If I have 59 usable frames: with the number of GPUs set to 2, the length of inference_loader is 29; with 1 GPU, it comes out as 59. The workaround is to use only 1 GPU. I'm still not sure where the bug is, since it seems to go through a Torch class, but at the very least this is why you're getting data skipped.
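The skipped data falls out of how DataLoader computes its length. Here's a minimal sketch of that arithmetic, assuming the loader drops the last incomplete batch (drop_last=True), which matches the 29 observed above (59 // 2 = 29, so the final frame is silently dropped); the helper num_batches is just illustrative, not part of the repo:

```python
import math

def num_batches(n_samples, batch_size, drop_last=False):
    # Mirrors len(torch.utils.data.DataLoader):
    # floor division when drop_last=True, ceiling otherwise
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

inference_batch_size = 1  # per-GPU batch size (assumed)
for number_gpus in (1, 2):
    effective = inference_batch_size * number_gpus
    print(number_gpus, "GPU(s):", num_batches(59, effective, drop_last=True), "batches")
# 1 GPU(s): 59 batches
# 2 GPU(s): 29 batches -- the 59th frame is never loaded
```

So with 2 GPUs each batch holds 2 frames, the loader yields half as many batches, and an odd frame count leaves one frame unprocessed when the last partial batch is dropped.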
Solved this issue by doing something similar to what @kyle-sama did, but I still can't understand the mechanism clearly. Thanks @kyle-sama!
When I use multiple GPUs for inference, the output of tqdm seems to be wrong!
tqdm output for a single GPU:
Inference Averages for Epoch 0: L1: 8.475, EPE: 14.679: 100%|█████| 50/50.0 [00:22<00:00, 2.53it/s]
tqdm output for 2 GPUs:
Inference Averages for Epoch 0: L1: 8.475, EPE: 14.679: 100%|█████| 25/25.0 [00:25<00:00, 2.15it/s]
How can I solve this problem when using multiple GPUs?