ivezakis opened this issue 2 years ago (status: Open)
Hi, @ivezakis. You are using one process to load 8 images, so it will be 8 times slower. This is expected. To make it faster, you should use num_workers larger than 1.
Hi @fepegar, in fact I am using the maximum number of workers for my machine in the DataLoader, num_workers=12. Sorry that wasn't accurate in the code I provided.
Please consider re-opening this. The difference is rather large in my experience: for a batch size of 8, the loader is over 40 times slower. Picture attached.
Edit: I also tried it with batch size 1; it's 6.8 seconds vs. 3.6.
Yes, I have met the same problem. It is very, very slow (at least 30 times longer than the actual model training time), but I don't have a good way to resolve it. Have you found any good method?
Hi, could it be that you're asking for too many workers? Setting num_workers equal to the number of cores may be too much; overload can really decrease performance. Could you try different num_workers values (1/2 or 1/4 of your total core count) and report whether you see the same difference? (Don't forget, as fepegar said, the two are equivalent when time_dataloader = batch_size * time_dataset, because iterating the dataset directly is effectively a batch size of 1.)
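The equivalence mentioned here can be checked with a toy dataset (names and numbers hypothetical): with `num_workers=0` the main process pays the full per-sample cost, so one batch costs roughly `batch_size` times one direct `__getitem__` call.

```python
import time
from torch.utils.data import Dataset, DataLoader

class FakeSubjectsDataset(Dataset):
    """Stand-in for a subjects dataset: each sample costs `delay` seconds."""
    def __init__(self, n=8, delay=0.05):
        self.n, self.delay = n, delay

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        time.sleep(self.delay)  # simulated loading + preprocessing
        return index

dataset = FakeSubjectsDataset()

start = time.perf_counter()
for i in range(len(dataset)):   # direct iteration: one sample at a time
    dataset[i]
t_per_sample = (time.perf_counter() - start) / len(dataset)

loader = DataLoader(dataset, batch_size=8, num_workers=0)
start = time.perf_counter()
batch = next(iter(loader))      # one batch = batch_size samples, same process
t_batch = time.perf_counter() - start

print(f"per sample: {t_per_sample:.3f}s  per batch of 8: {t_batch:.3f}s")
```

Here `t_batch` comes out close to `8 * t_per_sample`, matching the time_dataloader = batch_size * time_dataset relation.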
@ivezakis, @QingYunA
Can you please provide a minimal, reproducible example?
@romainVala I've also noticed that behavior. For example, on a DGX with 40 cores, my code was fastest using only 12.
> Can it be that you ask for too many workers? Can you try different num_workers (1/2 or 1/4 of your total core count) and report if you get the same difference?
Yes, after I increased the num_workers of the Queue to 16, preparing the dataloader got faster. By the way, I found that the transforms I use influence the speed: when I remove RandomAffine(degrees=20), load time drops by half.
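A quick way to confirm that a transform dominates loading time is to time each step in isolation. The snippet below is a hypothetical stand-in, not TorchIO code: an identity `grid_sample` resampling plays the role of an affine transform such as RandomAffine, since resampling is typically the expensive part.

```python
import time
import torch
import torch.nn.functional as F

volume = torch.rand(1, 64, 64, 64)  # hypothetical (C, D, H, W) volume

def identity(x):
    return x

def fake_affine(x):
    # Stand-in for an affine augmentation: identity resampling via grid_sample,
    # roughly the kind of work an affine transform performs per sample.
    theta = torch.eye(3, 4).unsqueeze(0)  # identity 3D affine matrix
    grid = F.affine_grid(theta, size=(1, 1, 64, 64, 64), align_corners=False)
    return F.grid_sample(x.unsqueeze(0), grid, align_corners=False).squeeze(0)

for name, fn in [("no transform", identity), ("affine resampling", fake_affine)]:
    start = time.perf_counter()
    for _ in range(5):
        out = fn(volume)
    print(f"{name}: {(time.perf_counter() - start) / 5 * 1000:.1f} ms/sample")
```

Timing transforms like this makes it easy to decide which augmentations are worth their cost per sample.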
Is there an existing issue for this?
Problem summary
When using SubjectsDataset with a PyTorch DataLoader, iterating over the DataLoader is incredibly slow, which naturally slows training down as well. Iterating over the SubjectsDataset directly, however, is significantly faster.
In my experience, starting to iterate over the SubjectsDataset takes a few seconds (<10), while the DataLoader takes more than a minute to yield its first batch.
Code for reproduction
Actual outcome
Iterating over the loader is much slower than iterating over the subjects.
Error messages
No response
Expected outcome
Performance should be similar.
System info