Closed HarryHsing closed 1 year ago
I believe I used a p3.16xlarge instance on AWS, which has 64 vCPUs.
Thanks for the information, Dr. Harley! Also, could you tell me how long it took to train your model with this setting?
Also, I've been facing a problem where increasing "num_workers" does not speed up data loading on a server with Slurm.
For example:
- When num_workers was set to 1 and the batch size was 1, it took around 10 seconds to load the data.
- When num_workers was set to 4 and the batch size was 1, it took around 40 seconds to load the data.
It seems that the workers cannot run in parallel. I wonder if you have ever encountered the same issue? Thanks!
Based on my findings, this is caused by "self.add_occluders" in flyingthingsdataset.py. That part cannot run with multiple workers in parallel.
Problem solved by adding "multiprocessing_context='spawn'" to the dataloader.
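For anyone hitting the same issue, here is a minimal sketch of where that argument goes, assuming PyTorch's DataLoader; the dataset class below is a placeholder, not code from this repo:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Placeholder standing in for FlyingThingsDataset."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)

# multiprocessing_context='spawn' starts each worker in a fresh process,
# avoiding state inherited via fork that can serialize the workers.
loader = DataLoader(
    DummyDataset(),
    batch_size=1,
    num_workers=4,
    multiprocessing_context='spawn',
)
```

Note that 'spawn' re-imports the main module in each worker, so the training script should be guarded with `if __name__ == '__main__':`, and worker startup is somewhat slower than with the default 'fork' on Linux.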
Hi Harry, can you actually give a bit more info here, so that I can make sure to update the repo with your suggested fix? Where do you put that argument? Thanks, Adam
Hi, may I know how many CPUs you used to train the model with the setting "B=4, hori_flip=True, vert_flip = True, N=768, I=4"?
Thanks a lot for your attention!