Closed HarryHsing closed 1 year ago
I believe I used a p3.16xlarge instance on AWS, which has 64 vCPUs.
Thanks for the information, Dr. Harley! Also, could you tell me how long it took to train your model with this setting?
Also, I've been facing a problem where increasing "num_workers" does not speed up data loading on a server with Slurm.
For example:
- When num_workers was set to 1 and the batch size was 1, it took around 10 seconds to load the data.
- When num_workers was set to 4 and the batch size was 1, it took around 40 seconds to load the data.
It seems that the workers cannot run in parallel. I wonder if you have ever encountered the same issue? Thanks!
Based on my findings, this is caused by "self.add_occluders" in flyingthingsdataset.py. That part cannot run with multiple workers in parallel.
Problem solved by adding "multiprocessing_context='spawn'" to the dataloader.
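For anyone hitting the same issue, here is a minimal sketch of where that argument goes, assuming PyTorch's DataLoader; the dataset class below is a placeholder, not code from this repo:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Placeholder standing in for FlyingThingsDataset."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor(idx)

# multiprocessing_context='spawn' starts each worker in a fresh process,
# avoiding state inherited via fork that can serialize the workers.
loader = DataLoader(
    DummyDataset(),
    batch_size=1,
    num_workers=4,
    multiprocessing_context='spawn',
)
```

Note that 'spawn' re-imports the main module in each worker, so the training script should be guarded with `if __name__ == '__main__':`, and worker startup is somewhat slower than with the default 'fork' on Linux.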
Hi Harry, can you actually give a bit more info here, so that I can make sure to update the repo with your suggested fix? Where do you put that argument? Thanks, Adam
Hi, may I know how many CPUs you used to train the model with the setting "B=4, hori_flip=True, vert_flip = True, N=768, I=4"?
Thanks a lot for your attention!