Closed chadcwilliams closed 3 months ago
I just tested this and it is still the case.
Test by:
- Increasing the hidden dimension to a large number (e.g., 10,000), since it might just be that the CPU is really good at this workload and these results are expected.
- Confirming the same pattern does not exist in the GAN.
The problem persists with large models, so it is not an issue of the time spent distributing data across GPUs.
I've done a deep dive, and everything points to the AE being a simple model for which DDP training adds too much overhead. In this case it's really not useful to use DDP for the AE, at least with the default AE structure and the data we've been using.
So, I am just going to add a UserWarning to make sure people don't waste time doing it this way (I must have wasted a lot of time processing the first dataset like this, but oh well!):
```python
warnings.warn(
    """The default autoencoder is a small model and DDP training adds a lot of overhead when transferring data to GPUs.
    As such, it might be useful to test GPU and CPU training separately and see what works best for your use case.
    Although DDP training will result in better performance than CPU with the same number of training epochs,
    you can achieve the same performance more quickly by adding epochs with CPU training.""",
    stacklevel=2,
)
```
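For context, here is a minimal, self-contained sketch of how such a warning could be wired into the training entry point and verified. The function name `warn_if_ddp_with_default_ae` and the `ddp` flag are hypothetical stand-ins for wherever the check actually lives in the codebase:

```python
import warnings


def warn_if_ddp_with_default_ae(ddp: bool) -> None:
    # Hypothetical helper: emit the UserWarning only when DDP training
    # is requested for the (small) default autoencoder.
    if ddp:
        warnings.warn(
            "The default autoencoder is a small model and DDP training adds "
            "a lot of overhead when transferring data to GPUs. It might be "
            "useful to test GPU and CPU training separately and see what "
            "works best for your use case.",
            UserWarning,
            stacklevel=2,
        )


# Usage: the warning fires only for the DDP path.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_ddp_with_default_ae(ddp=True)   # records one UserWarning
    warn_if_ddp_with_default_ae(ddp=False)  # records nothing
```

`stacklevel=2` makes the warning point at the caller's line rather than the helper itself, which is usually what users need to find the offending `ddp` argument.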
Using GPUs via the ddp argument takes twice as long as using the CPU... It should be roughly the same.