Open shatz01 opened 1 year ago
Describe the bug\ Caching a dataset when running a DDP job breaks.
Solution\ Guard any caching with checking for environment variable os.environ["RANK"].
os.environ["RANK"]
An intermittent fix is simply to first run your training script on 1 gpu, and once it has cached run it DDP with reset_cache=False.
Describe the bug\ Caching a dataset when running a DDP job breaks.
Solution\ Guard any caching with checking for environment variable
os.environ["RANK"]
.