aditya-grover / climate-learn

Source code for ClimateLearn
MIT License

ShardDataset doesn't work for DDP #91

Closed prakhar6sharma closed 1 year ago

prakhar6sharma commented 1 year ago

Describe the bug ShardDataset doesn't work with DDP, but it does work with DDP_spawn. Training just hangs before the start of the first epoch.

prakhar6sharma commented 1 year ago

Update on this.

I have spent almost a week on this, and there is some deadlock occurring when the xarray data is converted to a torch.Tensor (See this). I just can't figure out a way to fix it for now.
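A deadlock at exactly this point is consistent with a well-known failure mode of fork-based DataLoader workers: if any thread in the parent holds a lock (e.g. inside an I/O library) at the moment the worker is forked, the child inherits the lock in its "held" state, but the thread that would release it is never copied, so the child blocks forever. The sketch below (stdlib only, not ClimateLearn code, and only an illustration of the general mechanism, not a claim about which specific lock xarray hits) reproduces that pattern:

```python
import multiprocessing as mp
import threading
import time

lock = threading.Lock()


def _hold_lock():
    # Background thread grabs the lock and keeps it, simulating an
    # I/O library mid-operation at fork time.
    lock.acquire()
    time.sleep(30)


def _child_work():
    # In a forked child the lock's "held" state is inherited, but the
    # thread that would release it was not copied -> blocks forever.
    lock.acquire()


def demonstrate_fork_deadlock(timeout=1.0):
    """Fork while another thread holds `lock`; return True if the child hangs."""
    threading.Thread(target=_hold_lock, daemon=True).start()
    time.sleep(0.1)  # make sure the lock is held before forking
    p = mp.get_context("fork").Process(target=_child_work)
    p.start()
    p.join(timeout=timeout)
    hung = p.is_alive()  # True: the child never got past lock.acquire()
    p.terminate()
    return hung


if __name__ == "__main__":
    print("child deadlocked:", demonstrate_fork_deadlock())
```

This also matches the observed symptoms: DDP (which forks DataLoader workers) hangs, while DDP_spawn and num_workers=0 (no fork while locks are held) do not.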

Currently DDP works on multiple GPUs only if you set num_workers in the DataLoader to 0. DDP_spawn does work on multiple GPUs with num_workers > 0, but the slight performance improvement from using multiple workers is quickly overshadowed by the performance degradation of using .spawn() for process creation.
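The spawn overhead mentioned above is easy to see without torch at all: a spawned process starts a fresh interpreter and re-imports the parent module, while a forked one just copies the parent's memory. This stdlib sketch (illustrative only, not ClimateLearn code) runs the same tiny workload under both start methods so the startup-cost difference can be timed:

```python
import multiprocessing as mp
import time


def square(x):
    return x * x


def run_pool(start_method, n=8):
    """Run a trivial map under the given start method; return (result, seconds)."""
    ctx = mp.get_context(start_method)
    t0 = time.perf_counter()
    with ctx.Pool(processes=2) as pool:
        result = pool.map(square, range(n))
    return result, time.perf_counter() - t0


if __name__ == "__main__":
    fork_result, fork_time = run_pool("fork")
    spawn_result, spawn_time = run_pool("spawn")
    # Same answers either way; spawn typically pays a much larger
    # per-process startup cost (fresh interpreter + re-import).
    print(f"fork:  {fork_time:.3f}s, spawn: {spawn_time:.3f}s")
```

DDP_spawn pays this cost for every spawned rank, which is why the gain from num_workers > 0 gets eaten up.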

For my purposes, having zero workers with multiple GPUs works perfectly, as it also temporarily avoids #89 (since there is just one main process, which is stateful).