aditya-grover / climate-learn

Source code for ClimateLearn
MIT License

Support for IterableDataset #38

Closed: prakhar6sharma closed this issue 1 year ago

prakhar6sharma commented 1 year ago

Hi,

Currently, the ERA5 class inherits from torch.utils.data.Dataset.

For tasks involving many input and output variables, it becomes impossible to load all of them into RAM at the same time. Hence, I was wondering whether support for IterableDataset could be added. It would allow loading only a subset of the data into RAM at a time.
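To illustrate the idea, here is a minimal sketch, assuming the data has already been split into several .npy shard files on disk. The class name `ShardedNpyDataset`, the helper `write_demo_shards`, and the shard layout are all hypothetical and are not part of ClimateLearn. Only one shard is held in memory at a time, and shards are divided among DataLoader workers so that no sample is yielded twice.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class ShardedNpyDataset(IterableDataset):
    """Hypothetical dataset that streams samples from one .npy shard at a time."""

    def __init__(self, shard_paths):
        super().__init__()
        self.shard_paths = list(shard_paths)

    def __iter__(self):
        worker = get_worker_info()
        if worker is None:
            # Single-process loading: this iterator reads every shard.
            paths = self.shard_paths
        else:
            # Multi-process loading: give each worker a disjoint set of shards.
            paths = self.shard_paths[worker.id :: worker.num_workers]
        for path in paths:
            shard = np.load(path)  # only this shard is loaded into RAM
            for sample in shard:
                # Copy so the yielded tensor does not keep the whole shard alive.
                yield torch.from_numpy(sample.copy())


def write_demo_shards(directory, num_shards=3, samples_per_shard=4):
    """Write small synthetic shards (assumed layout: samples x height x width)."""
    paths = []
    for i in range(num_shards):
        path = os.path.join(directory, f"shard_{i}.npy")
        data = np.full((samples_per_shard, 2, 2), i, dtype=np.float32)
        np.save(path, data)
        paths.append(path)
    return paths


tmp_dir = tempfile.mkdtemp()
shard_paths = write_demo_shards(tmp_dir)
loader = DataLoader(ShardedNpyDataset(shard_paths), batch_size=4)
batches = list(loader)
```

Note that an IterableDataset yields samples in shard order, so any shuffling (for example, shuffling the shard list each epoch) would have to be handled inside `__iter__`.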

prakhar6sharma commented 1 year ago

@tung-nd has worked on something similar for ERA5. His opinion about this request would be highly valuable.

jasonjewik commented 1 year ago

Yes, for the reasons you point out, it would be more practical to switch to PyTorch's IterableDataset. I am currently refactoring our data code, and this switch is part of that work.

I'm also considering using TorchData, but the project is still in beta, so I'm hesitant to do so at this time.

tung-nd commented 1 year ago

I tried DataPipes once but failed, so I would suggest using IterableDataset for now.

jasonjewik commented 1 year ago

OK. I'll stick with IterableDataset. Leaving this issue open until I implement this in the refactor.