Closed prakhar6sharma closed 1 year ago
@tung-nd has worked on something similar for ERA5. His opinion about this request would be highly valuable.
Yes, for the reasons you point out, it would be more practical to switch to PyTorch's `IterableDataset`. I am currently refactoring our data code, and this switch is one of the things I'm working on.
I'm also considering using TorchData, but the project is still in beta, so I'm hesitant to do so at this time.
I tried DataPipes once but couldn't get it to work, so I would suggest using `IterableDataset` for now.
OK. I'll stick with `IterableDataset`. Leaving this issue open until I implement this in the refactor.
Hi,
Currently the `ERA5` class inherits from `torch.utils.data.Dataset`. For tasks involving many input and output variables, it becomes impossible to load them all into RAM at the same time. Hence, I was wondering whether support could be added for `IterableDataset`. It would allow loading only a subset of the data into RAM at a time.
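For illustration, here is a minimal sketch of what such a dataset might look like. It is not the library's actual implementation: the class name `ShardedIterableDataset`, the helper `write_demo_shards`, and the shard layout (one `.pt` file per shard holding `inputs` and `targets` tensors) are all assumptions made up for this example. The idea is only that a single shard is read from disk at a time, so peak memory is bounded by the shard size rather than by the whole dataset.

```python
import os

import torch
from torch.utils.data import IterableDataset, get_worker_info


class ShardedIterableDataset(IterableDataset):
    """Hypothetical sketch: yields samples shard by shard instead of loading everything."""

    def __init__(self, shard_paths):
        super().__init__()
        self.shard_paths = list(shard_paths)

    def __iter__(self):
        worker = get_worker_info()
        if worker is None:
            # Single-process loading: this iterator covers every shard.
            paths = self.shard_paths
        else:
            # Multi-worker loading: give each worker a disjoint subset of shards
            # so that samples are not duplicated across workers.
            paths = self.shard_paths[worker.id::worker.num_workers]
        for path in paths:
            # Only the current shard is held in memory.
            shard = torch.load(path)
            for x, y in zip(shard["inputs"], shard["targets"]):
                yield x, y


def write_demo_shards(directory, num_shards=3, samples_per_shard=4):
    """Hypothetical helper: writes small random shards so the sketch can be run."""
    paths = []
    for i in range(num_shards):
        path = os.path.join(directory, f"shard_{i}.pt")
        torch.save(
            {
                "inputs": torch.randn(samples_per_shard, 2, 8, 8),
                "targets": torch.randn(samples_per_shard, 1, 8, 8),
            },
            path,
        )
        paths.append(path)
    return paths
```

The dataset can then be wrapped in a regular `DataLoader(dataset, batch_size=...)`. Note that an `IterableDataset` has no random access, so shuffling would have to be handled inside the dataset (for example by shuffling the shard order), which this sketch does not do.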