TomNicholas opened this issue 1 month ago (status: Open)
Related to this, we have to discuss how the Python package will access the ERA5 data. Should users download the necessary data from ECMWF themselves, separately, or do we want to support that through our Python package? @TomNicholas @sdbachman @matt-long @ubbu36
Step 4 of #1 requires accessing, subsetting, downloading, interpolating, and writing out atmospheric forcing data.
I haven't tried this yet, so I don't know all the details, but @sdbachman was asking how we might parallelize it. If the files to be processed are embarrassingly parallel, the simplest way is to use `dask.delayed` and create a list of all the individual tasks to be performed. Example notebook that demonstrates that idea (for a tiny fake dataset).
Alternatively, maybe we want to create a single xarray dataset for all the input data and use dask by calling xarray operations on that dataset.
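For concreteness, here is a minimal sketch of the `dask.delayed` idea, not the actual implementation. The function `process_file` and the file names are hypothetical placeholders. The real per-file work (open, subset, interpolate, write) would replace the function body.

```python
import dask
from dask import delayed


def process_file(path):
    # Hypothetical placeholder: the real version would open the file,
    # subset it, interpolate it, and write the result out.
    return f"processed {path}"


# Hypothetical input file names, one independent task per file.
paths = [f"era5_{year}.nc" for year in range(2000, 2004)]

# Build the list of lazy tasks; nothing runs at this point.
tasks = [delayed(process_file)(path) for path in paths]

# Execute all tasks in parallel; dask.compute returns a tuple of results.
results = dask.compute(*tasks)
```

This only pays off if the tasks really are independent of each other. If they need to share data, the single-dataset approach is probably the better fit.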
Ideally we could just fit the whole forcing dataset in memory but I think it's too big for that.
Either way the first step is to express the operations we need to do on the original forcing dataset in xarray code.
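As a rough sketch of what "expressing the operations in xarray code" could look like, the example below subsets and interpolates a tiny synthetic dataset. Everything in it is an assumption made for illustration: the variable name `t2m`, the coordinate names `lat` and `lon`, the grids, and the target region are not taken from the real ERA5 files or from our target grid. Note that `interp` needs scipy installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the forcing data (all names are hypothetical).
time = pd.date_range("2000-01-01", periods=4, freq="D")
lat = np.arange(-90.0, 91.0, 30.0)  # 7 points, ascending
lon = np.arange(0.0, 360.0, 60.0)   # 6 points
ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.random.rand(4, 7, 6))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Step 1: subset to the region of interest (label-based, inclusive slices).
subset = ds.sel(lat=slice(0.0, 60.0), lon=slice(0.0, 120.0))

# Step 2: interpolate onto a hypothetical target grid (linear by default).
target_lat = np.linspace(0.0, 60.0, 5)
target_lon = np.linspace(0.0, 120.0, 5)
interpolated = subset.interp(lat=target_lat, lon=target_lon)

# Step 3 (not run here): write the result out, e.g.
# interpolated.to_netcdf("forcing_subset.nc")  # hypothetical file name
```

If the real data were opened lazily, for example with `xr.open_mfdataset`, the same calls should build a dask graph instead of loading everything into memory, which is what we need if the dataset is too big to fit.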