DOI-USGS / lake-temperature-process-models

Creative Commons Zero v1.0 Universal

ncdfgeom approach for pulling data from netCDFs #38

Closed: hcorson-dosch-usgs closed this 2 years ago

hcorson-dosch-usgs commented 2 years ago

Okay, I started digging into how we want to pull the meteo data directly from the netCDF files instead of going through our intermediary feather files, per #33.

I explored using the ncdfgeom::read_timeseries_dsg function. If you open a netCDF file with that function, you get a list of dataframes, with one dataframe per variable.

At first I tried loading all of the variables for all cells for a single GCM at once, but got an error that the requested memory allocation was too large. So I commented that code out and instead implemented a cell-by-cell approach: for each cell, I pull the column for that cell from each variable's dataframe, split that data by time period, and write it to feather files before moving on to the next cell.
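The cell-by-cell split described above can be sketched with only the Python standard library. This is an illustration under stated assumptions, not the code from this PR (which is R built on ncdfgeom): the per-variable dataframes are stood in for by plain dicts keyed by a hypothetical cell ID, the period labels and boundaries in the hypothetical TIME_PERIODS are made up, and CSV stands in for feather so the sketch needs no extra packages.

```python
import csv
import os
import tempfile
from datetime import date

# Hypothetical time periods; the real period boundaries are not given in this thread.
TIME_PERIODS = {
    "1981_2000": (date(1981, 1, 1), date(2000, 12, 31)),
    "2040_2059": (date(2040, 1, 1), date(2059, 12, 31)),
}


def write_cell_files(times, var_tables, out_dir, periods=TIME_PERIODS):
    """Write one CSV per (cell, time period) and return the paths written.

    times      -- list of datetime.date values shared by every variable table
    var_tables -- {variable_name: {cell_id: [values aligned with times]}},
                  a stand-in for the per-variable dataframes
    """
    written = []
    variables = sorted(var_tables)
    # Assumes every variable table has the same set of cell columns.
    cell_ids = sorted(next(iter(var_tables.values())))
    for cell in cell_ids:
        # Pull this cell's column from each variable table.
        columns = {v: var_tables[v][cell] for v in variables}
        for label, (start, end) in periods.items():
            # Keep only the time steps that fall inside this period.
            rows = [
                [t.isoformat()] + [columns[v][i] for v in variables]
                for i, t in enumerate(times)
                if start <= t <= end
            ]
            if not rows:
                continue  # no data for this cell in this period
            path = os.path.join(out_dir, f"{cell}_{label}.csv")
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["time"] + variables)
                writer.writerows(rows)
            written.append(path)
    return written
```

The sketch makes the file-count concern concrete: each call writes up to (number of cells) × (number of periods) files for a single GCM.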

I'm not sure this is the best approach, both for efficiency and because a single call to the function generates so many files.

But I thought I'd commit it so that we could get a conversation going. I'm happy to look into non-ncdfgeom approaches tomorrow.

hcorson-dosch-usgs commented 2 years ago

Okay @lindsayplatt, I think I've fully implemented the plan we described here. Ready for review!

hcorson-dosch-usgs commented 2 years ago

Fixed #33