Closed micahjohnson150 closed 5 years ago
Here are the results from differencing all the variables that were cast to a different data type.
x 0.00000 0.00000 0.00000 0.00000
y 0.00000 0.00000 0.00000 0.00000
time 0.00000 0.00000 0.00000 0.00000
latitude 0.00000 0.00000 0.00000 -0.00000
longitude 0.00000 0.00001 0.00002 -0.00002
I would call this a success.
The original grib2 file was 119 MB; with this script the final file is 28 MB, a 91 MB savings (~800 GB for an entire year!).
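For anyone wanting to sanity check the yearly figure, here is a rough sketch of the arithmetic. It assumes one file per hour for a 365-day year, which is my assumption and not something stated above; the variable names are only illustrative.

```python
# File sizes quoted in the comment above, in MB.
original_mb = 119
reduced_mb = 28

# Assumption (not stated in the thread): one file per hour, 365 days.
files_per_year = 24 * 365

savings_per_file_mb = original_mb - reduced_mb
# Decimal units: 1 GB = 1000 MB.
savings_per_year_gb = savings_per_file_mb * files_per_year / 1000

print(f"Savings per file: {savings_per_file_mb} MB")
print(f"Savings per year: {savings_per_year_gb:.0f} GB")
```

Under that assumption the total lands just under the ~800 GB quoted.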
The reason we are reducing the number of variables is that the snow modeling effort does not need the majority of the atmospheric variables. The 6 or so that we do need will be extracted and put into a netcdf file, giving a smaller and more manageable file.
Changing the datatypes in the netcdf does not affect the variables, just some of the dimension variables. For example, latitude and longitude are cast as doubles, which isn't needed.
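As a rough illustration of what dropping from double to single precision costs, here is a minimal sketch using only the standard library. The coordinate values are made up for the example and `to_float32` is a hypothetical helper, not part of the script discussed here.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical coordinate values, chosen only for illustration.
samples = {"latitude": 43.123456789, "longitude": -116.123456789}

for name, value in samples.items():
    # Difference between the float32 round trip and the original double.
    error = to_float32(value) - value
    print(f"{name:10s} {value:.9f} error {error:+.8f}")
```

For values of this magnitude the rounding error stays within a few millionths of a degree.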
What branch is this on?
We are moving things over to cloud computing platforms. To do this we need to put the grib files into object storage. Unfortunately, grib files are harder to serve from object storage, so conversion to netcdf is preferable. Even with a straight-across conversion, the amount of data would be too large, so some data reduction is necessary.
@scotthavens feel free to add or alter info from this.
Things we should do to reduce data: