ESIPFed / esiphub-dev

Development JupyterHub on AWS targeting pangeo environment for National Water Model exploration
MIT License

Create sample Zarr dataset on S3 with NWS data #1

Closed: rsignell-usgs closed this issue 6 years ago

rsignell-usgs commented 6 years ago

Write roughly 80 GB of NWS data to Zarr format on S3, which should be sufficient for initial testing and demos.

rsignell-usgs commented 6 years ago

I've written a good chunk of NWM data to Zarr on S3 in bucket rsignell/nwm/test04, with variables in the 60-100GB range:

<xarray.Dataset>
Dimensions:         (reference_time: 961, time: 961, x: 4608, y: 3840)
Coordinates:
  * reference_time  (reference_time) datetime64[ns] 2018-03-02 ...
  * time            (time) datetime64[ns] 2018-03-02T01:00:00 ...
  * x               (x) float64 -2.304e+06 -2.303e+06 -2.302e+06 -2.301e+06 ...
  * y               (y) float64 -1.92e+06 -1.919e+06 -1.918e+06 -1.917e+06 ...
Data variables:
    LWDOWN          (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    PSFC            (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    Q2D             (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    RAINRATE        (time, y, x) float32 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    SWDOWN          (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    T2D             (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    U2D             (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
    V2D             (time, y, x) float64 dask.array<shape=(961, 3840, 4608), chunksize=(1, 3840, 4608)>
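For a rough sense of scale, the uncompressed size of each variable and each chunk follows directly from the shape and chunksize printed above. This is only a back-of-the-envelope sketch: the helper name `nbytes` is made up for illustration, and the size actually stored on S3 depends on the Zarr compressor, so it can be smaller than these figures.

```python
from math import prod

# Shape and chunk shape as printed in the dataset repr above.
SHAPE = (961, 3840, 4608)    # (time, y, x)
CHUNKS = (1, 3840, 4608)     # one full y/x grid per time step


def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with this shape and item size."""
    return prod(shape) * itemsize


for dtype, itemsize in (("float64", 8), ("float32", 4)):
    total = nbytes(SHAPE, itemsize)
    chunk = nbytes(CHUNKS, itemsize)
    print(
        f"{dtype}: {total / 1e9:.1f} GB per variable, "
        f"{chunk / 1e6:.1f} MB per chunk, {total // chunk} chunks"
    )
```

With one time step per chunk, any reduction over time has to touch all 961 chunks of a variable, which is part of why more cores help.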

Here's a plot of the mean temperature [image], computed by this notebook: https://gist.github.com/rsignell-usgs/a55c5d825467e8ce118462e8a39965ad
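For readers who don't want to open the notebook, here is a minimal sketch of how a time mean like this could be computed with xarray and s3fs. It is not the notebook's code. It assumes that `T2D` is the temperature variable, that the mean is taken over the `time` dimension, and that you have credentials that can read the bucket; the function names `open_nwm_zarr` and `time_mean` are made up for illustration.

```python
def open_nwm_zarr(root="rsignell/nwm/test04", **fs_kwargs):
    """Lazily open the Zarr store on S3 as an xarray Dataset.

    Imports happen here so the helpers below work without s3fs installed.
    fs_kwargs are passed to s3fs.S3FileSystem (for example anon=True,
    which only works if the bucket allows public reads; that is an assumption).
    """
    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(**fs_kwargs)
    store = fs.get_mapper(root)
    return xr.open_zarr(store)


def time_mean(ds, name="T2D"):
    """Mean of one variable over the time dimension (lazy if dask-backed)."""
    return ds[name].mean(dim="time")


# Hypothetical usage, needs network access and read permission:
# ds = open_nwm_zarr()
# mean_t = time_mean(ds).compute()   # triggers reading all 961 time chunks
# mean_t.plot()
```

Nothing is read until `.compute()` (or plotting) is called, so the open step itself is cheap; the reduction is where the cores get used.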

We could use some more cores! 😜

rsignell-usgs commented 6 years ago

BTW, the Python script I used to write this is here:

https://gist.github.com/rsignell-usgs/df7b936f28f2212f80872a7f30098680
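As a rough idea of what such a write step can look like (this is a sketch under my own assumptions, not the linked script), the general pattern with xarray is to chunk the dataset along time and call `to_zarr` on a store. The helper names `write_to_zarr` and `s3_store`, and the bucket path in the usage comment, are hypothetical.

```python
def write_to_zarr(ds, store, time_chunk=1):
    """Rechunk ds along time and write it to a Zarr store.

    `store` can be a local path or a mapping such as the one returned by
    s3_store() below. Rechunking requires dask to be installed.
    """
    chunked = ds.chunk({"time": time_chunk})
    return chunked.to_zarr(store, mode="w")


def s3_store(root, **fs_kwargs):
    """Build an S3-backed mapping for `root` ("bucket/prefix").

    With no arguments, s3fs falls back to the usual AWS credential lookup.
    """
    import s3fs  # imported lazily so write_to_zarr works without s3fs

    fs = s3fs.S3FileSystem(**fs_kwargs)
    return fs.get_mapper(root)


# Hypothetical usage, needs write permission on the bucket:
# import xarray as xr
# ds = xr.open_mfdataset("forcing/*.nc", combine="by_coords")
# write_to_zarr(ds, s3_store("my-bucket/nwm/test"))
```

A time chunk of 1 matches the chunksize shown in the dataset listing; larger time chunks would mean fewer, bigger objects on S3.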

My AWS credentials for writing to this bucket were stored in ~/.aws/config.
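For anyone unfamiliar with that file, it is INI-style, with a `[default]` section and optional `[profile name]` sections. The sketch below parses a sample of that layout with the standard library to show which profiles carry keys. The key values, regions, and the `readonly` profile are placeholders I made up, not anything from the real file, and `profiles_with_keys` is a hypothetical helper.

```python
import configparser

# Placeholder content in the layout used by ~/.aws/config (values are fake).
SAMPLE_CONFIG = """\
[default]
region = us-east-1
aws_access_key_id = AKIAEXAMPLEKEYID0000
aws_secret_access_key = exampleSecretKeyDoNotUse

[profile readonly]
region = us-west-2
"""


def profiles_with_keys(text):
    """Return the section names that define both credential keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    needed = ("aws_access_key_id", "aws_secret_access_key")
    return [
        name
        for name in parser.sections()
        if all(parser.has_option(name, key) for key in needed)
    ]


print(profiles_with_keys(SAMPLE_CONFIG))  # prints ['default']
```

Tools built on botocore, such as s3fs, can typically pick up keys from this file without them being passed in code, which keeps secrets out of scripts and notebooks.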