blaylockbk / goes2go

Download and process GOES-16 and GOES-17 data from NOAA's archive on AWS using Python.
https://goes2go.readthedocs.io/
MIT License

Smarter way to plot a time range? #88

Open guidocioni opened 5 months ago

guidocioni commented 5 months ago

Hey, nice library! Plotting is easy with the rgb accessor when the downloaded data is just one timestep. However, it is not clear to me what the easiest way to use this accessor is when downloading multiple files.

For example, downloading a time range and forcing the result to be returned as an xarray dataset (the same behaviour as when downloading a single timestep)

from goes2go import GOES

G = GOES(satellite=16,
         product="ABI-L2-MCMIP",
         domain='C').timerange(
             start='2024-04-08 17:00',
             end='2024-04-08 19:00',
             return_as='xarray'
         )

obviously creates a really large dataset with high memory consumption (about 16GB stored in RAM in this case).

Afterwards, I can plot by accessing the rgb attributes of individual timesteps, i.e. G.isel(t=0).rgb. However, I feel like a better option would be not to load all the timesteps into memory at once but instead use lazy evaluation (maybe with Dask) so that every file is only loaded into memory once the data needs to be plotted. Is this possible?

Or is there a better way to read the data file by file and attach the rgb accessor afterwards? I couldn't find any example in the documentation.

guidocioni commented 5 months ago

Ok, I just realized the rgb accessor gets attached even when opening the dataset with xarray so this is not really an issue. Still, I think it should be mentioned in the examples somewhere.

I think the preferred way to do this would be

import matplotlib.pyplot as plt
import xarray as xr
from goes2go import GOES, config

G = GOES(satellite=16,
         product="ABI-L2-MCMIP",
         domain='C').timerange(
             start='2024-04-08 17:00',
             end='2024-04-08 20:00',
             return_as='filelist'  # instead of a dataset
         )

and then

files = [str(config['timerange']['save_dir']) + '/' + f for f in G['file'].to_list()]
ds = xr.open_mfdataset(files,
                       concat_dim='t',
                       combine='nested')

This creates an xarray dataset backed by lazy (Dask) arrays, without loading everything into memory.
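The string concatenation used to build the file paths above could also be written with pathlib, which takes care of the separator. This is only a minimal sketch: `local_paths` is a hypothetical helper name, and it assumes (as the snippet above implies) that the entries in `G['file']` are paths relative to the configured `save_dir`.

```python
from pathlib import Path
from typing import Iterable, List


def local_paths(save_dir, relative_files: Iterable[str]) -> List[str]:
    """Join each relative file name onto save_dir and return plain strings.

    Hypothetical helper, not part of goes2go. It assumes the file names
    are relative to save_dir.
    """
    root = Path(save_dir)
    return [str(root / name) for name in relative_files]
```

With the objects from the snippet above, this would be called as `local_paths(config['timerange']['save_dir'], G['file'].to_list())` and the result passed to `xr.open_mfdataset`.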

And then you can plot a single timestep by doing

last = ds.isel(t=-1)  # select the final timestep; the data is still lazy here
fig = plt.figure(figsize=(15, 15))
ax = plt.subplot(projection=last.rgb.crs)
ax.imshow(last.rgb.NaturalColor(gamma=.9),
          **last.rgb.imshow_kwargs)
ax.coastlines(color='white', linewidth=.5)

which should load only the relevant timestep into memory.
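To plot the whole time range rather than one timestep, the same idea can be wrapped in a loop that selects one timestep per iteration. The sketch below is an assumption about how that might look, not something from the goes2go documentation: `save_frames` and the `draw` callback are hypothetical names, and the only things it relies on from the dataset are `.sizes` and `.isel`.

```python
from pathlib import Path
from typing import Any, Callable, List


def save_frames(ds: Any, out_dir: str,
                draw: Callable[[Any, Path], None],
                dim: str = "t") -> List[Path]:
    """Call draw(frame, path) once for every step along `dim`.

    `ds` is expected to behave like the lazily opened dataset above.
    `draw` is a user-supplied (hypothetical) callback that plots a
    single-timestep frame and writes the image to `path`.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(ds.sizes[dim]):
        frame = ds.isel({dim: i})         # one timestep per iteration
        path = out / f"frame_{i:04d}.png"
        draw(frame, path)                 # plotting is left to the callback
        paths.append(path)
    return paths
```

Here `draw` would wrap the plotting code above: create the figure, call `ax.imshow` with `frame.rgb.NaturalColor(...)`, save to `path`, and close the figure so that figures do not accumulate across iterations.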