barronh / pyrsig

Python interface to RSIG Web API
GNU General Public License v3.0
4 stars 2 forks source link

Data transposed when observations requested from multiple AQS stations to netcdf #4

Closed needham-michael closed 10 months ago

needham-michael commented 10 months ago

Description

For this use-case the temporary pyrsig -> pandas -> xarray work-around (shown in the example below) works fine, but the issue might apply to other situations too.

Version Info:

python: 3.12.0 numpy: 1.26.2 pandas: 2.1.3 pyrsig: 0.7.0 xarray: 2023.11.0

Minimum Example to Reproduce

# ============================================================
# Import Statements
# ============================================================
import pyrsig
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# Configure pyrsig api for requests 
# ============================================================

# Box around St. Louis, MO-IL
api = pyrsig.RsigApi(
    bdate='2023-01-01',
    edate='2023-12-31',
    bbox=(-91,38,-89.5,39.5)
)

data_key = 'aqs.ozone'

# --- Request Data Using api.to_netcdf() ---

rsig_xr = api.to_netcdf(key=data_key)
rsig_xr.coords['station'] = [str(x) for x in rsig_xr['station_id'].data]

# --- Request Data Using api.to_dataframe() ---

rsig_pd = api.to_dataframe(key=data_key,parse_dates=True)
rsig_pd = xr.Dataset.from_dataframe(rsig_pd.set_index(['STATION(-)', 'Timestamp(UTC)']))
rsig_pd.coords['Timestamp(UTC)'] = rsig_pd['Timestamp(UTC)'].astype('datetime64[ns]')
rsig_pd.coords['STATION(-)'] = rsig_pd['STATION(-)'].astype(str)

# ============================================================
# Plot MDA8 Ozone 
# ============================================================

fig, [ax1,ax2] = plt.subplots(figsize=(6,6),nrows=2,layout='tight',dpi=200)

rsig_xr['ozone'].rolling(time=8).mean().resample(time='1D').max().plot(
    ax=ax1,
    colors=['green', 'yellow', 'orange', 'red'], 
    levels=[0, 55, 71, 86, 105]
)

rsig_pd['ozone(ppb)'].rolling({'Timestamp(UTC)':8}).mean().resample({'Timestamp(UTC)':'1D'}).max().plot(
    ax=ax2,
    colors=['green', 'yellow', 'orange', 'red'], 
    levels=[0, 55, 71, 86, 105]
)

ax1.set_title("using api.to_netcdf()")
ax2.set_title("using api.to_dataframe()")

Output from this example, which can be compared to EPA tile plots: download

And attributes from the associated netcdf file:

image
needham-michael commented 10 months ago

Plot of hourly data for comparison

cmap = 'inferno'
levels = np.arange(0,91,10)
fig, [ax1,ax2] = plt.subplots(figsize=(6,6),nrows=2,layout='tight',dpi=200)

rsig_xr['ozone'].plot(
    ax=ax1,
    cmap=cmap,
    levels=levels,

)

rsig_pd['ozone(ppb)'].plot(
    ax=ax2,
    cmap=cmap,
    levels=levels,

)

ax1.set_title("using api.to_netcdf()")
ax2.set_title("using api.to_dataframe()")

download

barronh commented 10 months ago

I've tried to distill this to a very small test case. Below is a query to pyrsig via to_dataframe and to_netcdf for just two monitors. The monitors are extremely close to each other and one has missing more data.

import matplotlib.pyplot as plt
import pyrsig

api = pyrsig.RsigApi(
    bdate='2018-08-01T00', edate='2018-08-05T23:59:59',
    bbox=(-97.8, 27.6, -97.3, 28.),
)
fig, axx = plt.subplots(2, 1, figsize=(8, 8))
df = api.to_dataframe('aqs.ozone', parse_dates=True, unit_keys=False)
ds = api.to_netcdf('aqs.ozone')
tdf = df.set_index(['Timestamp', 'STATION'])['ozone'].unstack('STATION')
tdf.plot(ax=axx[0])
ds['ozone'][0].plot(ax=axx[1])
ds['ozone'][1].plot(ax=axx[1])
axx[0].set_xticks(range(0, 120, 24))
axx[0].set_xticklabels([t[5:10] for t in tdf.index.values[range(0, 120, 24)]], rotation=90)
image
barronh commented 10 months ago

Todd Plessel (Thanks Todd!!) updated the RSIG Web Coverage Service (WCS) to fix this. He changed the order of the dimensions of the variable. The order was ('station', 'time') and is now ('time', 'station'). pyrsig caches the files on disk, so your data won't be updated until you delete the *.nc and *.nc.gz files. After that, it will requery the system and get the new dimension order.

After your files are updated, the plots you made will be transposed (i.e., station on the x- instead of y-axis). If you want your plot to stay with station on the y-axis and time on the x-axis, transpose the variable in the plot line.

- rsig_xr['ozone'].plot(ax=ax1,cmap=cmap, levels=levels,)
+ rsig_xr['ozone'].T.plot(ax=ax1,cmap=cmap, levels=levels,)

Please confirm that this is fixed and close the issue.

needham-michael commented 10 months ago

I've confirmed that Todd Plessel's fix to RSIG Web Coverage Service has fixed my issue with to_netcdf() (after clearing the old .nc and .nc.gz files, as you said).

Closing the issue - Thanks!