Unidata / MetPy

MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
https://unidata.github.io/MetPy/
BSD 3-Clause "New" or "Revised" License
1.26k stars 415 forks source link

Support calculation of indices (e.g. precipitable water) using gridded data #1533

Open DanielAdriaansen opened 4 years ago

DanielAdriaansen commented 4 years ago

I have an Xarray Dataset object and it has 3D variables of atmospheric data stored in it. Essentially, a collection of nx X ny columns of data. I am attempting to compute precipitable water using this 3D grid, but have been unsuccessful doing it any other way besides looping in nx and ny, storing the result from precipitable_water() in an np ndarray, and then adding it to my Xarray Dataset object. I tried various flavors of xarray apply_ufunc(), but I don't think I am advanced enough of a user to get it to broadcast the precipitable_water() function across my Dataset correctly. Or maybe it's not even possible.

In MetPy, It would be nice to be able to pass these variables in and get back a 2D nx X ny grid of precipitable_water back into my Dataset, or as a separate DataArray object since the dimensions are reduced from 3D to 2D in this calculation.

Something like: pw_da = mpcalc.precipitable_water(ds['P'],ds['TD'],bottom=pbot,top=ptop)

where pw_da would be a DataArray object since two Dataset objects were passed into the function.

I made the issue generic, since there may be other indices, or calculations that fit this model in MetPy that I haven't thought about where the user would want to distill 3D data into a 2D field of some sort.

After perusing current issues and PR's, I found #1490 , and this may help me but I am not entirely sure.

I originally posted a question on SO awhile back here: https://stackoverflow.com/questions/61255329/pythonic-way-to-compute-precipitable-water, where @dopplershift alluded to some assumed 1D behavior at least in precipitable_water(), so in addition to Xarray-type enhancements there is probably also some 1D to 3D changes needed to achieve this.

ahuang11 commented 4 years ago

I know this doesn't isn't reusable for other calculations, but if 'P' is a 1D dimension of 'TD', maybe you could do: -ds.sel(P=slice(pbot, ptop)).integrate('P') * 1 / rho / g

dopplershift commented 4 years ago

@DanielAdriaansen Thanks for opening. This is definitely something we want to enable now that we have a fairly large set of useful calculations. Not sure on a schedule for that, though...

jthielen commented 4 years ago

Agreed, thanks for opening!

1490 does not resolve this, as of that PR (and the current state of the master branch), precipitable_water still only works on 1D profiles. The main stopping point is MetPy's get_layer functionality, which does not support gridded data. This is a prime candidate to be updated to use xarray in a future release, which would also allow things like SRH (https://github.com/Unidata/MetPy/issues/1258) and bulk shear to permit gridded data, where the only thing stopping them from being vectorized is get_layer. Not that it is directly in the interest of this issue, but CAPE and other parcel profile-dependent calculations, however, would take a bit more work to support gridded data due to the vertical iteration needed (see https://github.com/Unidata/MetPy/issues/480).

ahuang11 commented 4 years ago

I guess another way is to use map blocks (must be chunked with dask) and then perform your loop inside so that it's kind of parallelized.

def xr_precipitable_water(ds):
    da_prws = []
    for lat in ds['lat']:
        for lon in ds['lon']:
            ds_sub = ds.sel(lat=lat, lon=lon)
            ds_sub['prw'] = precipitable_water(
                ds_sub['dpt'].values * units('degC'),
                ds_sub['level'].values * units('hPa')
            ).m
            da_prw = ds_sub['prw']
            da_prws.append(da_prw.expand_dims(['lat', 'lon']))
    da_prw = xr.combine_by_coords(da_prws)
    return da_prw

ds = ds.chunk({'lat': 37, 'lon': 36})
template = ds.isel(level=0, drop=True)[['air']].rename({'air': 'prw'})
ds_prw = xr.map_blocks(xr_precipitable_water, ds, template=template).load()

Executed in 2 mins 6 seconds (tested on my personal machine with 4 cores)

Compared to the absolute serial case

da_prws = []
for lat in ds['lat']:
    for lon in ds['lon']:
        ds_sub = ds.sel(lat=lat, lon=lon)
        ds_sub['prw'] = precipitable_water(
            ds_sub['dpt'].values * units('degC'),
            ds_sub['level'].values * units('hPa')
        ).m
        da_prw = ds_sub['prw']
        da_prws.append(da_prw.expand_dims(['lat', 'lon']))
da_prw = xr.combine_by_coords(da_prws)

Executed in 2 mins 35 seconds

Entire script:

from metpy.calc import precipitable_water, dewpoint_from_relative_humidity
from metpy.units import units
import xarray as xr

ds = xr.open_mfdataset('*.mon.mean.nc').sortby('lat')

ds['dpt'] = (
    ('level', 'lat', 'lon'),
    dewpoint_from_relative_humidity(
        ds['air'].values * units('degC'), ds['rhum'])
)

def xr_precipitable_water(ds):
    da_prws = []
    for lat in ds['lat']:
        for lon in ds['lon']:
            ds_sub = ds.sel(lat=lat, lon=lon)
            ds_sub['prw'] = precipitable_water(
                ds_sub['dpt'].values * units('degC'),
                ds_sub['level'].values * units('hPa')
            ).m
            da_prw = ds_sub['prw']
            da_prws.append(da_prw.expand_dims(['lat', 'lon']))
    da_prw = xr.combine_by_coords(da_prws)
    return da_prw

ds = ds.chunk({'lat': 37, 'lon': 36})
template = ds.isel(level=0, drop=True)[['air']].rename({'air': 'prw'})
ds_prw = xr.map_blocks(xr_precipitable_water, ds, template=template).load()

da_prws = []
for lat in ds['lat']:
    for lon in ds['lon']:
        ds_sub = ds.sel(lat=lat, lon=lon)
        ds_sub['prw'] = precipitable_water(
            ds_sub['dpt'].values * units('degC'),
            ds_sub['level'].values * units('hPa')
        ).m
        da_prw = ds_sub['prw']
        da_prws.append(da_prw.expand_dims(['lat', 'lon']))
da_prw = xr.combine_by_coords(da_prws)