Unidata / MetPy

MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
https://unidata.github.io/MetPy/
BSD 3-Clause "New" or "Revised" License

ValueError with metpy.quantify on NAM 212 Dataset #2195

Open sgdecker opened 2 years ago

sgdecker commented 2 years ago

What went wrong?

I attempted to use metpy.quantify on the NAM 212 dataset returned by Unidata's TDS, but it didn't like one of the variables. There are 180 variables in the dataset, and I haven't tracked down which one is problematic.

Output:

Traceback (most recent call last):
  File "/home/decker/classes/met433/new_labs/nam_bug.py", line 28, in <module>
    nam = nam.metpy.quantify()
  File "/home/decker/src/git_repos/metpy/src/metpy/xarray.py", line 1002, in quantify
    return self._dataset.map(lambda da: da.metpy.quantify())
  File "/home/decker/local/miniconda3/envs/devel21/lib/python3.9/site-packages/xarray/core/dataset.py", line 5108, in map
    variables = {
  File "/home/decker/local/miniconda3/envs/devel21/lib/python3.9/site-packages/xarray/core/dataset.py", line 5109, in <dictcomp>
    k: maybe_wrap_array(v, func(v, *args, **kwargs))
  File "/home/decker/src/git_repos/metpy/src/metpy/xarray.py", line 1002, in <lambda>
    return self._dataset.map(lambda da: da.metpy.quantify())
  File "/home/decker/src/git_repos/metpy/src/metpy/xarray.py", line 211, in quantify
    quantified_dataarray = self._data_array.copy(data=self.unit_array)
  File "/home/decker/src/git_repos/metpy/src/metpy/xarray.py", line 157, in unit_array
    return units.Quantity(self._data_array.data, self.units)
  File "/home/decker/src/git_repos/metpy/src/metpy/xarray.py", line 134, in units
    return units.parse_units(self._data_array.attrs.get('units', 'dimensionless'))
  File "/home/decker/local/miniconda3/envs/devel21/lib/python3.9/site-packages/pint/registry.py", line 1093, in parse_units
    units = self._parse_units(input_string, as_delta, case_sensitive)
  File "/home/decker/local/miniconda3/envs/devel21/lib/python3.9/site-packages/pint/registry.py", line 1306, in _parse_units
    return super()._parse_units(input_string, as_delta, case_sensitive)
  File "/home/decker/local/miniconda3/envs/devel21/lib/python3.9/site-packages/pint/registry.py", line 1116, in _parse_units
    raise ValueError("Unit expression cannot have a scaling factor.")
ValueError: Unit expression cannot have a scaling factor.

Operating System

Linux

Version

1.1.0.post39+gd43d5eb02.d20210825

Python Version

3.9.5

Code to Reproduce

from datetime import datetime

import xarray as xr
from xarray.backends import NetCDF4DataStore
from siphon.catalog import TDSCatalog
import metpy

def get_nam212(init_time, valid_time):
    ymd = init_time.strftime('%Y%m%d')
    hr = init_time.strftime('%H')
    filename = f'{ymd}_{hr}00.grib2'
    ds_name = 'NAM_CONUS_40km_conduit_' + filename
    cat_name = ('https://thredds.ucar.edu/thredds/catalog/grib/NCEP/NAM/'
                'CONUS_40km/conduit/' + ds_name + '/catalog.xml')

    cat = TDSCatalog(cat_name)
    ds = cat.datasets[ds_name]
    ncss = ds.subset()
    query = ncss.query()
    query.time(valid_time).variables('all')
    nc = ncss.get_data(query)
    data = xr.open_dataset(NetCDF4DataStore(nc)).metpy.parse_cf()
    return data

init_time = datetime(2021, 11, 8, 0)
plot_time = datetime(2021, 11, 8, 12)
nam = get_nam212(init_time, plot_time)
nam = nam.metpy.quantify()
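
For reference, a quick way to narrow down which of the 180 variables carry a problematic units attribute is to try parsing each one individually. A minimal sketch, assuming the nam Dataset built above and MetPy's pint registry:

from metpy.units import units
import pint

# Collect every variable whose units attribute pint cannot parse
bad_units = {}
for name, da in nam.data_vars.items():
    unit_str = da.attrs.get('units', 'dimensionless')
    try:
        units.parse_units(unit_str)
    except (ValueError, pint.UndefinedUnitError, pint.DefinitionSyntaxError):
        bad_units[name] = unit_str

for name, unit_str in bad_units.items():
    print(f'{name}: {unit_str!r}')
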
jthielen commented 2 years ago

I would argue that this is not a MetPy bug, as this dataset contains several variables with units that don't make any sense in the context of CF/UDUNITS:

varname unit
Soil_type_surface (Code table 4.213)
Categorical_Rain_surface Code table 4.222
Categorical_Freezing_Rain_surface Code table 4.222
Categorical_Ice_Pellets_surface Code table 4.222
Categorical_Snow_surface Code table 4.222
Rime_Factor_hybrid non-dim
Drag_Coefficient_surface non-dim
Convective_Cloud_Efficiency_entire_atmosphere_single_layer non-dim
Volumetric_Soil_Moisture_Content_depth_below_surface_layer Fraction
Vegetation_Type_surface Integer.(0-13)
Wilting_Point_surface Fraction
Solar_parameter_in_canopy_conductance_surface Fraction
Temperature_parameter_in_canopy_conductance_surface Fraction
Humidity_parameter_in_canopy_conductance_surface Fraction
Soil_moisture_parameter_in_canopy_conductance_surface Fraction
Number_of_Soil_Layers_in_Root_Zone_surface non-dim

Since this dataset isn't following CF conventions even though it declares itself to be, what would be the appropriate upstream fix on the TDS side?

That all being said, as an enhancement, I think MetPy could raise a more informative error in place of Pint's ValueError, perhaps something like: 'Variable "Soil_type_surface" has invalid units attribute "(Code table 4.213)". Verify that all units attributes are valid prior to calling quantify().' (Or even better, just match how pint-xarray does it: https://github.com/xarray-contrib/pint-xarray/blob/main/pint_xarray/accessors.py#L1034-L1052)
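
For illustration only (a sketch, not MetPy's actual implementation), the pint-xarray-style approach would check every variable up front and report all of the offending units attributes in a single error, roughly:

import pint
from metpy.units import units

def check_units_attrs(dataset):
    """Raise one informative error listing every unparseable units attribute."""
    invalid = {}
    for name, da in dataset.data_vars.items():
        unit_str = da.attrs.get('units', 'dimensionless')
        try:
            units.parse_units(unit_str)
        except (ValueError, pint.UndefinedUnitError, pint.DefinitionSyntaxError):
            invalid[name] = unit_str
    if invalid:
        listing = '\n'.join(f'    {name}: {unit!r}' for name, unit in invalid.items())
        raise ValueError('Cannot quantify; the following variables have invalid '
                         f'units attributes:\n{listing}')
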

sgdecker commented 2 years ago

Yes, those are some sketchy units! Maybe MetPy issues warnings for these invalid units and at least returns a Dataset containing the variables that are valid?

jthielen commented 2 years ago

Maybe MetPy issues warnings for these invalid units and at least returns a Dataset containing the variables that are valid?

That could work too! I guess it's the choice between failing out and letting the user handle it or assuming the user is okay with variables with invalid units being dropped.
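
In the meantime, that warn-and-skip behavior is easy enough to approximate on the user side. A sketch, assuming the nam Dataset from the reproduction code (variables with unparseable units are kept but left unquantified rather than dropped):

import warnings

import pint

def quantify_valid(dataset):
    # Quantify variables whose units parse; warn about and keep the rest as-is
    def _try_quantify(da):
        try:
            return da.metpy.quantify()
        except (ValueError, pint.UndefinedUnitError, pint.DefinitionSyntaxError):
            warnings.warn(f'{da.name!r} has an unparseable units attribute '
                          f'{da.attrs.get("units")!r}; leaving it unquantified.')
            return da
    return dataset.map(_try_quantify, keep_attrs=True)

nam = quantify_valid(nam)
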

dopplershift commented 2 years ago

I'd be ok with making it an opt-in flag to ignore invalid units on parse_cf and any other methods that need it. It could be added as a keyword-only argument to maintain current behavior. An improved error message would also be good to have.
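
For concreteness, a keyword-only, opt-in flag could look something like this (purely a sketch; 'ignore_invalid_units' is a placeholder name, not an existing MetPy argument):

# Hypothetical signature; the flag name is a placeholder.
def quantify(self, *, ignore_invalid_units=False):
    ...  # when True, warn and skip variables whose units attribute fails to parse

# Existing calls keep today's behavior; opting in would be explicit:
# nam = nam.metpy.quantify(ignore_invalid_units=True)

The same pattern would apply to parse_cf and any other affected methods.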