Unidata / MetPy

MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
https://unidata.github.io/MetPy/
BSD 3-Clause "New" or "Revised" License

errors='ignore' for parse_cf? #1917

Open andhuang-CLGX opened 3 years ago

andhuang-CLGX commented 3 years ago
import os
from io import BytesIO

import s3fs
import metpy
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
fs.ls('s3://noaa-goes16/')

files = fs.ls('noaa-goes16/ABI-L1b-RadC/2019/240/00/')
with fs.open(files[0], 'rb') as f:
    ds = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')

ds.metpy.parse_cf()

Running this raises an error, but dropping the x_image and y_image variables first makes it work:

import os
from io import BytesIO

import s3fs
import metpy
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
fs.ls('s3://noaa-goes16/')

files = fs.ls('noaa-goes16/ABI-L1b-RadC/2019/240/00/')
with fs.open(files[0], 'rb') as f:
    ds = xr.open_dataset(BytesIO(f.read()), engine='h5netcdf')

ds.drop_vars(["x_image", "y_image"]).metpy.parse_cf()
andhuang-CLGX commented 3 years ago

Actually, it still doesn't work: ds["Rad"].metpy.cartopy_crs raises

Proj4Error: Error from proj: b'major axis or radius = 0 or not given'

andhuang-CLGX commented 3 years ago

I was able to reproduce the issue. MetPy may need to call .item() on an attribute when it is a length-1 array (len(attrs[name]) == 1), as in this manual construction of the CRS:

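import cartopy.crs as ccrs

# Build the CRS by hand; .item() turns each length-1 attribute array into a scalar
proj_info = ds['goes_imager_projection'].attrs
globe = ccrs.Globe(
    semimajor_axis=proj_info["semi_major_axis"].item(),
    semiminor_axis=proj_info["semi_minor_axis"].item()
)
geo_crs = ccrs.Geostationary(
    central_longitude=proj_info["longitude_of_projection_origin"].item(),
    satellite_height=proj_info["perspective_point_height"].item(),
    globe=globe,
    sweep_axis='x',  # GOES-R ABI sweeps along x; cartopy defaults to 'y'
)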
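The .item() calls above could be generalized into a single normalization pass over the attributes before the CRS is built. This is a minimal sketch, assuming (as those calls suggest) that the h5netcdf engine returns numeric attributes as length-1 NumPy arrays; squeeze_length_one_attrs is a hypothetical helper, not a MetPy function, and the attribute values are illustrative.

```python
import numpy as np


def squeeze_length_one_attrs(attrs):
    """Return a copy of attrs with single-element arrays replaced by Python scalars.

    Hypothetical helper (not part of MetPy) mirroring the suggestion to call
    .item() when an attribute holds exactly one value.
    """
    cleaned = {}
    for name, value in attrs.items():
        if isinstance(value, np.ndarray) and value.size == 1:
            cleaned[name] = value.item()  # e.g. array([6378137.]) -> 6378137.0
        else:
            cleaned[name] = value  # scalars, strings and longer arrays pass through
    return cleaned


# Illustrative attributes shaped like the goes_imager_projection ones in this thread
proj_info = {
    'semi_major_axis': np.array([6378137.0]),
    'semi_minor_axis': np.array([6356752.31414]),
    'grid_mapping_name': 'geostationary',
}
print(squeeze_length_one_attrs(proj_info))
```

Checking size rather than len() also covers 0-d arrays, for which len() raises TypeError.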
jthielen commented 3 years ago

So it looks like there are a few different issues going on here!

1) parse_cf() should be more error safe

When I downloaded this file from S3 and opened it locally rather than reading it through BytesIO, this example fails with TypeError: unsupported operand type(s) for -: 'ParserHelper' and 'int', which is very unhelpful for the end user. This (along with a recent Stack Overflow question) makes me think that more robust error handling inside parse_cf, particularly with regard to units, is in order.

2) A UDUNITS-valid expression failed to parse with our tweaked pint registry

In this dataset, the variable kappa0 has a units attribute of (W m-2 um-1)-1, which I believe is valid UDUNITS format, but which our regex below fails to capture:

https://github.com/Unidata/MetPy/blob/c7b0b9b9db3b895dbc38fc231412d3b1c6baa2fd/src/metpy/units.py#L35-L39

That regex borders on "write-only code", but as a wild guess, the following modification may do the trick:

r'(?<=[A-Za-z\)])(?![A-Za-z\)])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])'

(addition of \) to the leading look-behind and look-ahead)
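One way to check the proposed pattern is to apply it, alongside the original, to the kappa0 units string with re.sub, inserting '**' at each zero-width match (presumably what the linked preprocessor does). The ORIGINAL pattern here is an assumption, reconstructed by removing the \) additions from the proposed one; to_pint_powers is a hypothetical helper name.

```python
import re

# Assumed original: the proposed pattern with the "\)" additions removed.
ORIGINAL = re.compile(r'(?<=[A-Za-z])(?![A-Za-z])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])')
# Proposed: a closing parenthesis may also be the base of a power.
PROPOSED = re.compile(r'(?<=[A-Za-z\)])(?![A-Za-z\)])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])')


def to_pint_powers(pattern, units_string):
    """Insert '**' at every zero-width position the pattern matches (m-2 -> m**-2)."""
    return pattern.sub('**', units_string)


kappa0_units = '(W m-2 um-1)-1'
print(to_pint_powers(ORIGINAL, kappa0_units))  # trailing ")-1" is left untouched
print(to_pint_powers(PROPOSED, kappa0_units))  # trailing ")-1" becomes ")**-1"

# Plain UDUNITS powers and scientific notation are handled the same by both patterns
for s in ('m2 s-2', '1e-3 m'):
    assert to_pint_powers(ORIGINAL, s) == to_pint_powers(PROPOSED, s)
```

Under the original pattern, the trailing )-1 reaches pint as a subtraction, which plausibly explains the 'ParserHelper' and 'int' TypeError in point 1.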

xref https://github.com/Unidata/MetPy/issues/1362

3) The Proj4Error

I wasn't able to replicate this using the local netCDF file and the default engine, so I think this may be something specific to the h5netcdf engine. Due to time constraints, I'll let someone else chime in on that one.