aaraney closed this issue 2 years ago.
From the stack trace, it appears that the `scale_factor` metadata attribute of the `streamflow` variable in one of the NWM's channel_rt output files is not being deserialized as a collection (list, array, etc.) and is instead a scalar (int, float, etc.). This may have been caused by an upstream change in a dependency (xarray, h5netcdf).
I was able to resolve this issue by removing the index from the `scale_factor` lookup at line 274 of `python/nwm_client/src/hydrotools/nwm_client/gcp.py`:
```python
# Extract scale factor
scale_factor = ds['streamflow'].scale_factor[0]

# Fixed with:
scale_factor = ds['streamflow'].scale_factor
```
I am assuming that the metadata layout of NWM channel_rt output files is fairly static over time, since we have not seen this issue before. If I had to guess, this is a deserialization issue propagating from xarray.
It might be best if we push a hotfix that guards and type checks the `scale_factor` field while we track down what is causing this and determine a long-term solution.
Found the issue. It is propagating from `h5netcdf`. Today they released `0.14.0`, which introduced the following change, per their changelog:
> Return items from 0-dim and one-element 1-dim array attributes. Return multi-element attributes as lists. Return string attributes as Python strings decoded from their respective encoding (utf-8, ascii). By Kai Mühlbauer.
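The sketch below imitates the two behaviors with plain NumPy values rather than a real NWM file, to show why `scale_factor[0]` breaks. It assumes that `h5netcdf <= 0.13.0` hands back the attribute as a one-element 1-D array and that `h5netcdf >= 0.14.0` hands back the unwrapped NumPy scalar; the value `0.01` is illustrative only.

```python
import numpy as np

# Stand-in for h5netcdf <= 0.13.0: the attribute comes back as a
# one-element 1-D array. The value 0.01 is illustrative only.
old_style_attr = np.array([0.01], dtype=np.float32)

# Stand-in for h5netcdf >= 0.14.0: the single item comes back as a scalar.
new_style_attr = old_style_attr[0]

print(isinstance(old_style_attr, np.ndarray))  # array, so [0] is valid
print(isinstance(new_style_attr, np.ndarray))  # scalar, so [0] is not

# Indexing the array works; indexing the scalar raises.
print(old_style_attr[0])
try:
    new_style_attr[0]
except (IndexError, TypeError) as err:
    # NumPy scalars raise IndexError; a plain Python float would raise TypeError.
    print(f"{type(err).__name__}: {err}")
```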
I verified that rolling the version back to `0.13.0` resolved this issue.
Now, as to how we should proceed. I know I previously said:
> It might be best if we push a hotfix that guards and type checks the `scale_factor` field while we track down what is causing this and determine a long-term solution.
In this case, I think it makes sense to just type check `ds.streamflow.scale_factor` and handle the case where a scalar is returned. I don't want to force others to pin a specific version of `h5netcdf`. Thoughts, @jarq6c?
```python
import numpy as np

streamflow = ds['streamflow']

if isinstance(streamflow.scale_factor, np.ndarray):
    # h5netcdf <= 0.13.0 always deserializes numeric attributes to numpy arrays,
    # even if there is only one item in the array.
    scale_factor = streamflow.scale_factor[0]
else:
    # h5netcdf >= 0.14.0 returns one-element attributes as scalars
    # (multi-element attributes are returned as lists).
    scale_factor = streamflow.scale_factor
```
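An alternative to branching on the type is to coerce the attribute to an array and require exactly one element, which treats a NumPy scalar, a 0-d or one-element array, a one-element list, and a plain number alike. This is a minimal sketch; `scalar_attribute` is a hypothetical helper name, not part of `hydrotools`.

```python
import numpy as np


def scalar_attribute(value, name="attribute"):
    """Return a single-valued netCDF attribute as a NumPy scalar.

    Hypothetical helper: accepts a scalar, a 0-d array, a one-element
    1-D array, or a one-element list, so it behaves the same whether or
    not h5netcdf unwraps one-element attributes.
    """
    flat = np.asarray(value).ravel()
    if flat.size != 1:
        raise ValueError(f"expected a single value for {name}, got {flat.size}")
    return flat[0]


# Both deserialization styles yield the same scalar.
old_style = np.array([0.01], dtype=np.float32)
new_style = np.float32(0.01)
print(scalar_attribute(old_style, "scale_factor") == scalar_attribute(new_style, "scale_factor"))
```

In `gcp.py` this would replace the `if`/`else` with `scale_factor = scalar_attribute(ds['streamflow'].scale_factor, "scale_factor")`. Unlike indexing with `[0]` or passing the value through unchanged, it raises if the attribute unexpectedly holds more than one value.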
If the source attribute was a single scalar all along and was only returned in a list because of some quirk of `h5netcdf`, I'm inclined to just drop the index and leave it at that. Is there a good reason to continue supporting `h5netcdf <= 0.13.0`?
After talking with @jarq6c offline, we came to a solution (please correct me where necessary, @jarq6c). Given that `h5netcdf==0.14.0` was released on 2022-02-25, we will pin the current version of `nwm_client` (`5.0.1`) to `h5netcdf <= 0.13.0` and release it as a post-release of `5.0.1`. Subsequently, `nwm_client==5.0.2` will be released with a pin of `h5netcdf >= 0.14.0`. `5.0.2` will include a patch that makes it compatible with `h5netcdf >= 0.14.0`.
Justin Hunter reported an issue when trying to retrieve a short-range forecast using `nwm_client`'s `gcp.NWMDataService`. I verified that I can reproduce the issue locally.
Reproduce