DOI-USGS / dataretrieval-python

Python package for retrieving water data from USGS or the multi-agency Water Quality Portal
https://doi-usgs.github.io/dataretrieval-python/

CRS handling for `nwis` module #170

Open ehinman opened 1 week ago

ehinman commented 1 week ago

It was noted by @aaraney, @lstanish-usgs, Mike Mahoney, @thodson-usgs and others that hard-coding a CRS for NWIS sites is not ideal. While NAD83 is the most common CRS projection, it is not the exclusive one used to document the location of all gages, example: https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded. At the very least, we should warn users of this inconsistency. At a more comprehensive level, we could use the datum provided for each site in the downloaded dataset to set the CRS before converting to a unified CRS like WGS84.
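As a rough illustration of the "warn, then use the provided datum" idea, here is a minimal sketch. The lookup table and the function name `datum_to_epsg` are hypothetical and not part of dataretrieval. The three EPSG codes are the standard geographic CRS identifiers for these datums, and any other code falls back to an assumed NAD83 default with a warning.

```python
import warnings

# Hypothetical lookup from NWIS datum codes to EPSG identifiers.
# Only three well-known datums are listed; this is not an exhaustive table.
DATUM_TO_EPSG = {
    "NAD83": "EPSG:4269",
    "NAD27": "EPSG:4267",
    "WGS84": "EPSG:4326",
}


def datum_to_epsg(datum_cd, default="EPSG:4269"):
    """Return the EPSG string for a datum code, warning when falling back."""
    # Empty fields arrive from pandas as float NaN, so only strings are normalized.
    code = datum_cd.strip().upper() if isinstance(datum_cd, str) else ""
    if code in DATUM_TO_EPSG:
        return DATUM_TO_EPSG[code]
    warnings.warn(
        f"Unrecognized or missing datum {datum_cd!r}; assuming {default}",
        stacklevel=2,
    )
    return default
```

The returned string could then be passed as the `crs` when building geometry for the sites sharing that datum, before reprojecting everything to a single CRS.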

aaraney commented 1 week ago

> While NAD83 is the most common CRS projection, it is not the exclusive one used to document the location of all gages, example: https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded.

@ehinman, I think we all might have been a little overhasty.

TL;DR - dataretrieval.nwis uses the dec_lat_va and dec_long_va fields to construct a GeoDataFrame's geometry column. dec_coord_datum_cd is the CRS of dec_lat_va and dec_long_va. dec_coord_datum_cd is only ever empty or NAD83.

Full story:

The waterservices API reports up to two pairs of latitude and longitude: lat_va and long_va in degrees, minutes, seconds (DMS), and dec_lat_va and dec_long_va in decimal degrees. Likewise, the associated CRS for each pair, if known, is given in coord_datum_cd for the DMS coordinates and in dec_coord_datum_cd for the decimal degree coordinates.

#  lat_va          -- DMS latitude
#  long_va         -- DMS longitude
#  dec_lat_va      -- Decimal latitude
#  dec_long_va     -- Decimal longitude
#  ...
#  coord_datum_cd  -- Latitude-longitude datum            <---
#  dec_coord_datum_cd -- Decimal Latitude-longitude datum <---
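To make the difference between the two coordinate pairs concrete, here is a minimal sketch of converting a DMS value to decimal degrees. It assumes the DMS fields are packed as DDMMSS (latitude) or DDDMMSS (longitude) with optional fractional seconds and no sign, which is my reading of the format and should be checked against real responses. `dms_to_decimal` is a hypothetical helper, not a dataretrieval function, and the inputs are illustrative.

```python
def dms_to_decimal(packed, negative=False):
    """Convert a packed DMS string (DDMMSS or DDDMMSS) to decimal degrees.

    Assumes the last two whole digits are seconds, the two before them are
    minutes, and everything in front is degrees. Pass negative=True for
    western longitudes or southern latitudes.
    """
    text = str(packed).strip()
    whole, _, frac = text.partition(".")
    if len(whole) < 5 or not whole.isdigit():
        raise ValueError(f"not a packed DMS value: {packed!r}")
    seconds = float(whole[-2:] + ("." + frac if frac else ""))
    minutes = int(whole[-4:-2])
    degrees = int(whole[:-4])
    value = degrees + minutes / 60 + seconds / 3600
    return -value if negative else value


lat = dms_to_decimal("483554")                  # 48 deg 35 min 54 sec
lon = dms_to_decimal("1040348", negative=True)  # 104 deg 03 min 48 sec west
```

Note that this conversion only changes units. It does not change the datum, so the result is still in whatever CRS coord_datum_cd reports.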

dataretrieval-python's nwis module uses the decimal degree pair to construct the geometry column. See here: https://github.com/DOI-USGS/dataretrieval-python/blob/3ba0c83ae6aa9f71ba053cb8b8347da1241b575d/dataretrieval/nwis.py#L84

After doing a little digging, it seems that only the coord_datum_cd changes (the one dataretrieval.nwis is not using). dec_coord_datum_cd is only ever empty or NAD83. For example, https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded does have a coord_datum_cd of WGS84; however, its dec_coord_datum_cd is empty.

To check this I wrote up the following small script:

from pprint import pprint
import numpy as np
from dataretrieval import nwis

# State or Territory list from:
# https://waterservices.usgs.gov/test-tools/?service=site&siteType=&statTypeCd=all&major-filters=sites&format=rdb&date-type=type-none&statReportType=daily&statYearType=calendar&missingData=off&siteStatus=all&siteNameMatchOperator=start
cds = ['al', 'ak', 'aq', 'az', 'ar', '96', 'ca', 'co', 'ct', 'de', 'dc', '62', 'fl', 'ga', 'gu', 'hi', 'id', 'il', 'in', 'ia', '67', 'ks', 'ky', 'la', 'me', 'md', 'ma', 'mi', '71', 'mn', 'ms', 'mo', 'mt', 'ne', 'nv', 'nh', 'nj', 'nm', 'ny', 'nc', 'nd', 'mp', 'oh', 'ok', 'or', 'pa', 'pr', 'ri', '73', 'sc', 'sd', '74', 'tn', 'tx', '75', '76', '77', 'ut', 'vt', 'vi', 'va', '79', 'wa', 'wv', 'wi', 'wy']

dec_datums = {}
for cd in cds:
    try:
        df, _ = nwis.get_info(stateCd=cd)
    except Exception as e:  # avoid swallowing KeyboardInterrupt / SystemExit
        print(f"{cd} failed with {e}")
        continue
    dec_datums[cd] = df["dec_coord_datum_cd"].unique().tolist()

for datums in dec_datums.values():
    for datum in datums:
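        # NaN never equals itself, so test for it explicitly rather than via `in`.
        assert datum == "NAD83" or (isinstance(datum, float) and np.isnan(datum))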

pprint(dec_datums)
Output:

```python
62 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=62&siteOutput=Expanded&format=rdb
67 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=67&siteOutput=Expanded&format=rdb
71 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=71&siteOutput=Expanded&format=rdb
73 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=73&siteOutput=Expanded&format=rdb
74 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=74&siteOutput=Expanded&format=rdb
75 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=75&siteOutput=Expanded&format=rdb
76 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=76&siteOutput=Expanded&format=rdb
77 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=77&siteOutput=Expanded&format=rdb
79 failed with Page Not Found Error. May be the result of an empty query. URL: https://waterservices.usgs.gov/nwis/site?stateCd=79&siteOutput=Expanded&format=rdb
{'96': ['NAD83', nan],
 'ak': ['NAD83', nan],
 'al': ['NAD83', nan],
 'aq': ['NAD83'],
 'ar': ['NAD83', nan],
 'az': ['NAD83', nan],
 'ca': ['NAD83', nan],
 'co': ['NAD83', nan],
 'ct': ['NAD83', nan],
 'dc': ['NAD83', nan],
 'de': ['NAD83', nan],
 'fl': ['NAD83', nan],
 'ga': ['NAD83', nan],
 'gu': ['NAD83'],
 'hi': ['NAD83', nan],
 'ia': ['NAD83', nan],
 'id': ['NAD83', nan],
 'il': ['NAD83', nan],
 'in': ['NAD83', nan],
 'ks': ['NAD83', nan],
 'ky': ['NAD83', nan],
 'la': ['NAD83', nan],
 'ma': ['NAD83', nan],
 'md': ['NAD83', nan],
 'me': ['NAD83', nan],
 'mi': ['NAD83', nan],
 'mn': ['NAD83', nan],
 'mo': ['NAD83', nan],
 'mp': ['NAD83'],
 'ms': ['NAD83', nan],
 'mt': ['NAD83', nan],
 'nc': ['NAD83', nan],
 'nd': ['NAD83', nan],
 'ne': ['NAD83', nan],
 'nh': ['NAD83', nan],
 'nj': ['NAD83', nan],
 'nm': ['NAD83', nan],
 'nv': ['NAD83', nan],
 'ny': ['NAD83', nan],
 'oh': ['NAD83', nan],
 'ok': ['NAD83', nan],
 'or': ['NAD83', nan],
 'pa': ['NAD83', nan],
 'pr': ['NAD83', nan],
 'ri': ['NAD83', nan],
 'sc': ['NAD83', nan],
 'sd': ['NAD83', nan],
 'tn': ['NAD83', nan],
 'tx': ['NAD83', nan],
 'ut': ['NAD83', nan],
 'va': ['NAD83', nan],
 'vi': ['NAD83'],
 'vt': ['NAD83', nan],
 'wa': ['NAD83', nan],
 'wi': ['NAD83', nan],
 'wv': ['NAD83', nan],
 'wy': ['NAD83', nan]}
```
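For anyone who wants to condense a result like this, here is a small sketch that reduces the per-state lists to two things: the set of datum codes actually reported, and the states with at least one empty value. `is_missing` and `summarize_datums` are hypothetical helper names. The sample input is just two entries copied from the output, with `float("nan")` standing in for pandas' NaN.

```python
import math


def is_missing(value):
    """True for None or a float NaN (how pandas reports an empty field)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_datums(dec_datums):
    """Return (set of reported datum codes, sorted states with an empty datum)."""
    reported = set()
    with_missing = []
    for state, datums in dec_datums.items():
        if any(is_missing(d) for d in datums):
            with_missing.append(state)
        reported.update(d for d in datums if not is_missing(d))
    return reported, sorted(with_missing)


# Two entries taken from the output of the script.
sample = {"aq": ["NAD83"], "ak": ["NAD83", float("nan")]}
reported, with_missing = summarize_datums(sample)
```

If `reported` ever contains anything other than NAD83, the "only ever empty or NAD83" observation no longer holds for the queried states.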