Open ehinman opened 1 week ago
While NAD83 is the most common CRS projection, it is not the exclusive one used to document the location of all gages, example: https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded.
@ehinman, I think we all might have been a little overhasty.
TL;DR - dataretrieval.nwis
uses dec_lat_va
and dec_long_va
fields to construct a GeoDataFrame's geometry
column. dec_coord_datum_cd
is the crs of dec_lat_va
and dec_long_va
. dec_coord_datum_cd
is only ever empty or NAD83
.
Full story:
The waterservices
api reports up to two pairs of latitude and longitude. lat_va
, long_va
in units of degrees minutes seconds (DMS) and dec_lat_va
, dec_long_va
in units of decimal degrees. Likewise, the associated crs for the lat long pair, if known, is given in the coord_datum_cd
for DMS and dec_coord_datum_cd
for the decimal degrees coordinates.
# lat_va -- DMS latitude
# long_va -- DMS longitude
# dec_lat_va -- Decimal latitude
# dec_long_va -- Decimal longitude
# ...
# coord_datum_cd -- Latitude-longitude datum <---
# dec_coord_datum_cd -- Decimal Latitude-longitude datum <---
dataretrieval-python's nwis module uses the decimal degree pair to construct the geometry column. See here: https://github.com/DOI-USGS/dataretrieval-python/blob/3ba0c83ae6aa9f71ba053cb8b8347da1241b575d/dataretrieval/nwis.py#L84
After doing a little digging, it seems that only the coord_datum_cd
changes (the one dataretrieval.nwis
is not using). dec_coord_datum_cd
is only ever empty or NAD83
. For example, https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded does have a coord_datum_cd
of WGS84
however it does not have a dec_coord_datum_cd
; It is empty.
To check this I wrote up the following small script:
from pprint import pprint
import numpy as np
from dataretrieval import nwis
# State or Territory list from:
# https://waterservices.usgs.gov/test-tools/?service=site&siteType=&statTypeCd=all&major-filters=sites&format=rdb&date-type=type-none&statReportType=daily&statYearType=calendar&missingData=off&siteStatus=all&siteNameMatchOperator=start
cds = ['al', 'ak', 'aq', 'az', 'ar', '96', 'ca', 'co', 'ct', 'de', 'dc', '62', 'fl', 'ga', 'gu', 'hi', 'id', 'il', 'in', 'ia', '67', 'ks', 'ky', 'la', 'me', 'md', 'ma', 'mi', '71', 'mn', 'ms', 'mo', 'mt', 'ne', 'nv', 'nh', 'nj', 'nm', 'ny', 'nc', 'nd', 'mp', 'oh', 'ok', 'or', 'pa', 'pr', 'ri', '73', 'sc', 'sd', '74', 'tn', 'tx', '75', '76', '77', 'ut', 'vt', 'vi', 'va', '79', 'wa', 'wv', 'wi', 'wy']
dec_datums = {}
for cd in cds:
try:
df, _ = nwis.get_info(stateCd=cd)
except BaseException as e:
print(f"{cd} failed with {e}")
continue
dec_datums[cd] = df["dec_coord_datum_cd"].unique().tolist()
for datums in dec_datums.values():
for datum in datums:
assert datum in ("NAD83", np.nan)
pprint(dec_datums)
It was noted by @aaraney, @lstanish-usgs, Mike Mahoney, @thodson-usgs and others that hard-coding a CRS for NWIS sites is not ideal. While NAD83 is the most common CRS projection, it is not the exclusive one used to document the location of all gages, example: https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=483554104034801&siteOutput=expanded. At the very least, we should warn users of this inconsistency, at a more comprehensive level, we could use the provided datum for each site from the downloaded dataset to set the projection before converting to a unified projection like WGS84.