DOI-USGS / dataretrieval-python

Python package for retrieving water data from USGS or the multi-agency Water Quality Portal
https://doi-usgs.github.io/dataretrieval-python/

Equivalent to whatNWISdata from R version #77

Closed: ArlexMR closed this issue 1 year ago

ArlexMR commented 1 year ago

I need to filter sites down to those with a minimum number of records for a set of predefined parameters. Is there a function that returns the number of records for a given site and parameter code? I think I am looking for something like 'whatNWISdata' from the R package.

elbeejay commented 1 year ago

Yes, I believe the dataretrieval.nwis.get_info() function is what you are looking for.

As an example, if we want to get information about site "05114000" and we use the R whatNWISdata function, that would look like the following:

data <- dataRetrieval::whatNWISdata(siteNumber='05114000')

The returned data frame has the following column names:

> colnames(data)
 [1] "agency_cd"          "site_no"            "station_nm"         "site_tp_cd"        
 [5] "dec_lat_va"         "dec_long_va"        "coord_acy_cd"       "dec_coord_datum_cd"
 [9] "alt_va"             "alt_acy_va"         "alt_datum_cd"       "huc_cd"            
[13] "data_type_cd"       "parm_cd"            "stat_cd"            "ts_id"             
[17] "loc_web_ds"         "medium_grp_cd"      "parm_grp_cd"        "srs_id"            
[21] "access_cd"          "begin_date"         "end_date"           "count_nu"  

Using Python, we can get the same result using the get_info() function:

from dataretrieval import nwis
df, md = nwis.get_info(sites='05114000', seriesCatalogOutput=True)

Note that we have to specify seriesCatalogOutput=True. The R package does this automatically, whereas here the user controls that argument (its default is False).

If we list the columns in the returned df data frame, they should match what we got using R:

>>> df.columns
Index(['agency_cd', 'site_no', 'station_nm', 'site_tp_cd', 'dec_lat_va',
       'dec_long_va', 'coord_acy_cd', 'dec_coord_datum_cd', 'alt_va',
       'alt_acy_va', 'alt_datum_cd', 'huc_cd', 'data_type_cd', 'parm_cd',
       'stat_cd', 'ts_id', 'loc_web_ds', 'medium_grp_cd', 'parm_grp_cd',
       'srs_id', 'access_cd', 'begin_date', 'end_date', 'count_nu'],
      dtype='object')
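To answer the original filtering question, the count_nu column can be combined with parm_cd and data_type_cd. The following is a minimal sketch, not part of dataretrieval: the helper name sites_with_min_records is hypothetical, and restricting to a single data_type_cd (default "dv"), taking the largest count_nu when a site/parameter pair appears more than once (e.g. different stat_cd values), and assuming parm_cd holds five-character strings such as "00060" are all choices made for illustration. The demo rows and site numbers are made up.

```python
import pandas as pd


def sites_with_min_records(catalog, parm_cds, min_count, data_type_cd="dv"):
    """Hypothetical helper: return site numbers whose series catalog has at
    least `min_count` records for EVERY parameter code in `parm_cds`.

    Assumes `catalog` looks like the df from
    nwis.get_info(..., seriesCatalogOutput=True), with string-valued
    site_no/parm_cd columns plus data_type_cd and count_nu columns.
    """
    parm_cds = list(parm_cds)
    rows = catalog[
        (catalog["data_type_cd"] == data_type_cd)
        & (catalog["parm_cd"].isin(parm_cds))
    ].copy()
    if rows.empty:
        return []

    # count_nu may arrive as text; coerce it, treating unparseable values as 0.
    rows["count_nu"] = pd.to_numeric(rows["count_nu"], errors="coerce").fillna(0)

    # Assumption: when a site/parameter pair appears more than once,
    # keep its largest count. Result: one row per site, one column per parm_cd.
    best = rows.groupby(["site_no", "parm_cd"])["count_nu"].max().unstack("parm_cd")

    # Parameters a site lacks become NaN, and NaN >= min_count is False.
    best = best.reindex(columns=parm_cds)
    keep = (best >= min_count).all(axis=1)
    return sorted(keep[keep].index)


# Made-up catalog rows for illustration only.
demo = pd.DataFrame({
    "site_no":      ["11111111", "11111111", "11111111", "22222222", "22222222"],
    "data_type_cd": ["dv",       "dv",       "uv",       "dv",       "dv"],
    "parm_cd":      ["00060",    "00010",    "00060",    "00060",    "00010"],
    "count_nu":     [20000,      500,        90000,      15000,      3000],
})

print(sites_with_min_records(demo, ["00060", "00010"], min_count=1000))
# -> ['22222222']
```

With real data, pass the df returned by nwis.get_info(..., seriesCatalogOutput=True) instead of the demo frame, and check df.dtypes first: if parm_cd was parsed as numbers, the string comparison above will silently match nothing.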

Hope that helps.

ArlexMR commented 1 year ago

Thanks! That's just what I was looking for