Closed toanderod closed 1 year ago
Thanks for raising this issue @toanderod - it looks like the datetime information is still available in the dataframe but is not being correctly handled into the datetime
column because the time information (as you've noted) is missing - specifically because it is denoted as a NaN
value. This ends up creating problems when we try processing the datetime information into the datetime
index, as I think the NaN
is getting turned into a NaT
(Not a Time?) value.
Will investigate.
Upon investigation, the function is working as intended for sites with incomplete datetime information. When I run the command you provided as an example @toanderod:
df=nwis.get_record(statecd='sd',service='gwlevels',localAquiferCd='SD:331MDSN', start='1900-01-01',access='3')
I get the following warning:
UserWarning: Warning: 1440 incomplete dates found, consider setting datetime_index to False.
So if you following the suggestion made in the warning, then your query becomes the following:
df=nwis.get_record(statecd='sd',service='gwlevels',localAquiferCd='SD:331MDSN', start='1900-01-01',access='3',datetime_index=False)
And as expected, this switches the indexing for the dataframe and uses integers instead:
>>> df.head()
agency_cd site_no site_tp_cd lev_dt lev_tm lev_tz_cd lev_va sl_lev_va sl_datum_cd lev_status_cd lev_agency_cd lev_dt_acy_cd lev_acy_cd lev_src_cd lev_meth_cd lev_age_cd
0 USGS 431232103513501 GW 1956-09-19 NaN UTC -20.0 NaN NaN 1 NaN D NaN NaN O A
1 USGS 431753103492601 GW 1956-09-19 NaN UTC -204.0 NaN NaN 1 NaN D NaN NaN G A
2 USGS 431753103492601 GW 1957-05-29 NaN UTC -206.0 NaN NaN 1 NaN D NaN NaN G A
3 USGS 431808103492001 GW 1956-09-19 NaN UTC -198.0 NaN NaN 1 NaN D NaN NaN G A
4 USGS 431810103491001 GW 1975-09-26 NaN UTC -219.0 NaN NaN 1 NaN D NaN NaN G A
We are receptive to alternative solutions and suggestions on how to handle this scenario - but as you can see, this situation has been considered. The reasoning for this, as I understand it, is that we do not want to make assumptions about the time a measurement was collected. Our datetime
column requires information to populate a datetime field in the format YYYY-MM-DD HH:MM:SS+HH:MM and when there is no time information, we'd be forced to make some sort of guess or assumption about the time the measurement was taken. Rather than do this, we keep the original columns in the dataframe and give the user the option to instead index the data by integer in the order it was returned by the webservice.
Hope that helps, and again, we are happy to hear alternative ideas or suggestions.
Groundwater level data retrievals result in missing datetimes for levels that have no time associated with the date of measurement. For me, the result was a small fraction of the total available groundwater levels from NWIS being available from dataretrieval.
ex. for the following retrieval 1440 of the 4473 groundwater levels measurements are missing dates: gw_df = nwis.get_record(statecd='sd',service='gwlevels',localAquiferCd='SD:331MDSN', start='1900-01-01',access='3')