Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0

Prevent removing too many duplicates #53

Closed veenstrajelmer closed 8 months ago

veenstrajelmer commented 8 months ago

Description

Apparently, .drop_duplicates() does not consider the index when deciding whether a row is unique. We removed the Tijdstap column from the dataframe in https://github.com/openearth/ddlpy/issues/51, so rows that differ only in their timestamp are now treated as duplicates and dropped. The column therefore has to be removed after dropping the duplicates, instead of before.
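The behaviour can be reproduced with pandas alone. The following is a minimal sketch on made-up toy data, not the ddlpy code itself; the column name "value" and the index name "time" are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: three distinct timestamps, two of them with the same value.
index = pd.to_datetime(
    ["2014-01-01 00:00", "2014-01-01 00:10", "2014-01-01 00:20"]
)
df = pd.DataFrame({"value": [1.0, 1.0, 2.0]}, index=index)

# drop_duplicates() compares column values only, so the second row is dropped
# even though its timestamp differs from the first.
dropped_too_many = df.drop_duplicates()

# Moving the index into a column first makes the timestamp part of the comparison.
index_name = "time"  # assumed name, only used inside this sketch
kept_all = (
    df.rename_axis(index_name)
    .reset_index()
    .drop_duplicates()
    .set_index(index_name)
)

# A row repeated with the same timestamp and the same value is still removed.
df_with_repeat = pd.concat([df, df.iloc[[0]]])
deduplicated = (
    df_with_repeat.rename_axis(index_name)
    .reset_index()
    .drop_duplicates()
    .set_index(index_name)
)

print(len(dropped_too_many), len(kept_all), len(deduplicated))
```

Keeping the timestamp as a regular column until after deduplication, as the description proposes, has the same effect as the reset_index() step used here.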

TODO:

What I Did

import ddlpy
import datetime as dt

locations = ddlpy.locations()
location = locations[(locations['Grootheid.Code'] == 'WATHTE') &
                     (locations['Groepering.Code'] == 'NVT')].loc['DENHDR']
start_date = dt.datetime(2014, 1, 1)
end_date = dt.datetime(2014, 1, 7)
measurements_clean = ddlpy.measurements(location, start_date=start_date, end_date=end_date, clean_df=True)
measurements_raw = ddlpy.measurements(location, start_date=start_date, end_date=end_date, clean_df=False)
print()
print(len(measurements_clean))
print(len(measurements_raw))

measurements_clean.plot(y="Meetwaarde.Waarde_Numeriek")
measurements_raw.plot(y="Meetwaarde.Waarde_Numeriek")

clean plot (220 measurements): image

raw plot (865 measurements): image
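To check how many rows of a dataframe are flagged as duplicates with and without the index taken into account, a small diagnostic along these lines could be used. This is a sketch only: count_duplicates is a hypothetical helper, not part of ddlpy, and the toy data merely borrows the Meetwaarde.Waarde_Numeriek column name from the example above.

```python
import pandas as pd


def count_duplicates(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count rows flagged as duplicates in two ways."""
    # Column values only: this is what drop_duplicates() uses by default.
    value_only = int(df.duplicated().sum())
    # Index included: move the index into a column before comparing.
    index_aware = int(df.reset_index().duplicated().sum())
    return {"value_only": value_only, "index_aware": index_aware}


# Toy data (made up): four rows, one timestamp repeated with the same value.
index = pd.to_datetime(
    [
        "2014-01-01 00:00",
        "2014-01-01 00:10",
        "2014-01-01 00:10",
        "2014-01-01 00:20",
    ]
)
toy = pd.DataFrame(
    {"Meetwaarde.Waarde_Numeriek": [10.0, 10.0, 10.0, 12.0]}, index=index
)

counts = count_duplicates(toy)
print(counts)
```

Applied to measurements_raw, a large gap between the two counts would suggest that most removed rows differ in their timestamp and are not real duplicates.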