Deltares / ddlpy

API to Dutch Rijkswaterstaat archive (DDL, waterinfo.rws.nl) of monitoring water data
https://deltares.github.io/ddlpy/
GNU General Public License v3.0

Avoid formatting warning in `_combine_waarnemingenlijst()` #112

Closed: veenstrajelmer closed this issue 2 months ago

veenstrajelmer commented 2 months ago

Description

When retrieving data for kenmerkendewaarden, some requests for extremes triggered a pandas warning that the time format could not be inferred:

INFO:kenmerkendewaarden.data_retrieve:retrieving meas data (extremes=True) from DDL for DENOVBTN to measurements_wl_18700101_20240101
 39%|███▉      | 60/154 [01:22<02:18,  1.48s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
 91%|█████████ | 140/154 [03:19<00:20,  1.45s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:39<00:00,  1.43s/it]
 39%|███▉      | 60/154 [01:21<02:18,  1.47s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
 91%|█████████ | 140/154 [03:20<00:20,  1.46s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:40<00:00,  1.43s/it]

INFO:kenmerkendewaarden.data_retrieve:retrieving meas data (extremes=True) from DDL for STAVNSE to measurements_wl_18700101_20240101
 96%|█████████▌| 148/154 [03:13<00:08,  1.45s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:21<00:00,  1.31s/it]
 96%|█████████▌| 148/154 [03:16<00:09,  1.63s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:25<00:00,  1.34s/it]

INFO:kenmerkendewaarden.data_retrieve:retrieving meas data (extremes=True) from DDL for TERNZN to measurements_wl_18700101_20240101
 85%|████████▌ | 131/154 [03:09<00:33,  1.44s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:42<00:00,  1.44s/it]
 85%|████████▌ | 131/154 [03:13<00:34,  1.49s/it]C:\Users\veenstra\Anaconda3\envs\dfm_tools_env\Lib\site-packages\ddlpy\ddlpy.py:286: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  df["time"] = pd.to_datetime(df["Tijdstip"])
100%|██████████| 154/154 [03:46<00:00,  1.47s/it]

What I Did

A reproducible example for these stations and years initially triggered the warnings, but on later runs they no longer appeared.

from kenmerkendewaarden.data_retrieve import retrieve_catalog
import pandas as pd
import dateutil.rrule  # import the submodule explicitly, dateutil.rrule.YEARLY is used below
import ddlpy
import logging

logging.basicConfig() # calling basicConfig is essential to set logging level for sub-modules
logging.getLogger("kenmerkendewaarden").setLevel(level="INFO")
logger = logging.getLogger(__name__)

idx_warning_dict = {"DENOVBTN":[60,140],
                    "STAVNSE":[148],
                    "TERNZN":[131],
                    }
_, locs_meas_ext, locs_meas_exttype = retrieve_catalog()

for station in idx_warning_dict.keys():
    idx_warning_list = idx_warning_dict[station]
    for idx_warning in idx_warning_list:
        year_one = 1870 + idx_warning
        start_date = pd.Timestamp(year_one, 1, 1)
        end_date = pd.Timestamp(year_one, 2, 1)

        logger.warning(f"retrieving extremes from DDL for {station} for year {year_one}")

        bool_station_ext = locs_meas_ext.index.isin([station])
        bool_station_exttype = locs_meas_exttype.index.isin([station])
        loc_meas_ext_one = locs_meas_ext.loc[bool_station_ext]
        loc_meas_exttype_one = locs_meas_exttype.loc[bool_station_exttype]

        freq = dateutil.rrule.YEARLY

        measurements = ddlpy.measurements(
            location=loc_meas_ext_one.iloc[0],
            start_date=start_date,
            end_date=end_date,
            freq=freq,
        )

        # convert extreme type to HWLWcode; add extreme type and HWLcode as dataset variables
        # TODO: simplify by retrieving the extreme value and type from ddl in a single request: https://github.com/Rijkswaterstaat/wm-ws-dl/issues/19
        measurements_exttyp = ddlpy.measurements(
            location=loc_meas_exttype_one.iloc[0],
            start_date=start_date,
            end_date=end_date,
            freq=freq,
        )
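Since the warning only showed up intermittently, one way to pin down the offending input would be to promote it to an exception, so the triggering values can be inspected. The following is a minimal sketch, not part of ddlpy: the helper name `parse_times_strict` is hypothetical, and it assumes the warning is a `UserWarning` whose message starts with "Could not infer format", as in the log above.

```python
import warnings

import pandas as pd


def parse_times_strict(values):
    """Parse timestamps, raising instead of warning if pandas cannot infer a format.

    Hypothetical helper for debugging; not part of ddlpy.
    """
    with warnings.catch_warnings():
        # Turn only the format-inference warning into an exception
        warnings.filterwarnings(
            "error",
            message="Could not infer format",
            category=UserWarning,
        )
        return pd.to_datetime(values)


# Illustrative values only, not actual DDL output
sample = ["2010-01-01 00:00:00", "2010-01-01 00:10:00"]
try:
    parsed = parse_times_strict(sample)
    print(parsed)
except UserWarning as exc:
    # The values that triggered the warning can be logged or saved here
    print(f"format could not be inferred: {exc}")
```

Wrapping the `pd.to_datetime()` call this way during a long retrieval would make it possible to save the exact `Tijdstip` values of the failing request for later inspection.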

Possible solution

It seemed that adding the correct format avoided the warnings:

df["time"] = pd.to_datetime(df["Tijdstip"], format='ISO8601')

However, when checking this, the warning could no longer be reproduced. Adding the format still seems useful and safe.
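To illustrate what the explicit format does, here is a small standalone sketch. It assumes pandas 2.0 or newer (where `format="ISO8601"` is available), and the sample strings are made up for illustration; they are not actual DDL responses. With `format="ISO8601"`, pandas accepts any ISO 8601 variant per element, so it does not need to guess a single format for the whole column.

```python
import warnings

import pandas as pd

# Hypothetical sample values in ISO 8601 style, with and without
# fractional seconds; not actual DDL output
tijdstip = pd.Series([
    "2010-01-01T00:00:00.000+01:00",
    "2010-01-01T00:10:00+01:00",
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The explicit format skips pandas' format inference
    times = pd.to_datetime(tijdstip, format="ISO8601")

# Collect any format-inference warnings that were raised during parsing
inference_warnings = [
    w for w in caught if "Could not infer format" in str(w.message)
]

print(times)
print(f"format-inference warnings: {len(inference_warnings)}")
```

Because the format is given explicitly, the result does not depend on which element happens to come first in the column.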

veenstrajelmer commented 2 months ago

Won't do, since the warning can no longer be reproduced.