earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License

Writing MOSMIX forecast data to InfluxDB croaks #295

Closed wetterfrosch closed 3 years ago

wetterfrosch commented 3 years ago

Hi everyone again!

-- EDIT: This was tested with v0.11.1; see the output of v0.12 in the next post! --

I'm still working my way through the feature combinations, and I just tried to write the new MOSMIX readings (thanks @gutzbenj! :) directly to InfluxDB. I don't know whether this is supposed to work yet at all, but JFYI, in case you are wondering too whether it does: nope ;) When I try, I get:

$ wetterdienst dwd forecasts readings --tidy --mosmix-type=large --station=10382 --parameter=TTT --target="influxdb://localhost:8086/?database=dwd&table=mosmix"
2020-12-23 13:21:17,880 [wetterdienst.dwd.forecasts.access] INFO   : Downloading KMZ file MOSMIX_L_LATEST_10382.kmz
https://opendata.dwd.de/weather/local_forecasts/mos/MOSMIX_L/single_stations/10382/kml/MOSMIX_L_LATEST_10382.kmz: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.4k/18.4k [00:00<00:00, 39.8MiB/s]
2020-12-23 13:21:17,938 [wetterdienst.dwd.forecasts.access] INFO   : Parsing KML data
2020-12-23 13:21:17,969 [wetterdienst.cli              ] INFO   : Writing data to target influxdb://localhost:8086/?database=mosmix&table=mosmix
2020-12-23 13:21:17,970 [wetterdienst.util.pandas      ] INFO   : Writing to InfluxDB ('dwd', 'mosmix')
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'date'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/wetterdienst", line 10, in <module>
    sys.exit(run())
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/cli.py", line 288, in run
    df.io.export(options.target)
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/util/pandas.py", line 166, in export
    df = self.df.set_index(pd.DatetimeIndex(self.df["date"]))
  File "/usr/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'date'

I have no clue. Usually, in an InfluxDB context, my first question on "oh, something with the date" would be: "is this really Unix nanoseconds since 1970?" But as the error already occurs within pandas, it seems to me that maybe pandas can't access the values in the first place?

Outputting the values on the CLI as JSON or CSV, with or without --tidy, works fine! Even with a 💩pile of parameters! :)

gutzbenj commented 3 years ago

Dear @wetterfrosch, thank you for the feedback! This may be related to the column naming: originally I named all columns in UPPERCASE, so @amotl introduced a function to make the column names lowercase. I'll have a look at how we can fix this.
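
A minimal sketch of the suspected mismatch, not the actual wetterdienst code: it assumes a frame whose columns are still UPPERCASE, so the `self.df["date"]` lookup from the traceback fails until the names are lowercased. The column names, the sample values and the helper `lowercase_columns` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with UPPERCASE column names (names and values are
# assumptions for illustration, not real MOSMIX output).
df = pd.DataFrame(
    {
        "STATION_ID": ["10382", "10382"],
        "DATE": ["2020-12-23T14:00:00Z", "2020-12-23T15:00:00Z"],
        "TTT": [275.15, 274.85],
    }
)


def lowercase_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame with all column names lowercased."""
    return frame.rename(columns=str.lower)


# Looking up "date" on the original frame raises KeyError, as in the traceback.
try:
    df["date"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# After normalising the names, the index can be built the way the export
# code in the traceback does it.
normalized = lowercase_columns(df)
indexed = normalized.set_index(pd.DatetimeIndex(normalized["date"]))
```

If the real cause is something else, the same check (printing `df.columns` right before the export) should still show which name the lookup is missing.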

wetterfrosch commented 3 years ago

While I was typing this issue, @gutzbenj released v0.12 of wetterdienst. :) The release notes mention something about dates within MOSMIX, and the output of this attempt has changed! Now the error is about the quality field, which is something we have in the observations but not in the forecasts.

with --tidy:

$ wetterdienst dwd forecasts readings --tidy --mosmix-type=large --station=10382 --parameter=TTT --target="influxdb://localhost:8086/?database=dwd&table=mosmix"
2020-12-23 13:37:34,165 [wetterdienst.dwd.forecasts.access] INFO   : Downloading KMZ file MOSMIX_L_LATEST_10382.kmz
https://opendata.dwd.de/weather/local_forecasts/mos/MOSMIX_L/single_stations/10382/kml/MOSMIX_L_LATEST_10382.kmz: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.4k/18.4k [00:00<00:00, 50.6MiB/s]
2020-12-23 13:37:34,220 [wetterdienst.dwd.forecasts.access] INFO   : Parsing KML data
2020-12-23 13:37:34,289 [wetterdienst.cli              ] INFO   : Writing data to target influxdb://localhost:8086/?database=dwd&table=mosmix
2020-12-23 13:37:34,289 [wetterdienst.util.pandas      ] INFO   : Writing to InfluxDB ('dwd', 'mosmix')
Traceback (most recent call last):
  File "/usr/bin/wetterdienst", line 10, in <module>
    sys.exit(run())
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/cli.py", line 282, in run
    df.io.export(options.target)
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/util/pandas.py", line 195, in export
    c.write_points(
  File "/home/wtf/.local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 91, in write_points
    points = self._convert_dataframe_to_lines(
  File "/home/wtf/.local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 390, in _convert_dataframe_to_lines
    tag_df = dataframe[tag_columns]
  File "/usr/lib/python3.9/site-packages/pandas/core/frame.py", line 2912, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/usr/lib/python3.9/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/usr/lib/python3.9/site-packages/pandas/core/indexing.py", line 1304, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['quality', 'parameter_set'] not in index"

without --tidy:

$ wetterdienst dwd forecasts readings --mosmix-type=large --station=10382 --parameter=TTT --target="influxdb://localhost:8086/?database=dwd&table=mosmix"
2020-12-23 13:40:12,151 [wetterdienst.dwd.forecasts.access] INFO   : Downloading KMZ file MOSMIX_L_LATEST_10382.kmz
https://opendata.dwd.de/weather/local_forecasts/mos/MOSMIX_L/single_stations/10382/kml/MOSMIX_L_LATEST_10382.kmz: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.4k/18.4k [00:00<00:00, 49.4MiB/s]
2020-12-23 13:40:12,208 [wetterdienst.dwd.forecasts.access] INFO   : Parsing KML data
2020-12-23 13:40:12,269 [wetterdienst.cli              ] INFO   : Writing data to target influxdb://localhost:8086/?database=dwd&table=mosmix
2020-12-23 13:40:12,269 [wetterdienst.util.pandas      ] INFO   : Writing to InfluxDB ('dwd', 'mosmix')
Traceback (most recent call last):
  File "/usr/bin/wetterdienst", line 10, in <module>
    sys.exit(run())
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/cli.py", line 282, in run
    df.io.export(options.target)
  File "/home/wtf/.local/lib/python3.9/site-packages/wetterdienst/util/pandas.py", line 195, in export
    c.write_points(
  File "/home/wtf/.local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 91, in write_points
    points = self._convert_dataframe_to_lines(
  File "/home/wtf/.local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 390, in _convert_dataframe_to_lines
    tag_df = dataframe[tag_columns]
  File "/usr/lib/python3.9/site-packages/pandas/core/frame.py", line 2912, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/usr/lib/python3.9/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/usr/lib/python3.9/site-packages/pandas/core/indexing.py", line 1304, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['quality'] not in index"

amotl commented 3 years ago

Hi there,

thanks for reporting this. I have a patch in my working tree which might improve things on the lowercase-column side again.

However, that patch was made before @gutzbenj brought in the great OOP refactoring. I will see what I can do for you.

Cheers, Andreas.

amotl commented 3 years ago

Ah. The reason for

KeyError: "['quality', 'parameter_set'] not in index"
KeyError: "['quality'] not in index"

is that the InfluxDB export adapter uses tags.

Now the error is about the quality-field, which is something we have in the observations but not within the forecasts.

Those tags are apparently still hardcoded for synoptic observation data. Shall we just make a patch that doesn't use any tags for MOSMIX forecast data, if that is actually possible?
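
A less drastic variant might be to keep only those tags that actually exist in the frame. The following is a rough sketch under assumptions, not the code in `wetterdienst/util/pandas.py`: `quality` and `parameter_set` come from the two KeyErrors above, while `station_id`, `parameter`, the sample frame and the helper `available_tag_columns` are hypothetical.

```python
import pandas as pd

# Candidate tag names. "quality" and "parameter_set" are taken from the
# KeyErrors; "station_id" and "parameter" are assumptions for illustration.
CANDIDATE_TAGS = ["station_id", "quality", "parameter_set", "parameter"]


def available_tag_columns(frame: pd.DataFrame, candidates=CANDIDATE_TAGS) -> list:
    """Return the candidate tag names that are present as columns in the frame."""
    return [name for name in candidates if name in frame.columns]


# Hypothetical tidy forecast frame: no "quality" and no "parameter_set" column.
forecast = pd.DataFrame(
    {"station_id": ["10382"], "parameter": ["TTT"], "value": [275.15]},
    index=pd.DatetimeIndex(["2020-12-23T14:00:00Z"], name="date"),
)

tag_columns = available_tag_columns(forecast)

# This is the selection that raised the KeyError in the traceback; with the
# filtered list it succeeds.
tag_df = forecast[tag_columns]
```

The filtered list could then be handed to `write_points` via its `tag_columns` argument, so observation data would keep its tags and forecast data would simply carry fewer of them.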

amotl commented 3 years ago

As a temporary workaround, I've advised @wetterfrosch to make some amendments at

https://github.com/earthobservations/wetterdienst/blob/13c96e82eb973b3ee23e15d0c74b0743e67c630b/wetterdienst/util/pandas.py#L185-L192

So, take out "quality" in line 187 and simply comment out lines 191 and 192.

He says it works. Thanks!

amotl commented 3 years ago

Hi again,

#378 will also take care of this issue.

With kind regards, Andreas.