earthobservations / wetterdienst

Open weather data for humans.
https://wetterdienst.readthedocs.io/
MIT License

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte #232

Closed: amotl closed this issue 3 years ago

amotl commented 3 years ago

Describe the bug

After installing wetterdienst into a fresh virtualenv directly from the Git repository, acquiring data through its command line interface fails with a UnicodeDecodeError. I ran into this while trying to reproduce issue #230, which @wetterfrosch is observing.

To reproduce

virtualenv .venv --python=python3.8
source .venv/bin/activate
pip install git+https://github.com/earthobservations/wetterdienst#egg=wetterdienst[influxdb]
wetterdienst dwd readings --parameter=air_temperature --resolution=10_minutes --period=recent --station=1048,4411 --date=2020-10/2020-11

Expected behavior

Installing and invoking wetterdienst should just work (TM).

Screenshots

C'mon ;].

Version information

OS: macOS

$ wetterdienst --version
wetterdienst 0.10.1
$ python -V
Python 3.8.6
$ hostinfo
Mach kernel version:
     Darwin Kernel Version 17.7.0: Thu Jun 18 21:21:34 PDT 2020; root:xnu-4570.71.82.5~1/RELEASE_X86_64
Kernel configured for up to 8 processors.
4 processors are physically available.
8 processors are logically available.
Processor type: x86_64h (Intel x86-64h Haswell)
Processors active: 0 1 2 3 4 5 6 7
Primary memory available: 16.00 gigabytes
Default processor set: 344 tasks, 1974 threads, 8 processors
Load average: 1.35, Mach factor: 6.64

Additional context

Maybe it's related to the dogpile cache again? See also #217.

Full traceback

$ wetterdienst dwd readings --parameter=air_temperature --resolution=10_minutes --period=recent --station=1048,4411 --date=2020-10/2020-11
2020-11-20 20:46:23,913 [wetterdienst.dwd.observations.api] INFO   : Acquiring observations data for air_temperature/10_minutes/recent/station_id_1048.
2020-11-20 20:46:23,915 [wetterdienst.cli              ] ERROR  : 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Traceback (most recent call last):
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/cli.py", line 255, in run
    df = observations.collect_safe()
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/api.py", line 339, in collect_safe
    data = list(self.collect_data())
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/api.py", line 214, in collect_data
    df_parameter = self._collect_data(station_id, parameter_set)
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/api.py", line 288, in _collect_data
    df_period = collect_climate_observations_data(
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/access.py", line 69, in collect_climate_observations_data
    remote_files = create_file_list_for_climate_observations(
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/wetterdienst/dwd/observations/fileindex.py", line 40, in create_file_list_for_climate_observations
    file_index = create_file_index_for_climate_observations(
  File "<decorator-gen-1>", line 2, in create_file_index_for_climate_observations
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 1563, in get_or_create_for_user_func
    return self.get_or_create(
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 1028, in get_or_create
    with Lock(
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/lock.py", line 185, in __enter__
    return self._enter()
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/lock.py", line 87, in _enter
    value = value_fn()
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 963, in get_value
    value = self._get_from_backend(key)
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 1250, in _get_from_backend
    return self._parse_serialized_from_backend(
  File "/Users/amo/dev/earthobservations/tmp/.venv/lib/python3.8/site-packages/dogpile/cache/region.py", line 1207, in _parse_serialized_from_backend
    metadata = json.loads(bytes_metadata)
  File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 343, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
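
The last frames show `json.loads` being handed bytes that start with 0x80. One explanation that would fit, though nothing in this thread confirms it, is that the cached entry was written as a pickle: pickle protocol 2 and later begin with byte 0x80. The sketch below only reproduces the error message under that assumption. The payload contents and the helper `try_parse_metadata` are made up for illustration and are not dogpile or wetterdienst code.

```python
import json
import pickle

# Assumption: the cache entry was written with pickle. The dict is invented
# sample data, not the actual metadata stored by dogpile.
payload = pickle.dumps({"created": 1605900000}, protocol=4)

# Pickle protocol 2 and later start with the PROTO opcode, byte 0x80.
first_byte = payload[:1]


def try_parse_metadata(raw: bytes):
    """Parse bytes as JSON, as the failing frame does.

    Hypothetical helper: returns the parsed value, or the error text if the
    bytes cannot be decoded as UTF-8.
    """
    try:
        return json.loads(raw)
    except UnicodeDecodeError as exc:
        return str(exc)


result = try_parse_metadata(payload)
print(first_byte)
print(result)
```

If this is what happened, the cache file itself is intact and only its format differs from what the reader expects, which would also explain why deleting it helps.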

amotl commented 3 years ago

When purging the cache using

rm -r ~/Library/Caches/wetterdienst

it started working again.
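
The same cleanup can be sketched in Python for anyone who prefers a guarded version of the `rm -r` call. The path is the macOS location quoted above and will be different on other systems. `purge_cache` is a hypothetical helper, not a function that wetterdienst provides.

```python
import shutil
from pathlib import Path

# Assumption: macOS cache location taken from the command above.
DEFAULT_CACHE_DIR = Path.home() / "Library" / "Caches" / "wetterdienst"


def purge_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """Delete the cache directory tree.

    Hypothetical helper, not part of wetterdienst. Returns True if a
    directory was removed and False if there was nothing to remove.
    """
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    # Print the location first so it is clear what is about to be deleted.
    print(f"Cache directory: {DEFAULT_CACHE_DIR}")
    print("Removed" if purge_cache() else "Nothing to remove")
```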

Unfortunately, the location of the cache directory is not printed on startup, so the average user will have no clue what is going on. A logging call was intentionally placed at the line linked below, but apparently the logger has not yet been configured by the time it runs.

https://github.com/earthobservations/wetterdienst/blob/5c8f5897e0b27439774ac68720a31141c1ef3403/wetterdienst/util/cache.py#L22
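
The effect can be shown in isolation with the standard `logging` module: an INFO record emitted before any handler and level are set up is dropped silently, while the same call after setup gets through. This is a minimal sketch of that general behaviour, not the code in `cache.py`. The logger name, message text and `/tmp/example` path are placeholders.

```python
import io
import logging


def demo() -> str:
    """Return what a handler captures when one message is logged too early."""
    logger = logging.getLogger("cache_demo")  # hypothetical logger name

    # Emitted before this logger has a handler or a level of its own.
    # With default settings the effective level is WARNING, so this INFO
    # record is discarded and never reaches the handler added below.
    logger.info("Cache directory is /tmp/example (logged too early)")

    # Configure logging the way a CLI entry point would do later on.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False

    # The same kind of message after setup is captured.
    logger.info("Cache directory is /tmp/example (logged after setup)")

    logger.removeHandler(handler)
    return stream.getvalue()


captured = demo()
print(captured, end="")
```

One possible remedy, following from this, would be to emit the cache location after the command line interface has configured logging rather than at import time.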

I also don't know why the cache got corrupted in the first place. It already tripped up @sk-drop the other day, see #217.