canonical / hotsos

Software analysis toolkit. Define checks in a high-level language and leverage the library to perform analysis of common Cloud applications.
Apache License 2.0

searchkit doesn't handle unicode errors #860

Open pponnuvel opened 2 months ago

pponnuvel commented 2 months ago
2024-05-10 12:12:15,355 791905 ERROR searchkit [-] caught UnicodeDecodeError while searching ./var/log/kern.log
Traceback (most recent call last):
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1043, in execute
    fd.read(1)
  File "/usr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'Ap')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1048, in execute
    stats = self._run_search(fd)
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 993, in _run_search
    line = line.decode("utf-8", **self.decode_kwargs)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 117: invalid continuation byte
2024-05-10 12:12:15,363 791878 DEBUG searchkit [-] joining/stopping queue consumer thread
2024-05-10 12:12:15,434 791878 DEBUG searchkit [-] exiting results thread
2024-05-10 12:12:15,434 791878 DEBUG searchkit [-] stopped fetching results (total received=0)
2024-05-10 12:12:15,434 791878 DEBUG searchkit [-] consumer thread stopped successfully
2024-05-10 12:12:15,437 791878 ERROR hotsos.plugin.lxd [-] part 'auto_scenario_check' raised exception: 'utf-8' codec can't decode byte 0xd2 in position 117: invalid continuation byte
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1043, in execute
    fd.read(1)
  File "/usr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'Ap')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1048, in execute
    stats = self._run_search(fd)
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 993, in _run_search
    line = line.decode("utf-8", **self.decode_kwargs)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 117: invalid continuation byte
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/hotsos/core/plugintools.py", line 402, in run
    always_parts().run()
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/hotsos/core/ycheck/scenarios.py", line 142, in run
    self.load()
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/hotsos/core/ycheck/scenarios.py", line 99, in load
    results = self.searcher.run()
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1446, in run
    self._run_mp(mgr, results, rs)
  File "/home/pponnuvel/.local/pipx/venvs/hotsos/lib/python3.8/site-packages/searchkit/search.py", line 1396, in _run_mp
    self.stats.update(future.result())
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 117: invalid continuation byte

/customers/sncf/00382281/sosreport-ht202opp01-SNCF-00382281-2024-04-16-rmraxob.tar.xz has the problematic kern.log.
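
The decode failure in the traceback can be reproduced outside searchkit. The sketch below uses a made-up log line (the real kern.log content is not shown in this report); it only illustrates why a lone 0xd2 byte is rejected: in UTF-8 it opens a two-byte sequence, so the byte after it must be a continuation byte.

```python
# Hypothetical log line, not taken from the real kern.log.
# 0xd2 is a valid character in single-byte code pages such as cp1252,
# but in UTF-8 it starts a two-byte sequence and needs a continuation byte.
raw = b"kernel: device \xd2mega attached"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The exception carries the offending offset and the reason string.
print(error)
```

The `start` attribute of the exception gives the byte offset, which corresponds to the "position 117" reported in the log above.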

mustafakemalgilor commented 2 months ago

It looks like the kern.log file contains some characters from the ANSI charset, which do not play nicely with UTF-8 decoding. I think there are two possible solutions here:

a) Skip the offending line.
b) Use a fallback decoder (e.g. cp1252).
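
For illustration, the two options could look roughly like this in plain Python. This is a minimal sketch, not searchkit code: the helper names `decode_or_skip` and `decode_with_fallback` and the sample lines are hypothetical, and cp1252 is used only because it is the example fallback named above.

```python
from typing import Optional


def decode_or_skip(raw: bytes) -> Optional[str]:
    """Option a: return None for a line that is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_with_fallback(raw: bytes, fallback: str = "cp1252") -> str:
    """Option b: try UTF-8 first, then a legacy single-byte code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined, so substitute those
        # instead of raising a second time.
        return raw.decode(fallback, errors="replace")


# Hypothetical sample lines; the second contains a stray 0xd2 byte.
lines = [b"ok line", b"bad \xd2 line"]

kept = [text for text in map(decode_or_skip, lines) if text is not None]
recovered = [decode_with_fallback(line) for line in lines]
```

Skipping loses the whole line, including any part a search pattern might have matched; the fallback keeps the line but may render the stray bytes as the wrong characters if the guess about the source encoding is off.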

pponnuvel commented 2 months ago

> It looks like the kern.log file contains some characters from the ANSI charset, which do not play nicely with UTF-8 decoding. I think there are two possible solutions here:
>
> a) Skip the offending line.
> b) Use a fallback decoder (e.g. cp1252).

Yeah, searchkit's maintainer actually handled it by providing an option to ignore it :) I've used it: #861.
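
The traceback shows searchkit calling `line.decode("utf-8", **self.decode_kwargs)`, so an "ignore" option plausibly comes down to passing an `errors` handler to `bytes.decode`. The sketch below shows only that standard-library behaviour; the `decode_kwargs` dict is a local stand-in, and the actual searchkit option name and how #861 sets it are not stated in this thread.

```python
# Hypothetical line with a stray 0xd2 byte.
raw = b"bad \xd2 line"

# Local stand-in for keyword arguments forwarded to bytes.decode();
# this is not searchkit's configuration API.
decode_kwargs = {"errors": "ignore"}

# "ignore" silently drops undecodable bytes.
ignored = raw.decode("utf-8", **decode_kwargs)

# "replace" keeps a visible U+FFFD marker where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")
```

With "ignore" the rest of the line stays searchable, at the cost of silently losing the bytes that could not be decoded.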