canonical / hotsos

Software analysis toolkit. Define checks in high-level language and leverage library to perform analysis of common Cloud applications.
Apache License 2.0
33 stars 38 forks source link

[core] Fix decode errors from searchkit #861

Closed pponnuvel closed 4 months ago

pponnuvel commented 6 months ago

When UTF-8 decoding fails, searchkit throws an exception.

Passing decode_errors='backslashreplace' to cope with that.

Fixes #860.

xmkg commented 6 months ago

@pponnuvel For the files that import FileSearcher from hotsos.core.search, i.e. from hotsos.core.search import (FileSearcher), we can pass decode_errors directly to the super().__init__ :

https://github.com/canonical/hotsos/blob/4121edf5a5a9c964215e8bd35d471ac702612e89/hotsos/core/search.py#L24-L27

pponnuvel commented 6 months ago

@pponnuvel For the files that import FileSearcher from hotsos.core.search, i.e. from hotsos.core.search import (FileSearcher), we can pass decode_errors directly to the super().__init__ :

https://github.com/canonical/hotsos/blob/4121edf5a5a9c964215e8bd35d471ac702612e89/hotsos/core/search.py#L24-L27

Ah, I didn't realise there's another FileSearcher in hotsos itself! So that would simplify this change. But is there any situation where we do want to see the decode errors and/or handle it? I don't envisage any - just wonder as this woud prevent such things.

pponnuvel commented 6 months ago

@dosaboy https://github.com/pponnuvel/hotsos/commit/97b7d0505232edcec6320376f3caadb48dccfa1e is the other way. But I've a question.

dosaboy commented 6 months ago

I had had issues in the past when using decode error handlers since the injected characters can yield unexpected results from the constraints logic. Having said that I think with the current implementation is reasonably safe. I have however in recent times very seldom comes across this kind of error, would you be able to share anything about what file/output it was that failed? I wonder if a more selective approach would be better i.e. not doing this globally.

pponnuvel commented 6 months ago

@dosaboy Changing these two is sufficient for the specific sosreport I've issues with:

 hotsos/core/plugins/kernel/kernlog/common.py
 hotsos/core/plugins/lxd/common.py           

So we can limit to just to these two too.

dosaboy commented 6 months ago

As a "safer" alternative what do you think about putting this behind a config option and making it optional e.g. --handle-utf8-decode-errors. Overkill?

pponnuvel commented 6 months ago

As a "safer" alternative what do you think about putting this behind a config option and making it optional e.g. --handle-utf8-decode-errors. Overkill?

That's an option. But you need to know it first to ignore it. What you'd see is "failed-parts" in hotsos output which could be due to anything - not necessarily because "decode-errors".

Another question is what could be "unsafe" if we ignore it globally. Do we expect only UTF-8 and ignore anything else? or do we want to support multiple encodings?

It depends on what we could do when we hit decode errors . If the answer is "nothing as we only "grep" for ASCII text in all log/config files, so ignore it" then I'd prefer the global change (https://github.com/pponnuvel/hotsos/commit/97b7d0505232edcec6320376f3caadb48dccfa1e).

pponnuvel commented 5 months ago

@dosaboy Can we get this in? I hit this with every sosreport from a user...