canonical / hotsos

Software analysis toolkit. Define checks in high-level language and leverage library to perform analysis of common Cloud applications.
Apache License 2.0

Enhance disk failure detection #790

Closed pponnuvel closed 5 months ago

pponnuvel commented 6 months ago

Currently we only detect the "critical medium error" message for disks.

We can expand this to cover more of the block-layer error types listed in https://elixir.bootlin.com/linux/v6.8-rc5/source/block/blk-core.c#L152 by enhancing the regex in hotsos/defs/scenarios/kernel/disk_failure.yaml

There's one other scenario in case 379832 and I am sure we can find more examples from sosreports on our file server.
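
As a rough illustration of the regex change, here is a minimal Python sketch that matches several block-layer error types instead of only "critical medium". It is not the actual expression in disk_failure.yaml. The error names are an assumed subset of the table in the linked blk-core.c and should be checked against that source. The message layout ("<name> error, dev <device>, sector <number>"), the names `ERROR_NAMES`, `PATTERN` and `find_disk_errors`, and the sample log lines are all hypothetical.

```python
import re

# Assumed subset of the error names in the kernel's block-layer error table;
# verify against the linked blk-core.c before relying on this list.
ERROR_NAMES = ["critical medium", "critical target", "I/O"]

# Hypothetical pattern for "<name> error, dev <device>, sector <number>".
PATTERN = re.compile(
    r"(?P<kind>" + "|".join(re.escape(name) for name in ERROR_NAMES) + r") error, "
    r"dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def find_disk_errors(lines):
    """Return (kind, device, sector) tuples for every matching line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append(
                (match.group("kind"), match.group("dev"), int(match.group("sector")))
            )
    return hits


# Made-up sample lines, for illustration only (not taken from a real sosreport).
SAMPLE = [
    "kernel: blk_update_request: critical medium error, dev sdb, sector 1024 op 0x0:(READ)",
    "kernel: blk_update_request: I/O error, dev sdc, sector 2048 op 0x1:(WRITE)",
    "kernel: EXT4-fs (sda1): mounted filesystem",
]

for kind, dev, sector in find_disk_errors(SAMPLE):
    print(f"{dev}: {kind} error at sector {sector}")
```

Keeping the error names in a list means a new error type is a one-line addition, and capturing the device name lets a report group failures per disk.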