canonical / hotsos

Software analysis toolkit. Define checks in high-level language and leverage library to perform analysis of common Cloud applications.
Apache License 2.0
33 stars 38 forks source link

Add huge pages check #801

Closed wilkmar closed 7 months ago

wilkmar commented 7 months ago

The patch adds logic for testing if there are too many huge pages defined in the system, which are not being fully utilised, but occupy memory which can't be used for any other purpose. Such situations can negatively impact performance or even lead to the out of memory situations.

This patch also introduces a small refactoring of the VMStat class in the hotsos/core/plugins/kernel/memory.py, in order to avoid code duplication (between VMStat and MemInfo classes).

pponnuvel commented 7 months ago

A question for @dosaboy.

IIRC, the purpose of hotsos/defs/tests/scenarios is to replace unit tests for scenarios as much as possible. Had I done this patch myself, I'd have done a copy-from-original of /proc/meminfo in hotsos/defs/tests/scenarios/kernel/hugepages.yaml (no mock stuff) and not bothered with any changes in tests/unit/test_kernel.py. Do we still need to add unit tests for scenarios in tests/unit/ as Marcin's done here?

dosaboy commented 7 months ago

@pponnuvel yes that's correct you can rely entirely on unit tests written in yaml. The way this is written is totally fine though but as you say if the hugepages.yaml test were to use copy-from-original to copy across some real data root files, adding in overrides for the test then that would obviate the need for the mocking in this case and that would have the added benefit of the underlying code being tested as well (since it would not be mocked). I'll let @wilkmar have final call on that but yes I would recommend this approach.

wilkmar commented 7 months ago

@dosaboy @pponnuvel updated test scenario .yaml according to your suggestions. Thanks for your feedback.

dosaboy commented 7 months ago

@wilkmar you could go the extra step and have a "pass" and "fail" test i.e. rename your test to hugepages_fail.yaml (putting target-name: hugepages.yaml at the top of the file) and the add an extra test called hugepages_pass.yaml which contains a proc/meminfo that does not trigger the issue (i.e. raised-issues: would be an empty dict).

dosaboy commented 7 months ago

lgtm, thanks marcin!