canonical / hotsos

Software analysis toolkit. Define checks in high-level language and leverage library to perform analysis of common Cloud applications.
Apache License 2.0

hotsos on arm64 nodes is taking too long in some cases #850

Open nishant-dash opened 2 months ago

nishant-dash commented 2 months ago

While investigating an issue with stale ports in OVN, I found that on some arm64 nodes, OVN has a lot of stale port entries (anywhere between 30 and 800) on a host with no associated tap devices. This requires manual cleanup using ovs-vsctl, like so: ovs-vsctl del-port br-int ...
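For illustration, a minimal sketch of how candidate stale ports could be listed before any manual cleanup. It assumes that "stale" means a port reported by `ovs-vsctl list-ports br-int` with no network device of the same name under /sys/class/net; that heuristic, and the helper names `find_stale_ports`, `list_bridge_ports`, `list_netdevs` and `report`, are assumptions made for this example and are not part of hotsos or OVS. It only prints names and never deletes anything.

```python
import os
import subprocess


def find_stale_ports(ovs_ports, netdevs):
    """Return the OVS port names that have no matching network device."""
    present = set(netdevs)
    return [port for port in ovs_ports if port not in present]


def list_bridge_ports(bridge="br-int"):
    """List port names on the given bridge using ovs-vsctl."""
    result = subprocess.run(
        ["ovs-vsctl", "list-ports", bridge],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


def list_netdevs(sysfs="/sys/class/net"):
    """List the network devices currently known to the kernel."""
    return os.listdir(sysfs)


def report(bridge="br-int"):
    """Print candidate stale ports (assumed heuristic; review before deleting)."""
    for port in find_stale_ports(list_bridge_ports(bridge), list_netdevs()):
        print(port)
```

Calling `report()` on an affected host would print one candidate per line. Ports that legitimately have no netdev of the same name would also be flagged, so the output needs a manual review before anything is removed.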

When I tried to run hotsos on this affected node (with stale ports), hotsos took a really long time and had to be killed.

On a non-affected node (that has 2 ports in ovn however), hotsos runs in a few seconds.

The hotsos command I am using is

hotsos --openvswitch --kernel --openstack --system

hotsos version: 1.17.0+1276~ubuntu22.04.1

mustafakemalgilor commented 2 months ago

Hi @nishant-dash

Is it the same when you grab a sosreport from that node and feed it to hotsos? If so, an example sosreport would be useful to dig into this issue.

dosaboy commented 2 months ago

I think I have some ideas as to what is contributing to the slowness and I'm working on a solution atm. Will have something up soon.

dosaboy commented 2 months ago

I believe we should have a substantial improvement with some patches landed recently, including the one above and also https://github.com/canonical/hotsos/commit/00452081be6340786c1b55285ccc4551d131348b and some recent changes to search. I have run some tests against small and large sosreports and am definitely seeing improvements. There is still a lot of room for improvement, particularly around the speed of search execution, and I know we have further changes queued up for that, but these recent patches hopefully remove some slowness and reduce memory consumption.

dosaboy commented 1 month ago

We now perform all event and scenario searches (per-plugin) in one go and also allow any other searches to be performed in this way. This will remove any duplicate searches or searching of long files more than once which should result in a net improvement.
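As a rough illustration of the idea described above (not the actual hotsos implementation), the sketch below registers several searches up front, deduplicates identical expressions, and then reads the input only once. The function name `search_once` and the tag names are hypothetical and chosen for this example only.

```python
import re
from collections import defaultdict


def search_once(lines, patterns):
    """Scan the lines a single time, testing every registered pattern.

    patterns maps a tag (e.g. an event or scenario name) to a regex string.
    Identical regex strings are compiled and evaluated only once per line,
    and the result is shared by every tag that registered them.
    """
    # Group tags by expression so duplicate searches collapse into one.
    tags_by_regex = defaultdict(list)
    for tag, regex in patterns.items():
        tags_by_regex[regex].append(tag)
    compiled = {regex: re.compile(regex) for regex in tags_by_regex}

    results = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for regex, cre in compiled.items():
            match = cre.search(line)
            if match:
                for tag in tags_by_regex[regex]:
                    results[tag].append((lineno, match.group(0)))
    return dict(results)


# Two tags share one expression, so it is evaluated once per line.
sample_lines = ["error: timeout", "ok", "warning: slow"]
sample_patterns = {
    "event_errors": r"error: \S+",
    "scenario_errors": r"error: \S+",
    "warnings": r"warning",
}
sample_results = search_once(sample_lines, sample_patterns)
```

With this shape, adding another check adds a pattern but not another pass over a long file.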