Open nishant-dash opened 7 months ago
Hi @nishant-dash
Is it the same when you grab a sosreport from that node and feed it to hotsos? If so, an example sosreport would be useful for digging into this issue.
I think I have some ideas as to what is contributing to the slowness and I'm working on a solution atm. Will have something up soon.
I believe we should have a substantial improvement with some patches landed recently, including the one above and also https://github.com/canonical/hotsos/commit/00452081be6340786c1b55285ccc4551d131348b and some recent changes to search. I have run some tests against small and large sosreports and am definitely seeing improvements. There is still a lot of room for improvement, particularly around the speed of search execution, and I know we have further changes queued up for that, but these recent patches should hopefully remove some slowness and reduce memory consumption.
We now perform all event and scenario searches (per-plugin) in one go and also allow any other searches to be performed in this way. This removes duplicate searches and avoids searching long files more than once, which should result in a net improvement.
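To illustrate the idea of running all searches in one go, here is a minimal sketch. It is not the hotsos search implementation: the function name `search_once` and the tag-to-regex mapping are hypothetical, chosen only to show how duplicate patterns can be collapsed and how a file can be scanned a single time for every registered search.

```python
import re
from collections import defaultdict


def search_once(lines, patterns):
    """Scan lines a single time, testing every unique pattern on each line.

    patterns maps a caller-chosen tag to a regex string (hypothetical
    interface, not the hotsos API). Tags that share the same regex are
    grouped so that the regex is evaluated only once per line.
    """
    # Group tags by regex text so duplicate searches collapse into one.
    tags_by_regex = defaultdict(list)
    for tag, regex in patterns.items():
        tags_by_regex[regex].append(tag)

    compiled = {regex: re.compile(regex) for regex in tags_by_regex}
    results = defaultdict(list)
    # Single pass over the input, no matter how many searches are registered.
    for lineno, line in enumerate(lines, start=1):
        for regex, cre in compiled.items():
            match = cre.search(line)
            if match:
                for tag in tags_by_regex[regex]:
                    results[tag].append((lineno, match.group(0)))
    return dict(results)
```

Passing an open file object as `lines` would stream a large log without loading it into memory. Two tags registered with the same regex cost only one regex evaluation per line, yet both tags still receive the results.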
While investigating an issue with stale ports in ovn, I found that on some arm64 nodes, ovn has a lot of stale port entries (anywhere between 30 and 800) on a host with no associated tap devices. This requires manual cleanup using ovs-vsctl, like so:
ovs-vsctl del-port br-int ...
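As a rough sketch of how that cleanup could be scripted, the helper below compares a list of port names against the network devices present on the host and prints the matching del-port commands without running them. The function names, the `tap` prefix filter, and the assumption that the inputs come from something like `ovs-vsctl list-ports br-int` and a listing of `/sys/class/net` are all illustrative and not part of the original report. Review the output before running anything.

```python
def find_stale_ports(ovs_ports, netdevs, prefix="tap"):
    """Return ports starting with prefix that have no matching network device.

    Only names with the given prefix are considered, so ports that
    legitimately lack a host device (e.g. patch or tunnel ports) are skipped.
    """
    present = set(netdevs)
    return sorted(
        port for port in ovs_ports
        if port.startswith(prefix) and port not in present
    )


def cleanup_commands(stale_ports, bridge="br-int"):
    """Build (but do not execute) one del-port command per stale port."""
    return [
        f"ovs-vsctl --if-exists del-port {bridge} {port}"
        for port in stale_ports
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    ports = ["tapaaa", "tapbbb", "patch-br-int-to-provnet", "tapccc"]
    devices = ["lo", "eth0", "tapbbb"]
    for command in cleanup_commands(find_stale_ports(ports, devices)):
        print(command)
```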
When I tried to run hotsos on this affected node (with stale ports), it took a really long time and had to be killed.
On a non-affected node (that has 2 ports in ovn however), hotsos runs in a few seconds.
The hotsos command I am using is
hotsos version:
1.17.0+1276~ubuntu22.04.1