Why the blkio_delay is so different with iotop-py?

zylthinking commented 8 months ago

I made a busy reading of disk in clickhouse, and notice the GRAPH[IO]▽ column in the view is quite different with the IO> column in iotop.

The main thread of clickhouse shows the GRAPH[IO] is 100% and IO> is zero. Which I think iotop is correct.

Because the IO mainly happens in a background thread, the main thread should be idle.

I see the data is fetched from netlink instead of /proc; but the algorithm is similar. I don't understand the result, is there something I missed that GRAPH[IO] is not equivalent to IO> ?

zylthinking commented 8 months ago

I found the logic, you added all threads statistics to main threads besides to thread itself.

Seems the main thread is used for filter and ordering, thus it must sum all threads data; while corrupts the outputs of main thread

bbonev commented 8 months ago

Right, maybe the current behavior is erratic. I need to double check it anyways...

The intent was to have thread data summed in process mode where only the main thread is displayed, while in thread mode each thread to show its data without summing.

In order to switch between both modes, iotop will need to keep both the main thread and the summed data. That will either make the thread list more complex (different structures) or use more RAM (doubling the data).

What do you think?

zylthinking commented 8 months ago

Yes, it should maintain the process level information independently.

And I found another possible problem, pidgen_cb may enumerate other threads before main thread, that results in losing of some thread's data at process level.

And, I think IO should not be summed up at process level, maybe max(IO) or average(IO) of all threads is better.

bbonev commented 8 months ago

Thanks for spotting these problems!

I will separate the information. But if there is average/max then the memory usage will definitely increase. My thinking is that memory nowadays is cheap and saving some cycles should be preferred. Calculating the possible aggregates should definitely happen during the data collection and not during the display (unless the possible views count increases way too much). I prefer to be able to switch between different 'views' without starting over with the history data.

After checking the code, I can confirm that it is possible pidgen_cb to see a thread before the main thread and then it will not sum its data into the main thread. In thread mode, the display will be correct besides that the main thread has summed data instead of its own only.

The aggregation function currently in use should have been average. Since IO is measured as time, summing the threads can go over 100% and that would later be clipped to 100%...

I think that adding a toggle to switch between max and average is a good idea. Of course that applies to process mode only.

To summarize:

Bug 1) Do not aggregate the data in the main thread [thread mode only]; keep both aggregated main thread data and its original data to enable proper switching between thread and process mode. Bug 2) Change pidgen_cb to properly handle any order of the threads. Bug 3) Actually do the aggregation instead of summing the data. Feat 1) Add a toggle to use max/average as aggregate function.

Tomas-M / iotop

Why the blkio_delay is so different with iotop-py? #65