Profile TW performance with large task DBs

djmitche commented 2 months ago

...Probably running a profiler is the action item from this issue...

Originally posted by @djmitche in https://github.com/GothenburgBitFactory/taskwarrior/issues/3329#issuecomment-2335007219

I think this would be best done from Taskwarrior, treating the Rust code as a "black box". It would be great to learn which calls into Rust are taking the longest -- and what non-Rust things are taking a long time, too. I can think of a few possibilities we might learn, but the truth is probably something different:

Opening the DB might take a long time (partly due to #3418)
Writes may be super-slow (https://github.com/GothenburgBitFactory/taskchampion/issues/373 should help with this, by doing writes in fewer transactions)
We may be calling into Rust repeatedly in a way that is wasteful (e.g., TDB2::pending_tasks)
The translation between Rust and C++ data structures may be slow (C++'s std::string and Rust's String are not compatible)

I don't know much about profiling in C++, but perhaps you, dear reader, do?

felixschurk commented 1 month ago

I think in order to have it somehow comparable we should use here the existing load, run_perf tasks which are in the performance folder. But probably (at least for a first run) in a bit cut down size.. Currently it would create 8510 tasks, which is fairly large :D

I am planning to take a look into this, however I will probably not have time before end of September :/

felixschurk commented 1 month ago

I did let it run on my local machine which runs on Fedora, having an i5-1145G7 processor and 16 GiB of RAM. Not using the machine for anything else super intense at the profiling time and then it came up with some call graph like this:

At least based on that what takes the most of the time is the rendering, starting, and the rebuilding also the working set of taskwarrior.

I am not really sure how the best way to share results of performance profiling etc.

I also did let run the same performance suite (the one which is with taskwarrior) on the release of 2.6.2 and there the total runtime was only 2.7s vs. 4.25s on the current develop branch. I mean the measurements are not about the current absolute time nor anything (I only did let them run once so far, so in order to have there something super consistent multiple runs and then some average or so would be needed) but still one sees that taskwarrior 3 is noticeably slower, but solely from the profiling I can not really pin it down to one method where the problem might be coming from.

The performance script also returns some counters for garbage collection etc. and there the counters are an order more for the taskwarrior 3, but I am not aware on how they are computed so might also report something wrong.

felixschurk commented 1 month ago

Okay, I just noticed that one "counter" or timer was lost due to the transition to the TDB2 backend. More specifically, the timer which tracked how long it takes to load some task set into Taskwarrior.

The counter was in Line 419 of TDB2.cpp. https://github.com/GothenburgBitFactory/taskwarrior/commit/5bb98579841d41e680280ada7bf3a3e9a9940553#diff-d6d65d1f59ba33ef5268d5bc8a49ed26f397b899b902332404aee34e800cbc0bL419

Context::getContext ().time_load_us += timer.total_us ();

This explains why the reporting of the performance always states load:0 for the latest taskwarrior.

djmitche commented 1 month ago

That flame graph seems to have multiple commands in it -- is it a concatenation from multiple runs? It's a little difficult to interpret the graphic, since things are cut off. Is there a way to share the source file from which that was drawn?

https://github.com/GothenburgBitFactory/taskwarrior/pull/3635 should restore the load timer.

felixschurk commented 4 days ago

I will come back and see how to extract it better to a simple output.

GothenburgBitFactory / taskwarrior

Profile TW performance with large task DBs #3618