Timing for multithreaded use

FelixKrayer commented 1 month ago

While implementing parallel solvers for Goblint, it became apparent that the timing functions used in Goblint are not thread-safe. Thus, issues arise, when multiple domains running in parallel call the Timing.wrap function The execution with more than one parallel Domain often fails because of an assertion error in the exit function in the module Make of goblint_timing.ml (line 134: assert (tree.name = name)). I assume the likely cause for that is that the current stack is not thread-safe. We need a thread-safe way of Timing in Goblint.

Furthermore, it has to be decided how timing would even work with multiple parallel Domains. When two Domains run in parallel and both provide a time value for a specific timed section, how would these time contributions be merged? Is it more reasonable to add these values or to use the largest value? Adding would more accurately show the work done, but would ignore the potential speedup through parallelization. Thus a final timing value generated by adding parallel times could potentially be longer than the whole run of Goblint.

sim642 commented 1 month ago

Indeed, the big question is how it's even supposed to work. The current implementation (improved from CIL's Stats) requires well-nested enter and exit calls globally, which wrap from multiple domains does not guarantee.

Our timing measures both CPU time and wall time. Summing CPU times across domains would make sense because that's the total amount of work done by the CPU. Although currently we just look it up by Unix.times which is about the entire process. I'm not sure if there's any way to even get CPU time used per-domain.

Measurement of wall time itself is easy, but aggregation is hard, neither sum nor max makes sense. Since parallelism is only to be used in a specific place (not parsing, etc), it might simply have to be that there's wall time for the entire solving, but not its sub-timings which have been aggregated.

We also measure allocated memory using Gc.allocated_bytes, which I also don't know how it works with multiple domains.

It would be worth trying to find others (not necessarily OCaml) that try to profile multi-threaded programs and see if they've come up with anything better.

How urgent is this? Can we get by by just disabling timing for our multicore attempts? The minimum fix might be to make the current stack in the timing implementation use DLS.

FelixKrayer commented 1 month ago

For now, it is only important that errors are avoided in a temporary fix (as you mentioned through DLS or disabling of the timing). We can get timings for benchmarks in other ways. For example, we can measure the time of the whole solving process which should not be an issue. However, I believe that we should find a suitable solution before merging the parallelized solvers into the master branch. There are other Issues to be solved before a merge as well, e.g. the performance issues from Ocaml 5 #1137 since this version is a prerequisite for the Domains.

goblint / analyzer

Timing for multithreaded use #1550