Flow-IPC / ipc

[Start here!] Flow-IPC - Modern C++ toolkit for high-speed inter-process communication (IPC)
https://flow-ipc.github.io/
Apache License 2.0

ipc: unit_test + TSAN: TSAN lock count limit reached in some tests: some run OK separately; 1 never runs OK; make the latter run OK; investigate further. #89

Open ygoldfeld opened 6 months ago

ygoldfeld commented 6 months ago

Filed by @ygoldfeld pre-open-source:

The current situation is as follows:

General description: Whether run locally with my clang-17 or in the GitHub pipeline with clang-15/16/17, some tests in some situations reliably hit a certain specific point within the test, at which point the console prints

ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_alllocks)) < (((sizeof(all_locks_withcontexts)/sizeof((all_locks_withcontexts)[0]))))" (0x40, 0x40) (tid=74526)

and the test hangs forever right there. To be clear, this is not a normal TSAN warning about a race or anything; rather, it is TSAN instrumentation code hitting a problem and refusing to proceed further. Judging by the text of the message, some sort of limit of 64 (0x40) "locks with contexts" is reached, and TSAN blows up. (No further analysis done on that, but read on.)

One test always hits this problem, even if run absolutely by itself: Jemalloc_shm_pool_collection_test.Multiprocess. Hence it is explicitly skipped in the pipeline at the moment, using the gtest command-line feature that can exclude tests individually.
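For illustration, excluding a single test with gtest's `--gtest_filter` flag might look like the sketch below. The binary path `./unit_test_binary` and the variable names are hypothetical placeholders, not taken from the actual pipeline; only the `--gtest_filter` syntax (a leading `-` introduces the exclusion list) is standard GoogleTest behavior.

```shell
# Hypothetical path to the unit-test binary; adjust for the actual build.
TEST_BIN="${TEST_BIN:-./unit_test_binary}"

# The one test that hits the TSAN limit even when run alone.
SKIP_TEST='Jemalloc_shm_pool_collection_test.Multiprocess'

# A leading '-' makes gtest run every test EXCEPT the listed pattern(s).
GTEST_FILTER_ARG="--gtest_filter=-${SKIP_TEST}"

# Print the command rather than running it, since the binary path is a placeholder.
echo "${TEST_BIN} ${GTEST_FILTER_ARG}"
```

Dropping the `echo` would run the command for real, assuming `TEST_BIN` points at a TSAN-instrumented build.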

The other problematic tests -- meaning that a run which keeps all the other tests but fails to exclude every one of these hits the problem -- are:

  LOCK_HEAVY_TESTS='Shm_session_test.External_process_array:\
                    Shm_session_test.External_process_vector_offset_ptr:\
                    Shm_session_test.External_process_string_offset_ptr:\
                    Shm_session_test.External_process_list_offset_ptr:\
                    Shm_session_test.Multisession_external_process:\
                    Shm_session_test.Disconnected_external_process:\
                    Borrower_shm_pool_collection_test.Multiprocess:\
                    Shm_pool_collection_test.Multiprocess'

Happily, though, they run just fine as a group by themselves -- just not as part of a run with all the many other tests. Therefore, to avoid hitting the limitation, I have changed the pipeline to the following:

It is not ideal, but it does give good TSAN coverage, thus reducing the priority of this ticket. However, the priority rises somewhat because Jemalloc_shm_pool_collection_test.Multiprocess cannot complete even by itself.
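As a rough sketch of how such a split could be expressed (an assumption about the general shape, not a copy of the real pipeline), one pass could run everything except the lock-heavy tests and the always-failing one, and a second, separate process could run only the lock-heavy group. The binary path and the `PASS1_FILTER`/`PASS2_FILTER` names are hypothetical.

```shell
# Hypothetical path to the unit-test binary; adjust for the actual build.
TEST_BIN="${TEST_BIN:-./unit_test_binary}"

# The test that cannot complete under TSAN even alone; skipped in both passes.
SKIP_TEST='Jemalloc_shm_pool_collection_test.Multiprocess'

# Join the lock-heavy test names with ':' (gtest's pattern separator).
# printf repeats its format for each argument, leaving one trailing ':'.
LOCK_HEAVY_TESTS=$(printf '%s:' \
  Shm_session_test.External_process_array \
  Shm_session_test.External_process_vector_offset_ptr \
  Shm_session_test.External_process_string_offset_ptr \
  Shm_session_test.External_process_list_offset_ptr \
  Shm_session_test.Multisession_external_process \
  Shm_session_test.Disconnected_external_process \
  Borrower_shm_pool_collection_test.Multiprocess \
  Shm_pool_collection_test.Multiprocess)
LOCK_HEAVY_TESTS="${LOCK_HEAVY_TESTS%:}"  # strip the trailing ':'

# Pass 1: everything except the lock-heavy group and the skipped test.
PASS1_FILTER="-${LOCK_HEAVY_TESTS}:${SKIP_TEST}"
# Pass 2: only the lock-heavy group, in its own process.
PASS2_FILTER="${LOCK_HEAVY_TESTS}"

# Print the commands rather than running them, since the binary path is a placeholder.
echo "${TEST_BIN} --gtest_filter=${PASS1_FILTER}"
echo "${TEST_BIN} --gtest_filter=${PASS2_FILTER}"
```

Running the two passes as separate processes is the point of the sketch: each process starts with fresh TSAN state, which matches the observation that the lock-heavy tests pass as a group but not alongside everything else.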

As for what to do -- just ideas:

It is worth looking into, but it is not a hair-on-fire problem. We can skip one test w/r/t TSAN and survive.