Open johnfxgalea opened 5 years ago
Thank you for the detailed writeup. To turn on -private_ib_in_tls for -thread_private we would want to double-check that there's no performance difference, and consciously decide to abandon !HAVE_TLS platforms (probably fine).
Problem: Undefined behaviour is encountered when a client places a clean call inside a private shared code cache and has DR options set to use private caches (via -thread_private) for the main cache. At a high level, DynamoRIO gets confused, as it treats the insertion of the clean call as an insertion to a thread-private cache. However, the private cache is NOT thread-private but is shared.
The problem mainly occurs on 32-bit platforms.
Test Case: The following code is a modified version of the memtrace sample. I could have used the original code of memtrace to show the bug, but this version removed unnecessary overhead for quick testing.
The code is fairly simple. For every instruction, the client jumps inside a shared private code cache created via _dr_nonheapalloc. The cache has a clean call inserted inside it.
The application under test needs to be multi-threaded. The bug fails on applications such as Apache. However, I recommend pigz for testings as it is quite a light-weight application and the bug is triggered in a matter of seconds.
I run the the client as follows:
/home/john/dynamorio/install/bin32/drrun -thread_private -disable_traces -opt_cleancall 2 -c libmemtrace_test.so -- pigz -k -d /home/john/generated.zip
Back-Trace: The back-trace does not provide much hints towards the root cause of the bug. It indicates that the bug is triggered in _monitor_cacheenter, but the function is unrelated. I had cases where the top function changes on multiple runs.
I also analysed the debug logs but did not find anything suspicious at the crash point - the master signal is just triggered to handle the fault.
Root Cause: Eventually, I made some progress by inspecting code related to inserting clean calls, particularly the _prepare_for_callex and _cleanup_after_callex. These functions set the runtime usage of drcontext statically or dynamically depending whether caches are shared and whether the architecture is 64-bit. This check is done via _SCRATCH_ALWAYSTLS
# define SCRATCH_ALWAYS_TLS() (DYNAMO_OPTION(private_ib_in_tls))
The code essentially determines whether the _private_ib_intls parameter is set. Note, this parameter is enabled by default for shared caches and for 64-bit, but not when dealing with a 32-bit application and thread private caches are used.
Regarding the bug, when the clean call is inserted inside the private cache, an absolute address pertaining to the thread's drcontext is used. However, the private code cache is shared, and when another thread enters the cache, it's not dynamically loading its drcontext, but modifying the drcontext of some other thread.
The place where the absolute address is set is in the _dcontext_opndcommon function, where the the absolute parameter is passed as true.
Solutions: One quick solution is to document this limitation and tell the user to pass the _private_ib_intls runtime parameter when clean calls are in shared private caches - the client does not crash with this parameter. The second, perhaps more convoluted solution with regards to code changes, is to always use tls slots and dynamically load the drcontext, thus removing the _private_ib_intls parameter all together.