Translation failures and asserts from client calling dr_app_pc_from_cache_pc on bb prefixes and rseq mangling

derekbruening commented 3 years ago

We have a client who has an itimer which calls dr_app_pc_from_cache_pc() on the PC on each sample. This results many instances of a curiosity, and some of an assert:

<CURIOSITY : tdcontext != get_thread_private_dcontext() || (((OPTION_IS_INTERNAL_stress_recreate_pc)) ? ((((void)(((dynamo_options.checklevel >= (1)) && !(!((OPTION_IS_STRING_stress_recreate_pc)) || (((&options_lock)->num_readers > 0) || self_owns_write_lock(&options_lock)))) ? (d_r_internal_error("core/translate.c", 907, "!((OPTION_IS_STRING_stress_recreate_pc)) || READWRITE_LOCK_HELD(&options_lock)"), 0) : 0)), dynamo_options.stress_recreate_pc)) : ((((dynamo_options.checklevel >= (1)) && !((0))) ? (d_r_internal_error("non-internal option argument " "stress_

Internal Error: DynamoRIO debug check failure: core/translate.c:991 tdcontext != get_thread_private_dcontext() || INTERNAL_OPTION(stress_recreate_pc) || TEST(FRAG_SELFMOD_SANDBOXED, flags) || TEST(FRAG_WAS_DELETED, flags)

Investigating we see that the translation points are always either in bb prefixes (we're running -disable_traces, but on arm every bb has a prefix) or in rseq mangling code.

Action items:

Relax these asserts for client requests: these aren't synchronous faults.
Ideally, add full xl8 support for all parts of rseq mangling and for prefixes. It improves performance (fewer synchall retries) and helps things like these client samples.
Print out DR's base at the top of these assert message callstacks so we can symbolize them more easily.

derekbruening commented 3 years ago

For adding translation to prefixes I'm considering 3 approaches:

1) xl8 code emits raw to buffer, decodes, and adds instrs to list.

2) Change how prefixes are emitted in first place to IR instead of raw bytes. This would result in simpler and cleaner prefix code. But it is a bigger change and carries risk of destabilizing sthg since it affects what a recreated ilist looks like; it is definitely more work; it has a perf impact on emit by adding a bunch of encoding, which esp w/ no tool will be a noticeable bb build time increase over a bb which only needs to encode its final jump.

3) Have 2 versions of prefix emit: today's raw, and parallel in IR. At init time in debug build check both are the same. xl8 calls the IR version. This is a step toward 2): maybe later if we have time we then remove the raw for a cleaner approach if we determine perf and deps are ok.

There's also exit stubs which also seem worth adding full support for, for a64 where we use stubs in hot code (unlike x86) (I filed #4675). For those a decode approach makes the most sense.

After discussion: the decision is to go with 1).

derekbruening commented 2 years ago

PR #4854 for #4316 tried to add translation support for rseq mangling but it seemed to have missed something we're now hitting in #4923.

DynamoRIO / dynamorio

Translation failures and asserts from client calling dr_app_pc_from_cache_pc on bb prefixes and rseq mangling #4669