How to debug a potential jit issue

jmbnyc commented 1 month ago

My team and I are still working to confirm, but we are seeing a crash inside pcre2_jit_match. Unfortunately gdb is not very helpful because we get a huge set of stack frames shown as ??. We might be able to do better if we pull the pcre code into our main code base instead of loading it as a library. In the meantime, how can we debug this? Do you have any suggestions on how we can narrow down the issue we might be encountering? We are using version 42.

carenas commented 1 month ago

sljit stacktraces might normally contain those, if the crash is in the generated code.

If you have a core dump with the crash, then a backtrace and a disassembly around the crash (x/16i $pc-32), together with the expression that crashed it (especially if it is reproducible), will help.

carenas commented 1 month ago

> We are using version 42.

I assume you mean 10.42. If your application is threaded, you should probably upgrade to 10.44. Also see #435.

zherczeg commented 1 month ago

gdb usually does not support backtraces for jit code. If the issue can be reproduced easily, then it is better to put a breakpoint before the jit code is executed: https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_match.c#L91

You can use the ignore or condition gdb commands to stop at the right time, then get a backtrace. If you don't know how many times the breakpoint needs to be ignored, you can set a huge number, such as 1000000, and use info breakpoints to get the number of ignores before the issue. That number minus 1 is a good value for the next ignore.

jmbnyc commented 1 month ago

Does this mean anything to anyone who has been kind enough to respond?

(gdb) x/16i $pc-32
   0x2e5c8c5 <pcre2_jit_match_8+496>: and $0x48,%al
   0x2e5c8c7 <pcre2_jit_match_8+498>: mov -0x8(%rbp),%eax
   0x2e5c8ca <pcre2_jit_match_8+501>: mov 0x18(%rax),%rax
   0x2e5c8ce <pcre2_jit_match_8+505>: mov %rax,-0xa0(%rbp)
   0x2e5c8d5 <pcre2_jit_match_8+512>: mov -0x38(%rbp),%rdx
   0x2e5c8d9 <pcre2_jit_match_8+516>: lea -0xa0(%rbp),%rax
   0x2e5c8e0 <pcre2_jit_match_8+523>: mov %rax,%rdi
   0x2e5c8e3 <pcre2_jit_match_8+526>: callq *%rdx
=> 0x2e5c8e5 <pcre2_jit_match_8+528>: mov %eax,-0x10(%rbp)
   0x2e5c8e8 <pcre2_jit_match_8+531>: jmp 0x2e5c903 <pcre2_jit_match_8+558>
   0x2e5c8ea <pcre2_jit_match_8+533>: mov -0x38(%rbp),%rdx
   0x2e5c8ee <pcre2_jit_match_8+537>: lea -0xa0(%rbp),%rax
   0x2e5c8f5 <pcre2_jit_match_8+544>: mov %rdx,%rsi
   0x2e5c8f8 <pcre2_jit_match_8+547>: mov %rax,%rdi
   0x2e5c8fb <pcre2_jit_match_8+550>: callq 0x2e5c652
   0x2e5c900 <pcre2_jit_match_8+555>: mov %eax,-0x10(%rbp)

zherczeg commented 1 month ago

The crash is not in jit code, it is in pcre2_jit_match_8. The callq *%rdx is an indirect call, and its target is loaded by mov -0x38(%rbp),%rdx. You should check whether rbp contains a valid stack location. Maybe the call does not restore it properly. It would be good to know what is called there.

Probably this is the location: https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_match.c#L171

zherczeg commented 1 month ago

I need to correct myself. If you use pcre2_match and pcre2_jit_stack_assign then you need a separate match context. If you use pcre2_jit_match() you don't need it.

jmbnyc commented 1 month ago

zherczeg, thanks for your response. I reached the same conclusion and determined that the cause was concurrent calls to pcre2_jit_stack_assign with the same match context and different jit stack memory. As I mentioned in another post, once I read the code, it was obvious that the match context must be thread local (in my code). Net/Net, my thread local for matching now contains match data, match context, and jit stack. Each regex pattern is matched using a thread local object where the jit stack assign can be done during thread local init.

I appreciate the help here as it allowed me to debug and figure this out. As I mentioned, the docs did not make it completely clear that match data, match context and jit stack all need to be thread local to allow concurrent matching against a pattern (represented by a pcre2_code object). I probably should have read the code first, because it becomes very clear what is required to get thread-safe concurrent matching.

PCRE2Project / pcre2

How to debug a potential jit issue #517