randyzwitch closed this issue 10 years ago.
See the pcrestack man page for how to increase the PCRE stack size (or how to rearrange your regex so it needs less stack). It seems like it has to be done at compile time, and you may also need to increase the OS stack size.
The default JIT stack size is only 32KB. Maybe we should allocate a single stack, say 1MB, and set all regexes to use it when they're compiled.
This, from the pcrejit man page, made me laugh:
(7) This is too much of a headache. Isn't there any better solution for JIT stack handling?
No, thanks to Windows. If POSIX threads were used everywhere, we could throw out this complicated API.
It does seem reasonable to have a larger stack size, at least on Linux and Mac, if Windows is the problem.
Thanks @dcjones for confirming that the issue is the small default stack size.
Is there a simple setting I can modify while compiling from source to play around with different stack size values?
Not super simple, but if pat is your regex pattern, you can do this and it should work:
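# Allocate a JIT stack (32768 bytes initial, 1048576 bytes maximum) and
# assign it to pat; passing C_NULL as the callback means "use this stack"
ccall((:pcre_assign_jit_stack, :libpcre),
      Void, (Ptr{Void}, Ptr{Void}, Ptr{Void}), pat.extra, C_NULL,
      ccall((:pcre_jit_stack_alloc, :libpcre),
            Ptr{Void}, (Cint, Cint), 32768, 1048576))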
In that example, 32768 (32KB) is the initial stack size and 1048576 (1MB) is the maximum.
Thanks @dcjones! I tried this on the bug example above and it worked. I also tested it on an array of 350,000 Apache log strings, which previously failed on the example string, and got no errors.
Is this something that could be incorporated into Base easily, or should I just build this fix into my package (or both)?
Yes, I think we should use a bigger stack by default; 32k is extremely small. It seems the only way to do this is for us to explicitly call pcre_assign_jit_stack for every regex? Or we could at least intercept the error, print a helpful message, and provide an easier way to do this.
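The "intercept the error" option could look roughly like the sketch below. It assumes the raw return code of pcre_exec is available as an integer; the helper check_exec_rc and the exception type JITStackLimitError are hypothetical names, not part of Base. Only the two error codes come from the PCRE documentation (-1 is PCRE_ERROR_NOMATCH, -27 is PCRE_ERROR_JIT_STACKLIMIT).

```julia
# PCRE1 return codes used below (from the PCRE man pages).
const PCRE_ERROR_NOMATCH = -1
const PCRE_ERROR_JIT_STACKLIMIT = -27

# Hypothetical exception carrying the stack limit that was exceeded.
struct JITStackLimitError <: Exception
    maxsize::Int
end

# Print an actionable message instead of a bare error code.
function Base.showerror(io::IO, e::JITStackLimitError)
    print(io, "PCRE JIT stack limit exceeded (max ", e.maxsize,
          " bytes); assign a larger JIT stack with pcre_assign_jit_stack ",
          "or simplify the pattern")
end

# Hypothetical helper: true on a match (any rc >= 0), false on no match,
# a descriptive exception for the JIT stack limit, a generic error otherwise.
function check_exec_rc(rc::Integer; maxsize::Integer = 32768)
    rc >= 0 && return true
    rc == PCRE_ERROR_NOMATCH && return false
    rc == PCRE_ERROR_JIT_STACKLIMIT && throw(JITStackLimitError(maxsize))
    error("PCRE.exec error code $rc")
end
```

The maxsize keyword only feeds the message; the default of 32768 mirrors the 32KB default stack discussed above.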
I was going to make a PR so that all patterns use a 1MB stack, but I'm running into an issue. If I define globals in pcre.jl like so:
const JIT_STACK_START_SIZE = 32768
const JIT_STACK_MAX_SIZE = 1048576
const JIT_STACK = ccall((:pcre_jit_stack_alloc, :libpcre), Ptr{Void},
                        (Cint, Cint), JIT_STACK_START_SIZE, JIT_STACK_MAX_SIZE)
JIT_STACK is always NULL, yet the same call works from the REPL. Why would that be?
@dcjones Maybe the ccall has to happen in __init__, since the pointer can't be saved in sys.so? Does it work if you remove sys.so/dylib/dll?
Thanks @simonster, that was the issue.
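For reference, the __init__ pattern looks roughly like this minimal sketch. The pointer lives in a Ref that is filled in when the module is loaded, rather than in a const computed while the module is compiled into sys.so (where, per this thread, it came back as NULL). To keep it runnable without libpcre, Libc.malloc stands in for pcre_jit_stack_alloc; the module name JITStackDemo and the helper alloc_stack are hypothetical.

```julia
module JITStackDemo  # hypothetical module name

const JIT_STACK_START_SIZE = 32768    # 32KB initial size, as in the thread
const JIT_STACK_MAX_SIZE = 1048576    # 1MB maximum, as in the thread

# Only the empty container exists at compile time; the pointer is stored
# at load time, so no stale value is baked into a saved image.
const JIT_STACK = Ref{Ptr{Cvoid}}(C_NULL)

# Stand-in for ccall((:pcre_jit_stack_alloc, :libpcre), ...): this just
# reserves raw memory so the sketch runs without libpcre.
alloc_stack(startsize, maxsize) = Libc.malloc(maxsize)

# __init__ runs each time the module is loaded, so the pointer is valid
# for the current process. The stack is intentionally never freed.
function __init__()
    JIT_STACK[] = alloc_stack(JIT_STACK_START_SIZE, JIT_STACK_MAX_SIZE)
    JIT_STACK[] == C_NULL && error("could not allocate JIT stack")
    return nothing
end

end # module
```

With the real library, __init__ would store the result of pcre_jit_stack_alloc, and each compiled regex would then be given that shared stack via pcre_assign_jit_stack(pat.extra, C_NULL, JIT_STACK[]).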
Isn't there a way to set the stack size when PCRE is compiled?
That wouldn't help if your build used USE_SYSTEM_PCRE.
It feels like someone building Julia themselves and switching to their own system PCRE would presumably know to change the stack size, or would already have done it? So if doing this at compile time removes an extra call from every regex match function, that seems like a decent trade-off to me.
Maybe just put a note in the Makefile to make sure the stack size is large enough if you choose to use the system PCRE?
@randyzwitch The people least involved in Julia development are the ones who will use distribution packages on Linux, and they'll use the system PCRE without even knowing it.
Since it's simple for us to set the stack size at run time, I can't see why we wouldn't.
(Edit: working off a nightly 0.4 build) I'm making a package to parse Apache logs. See the code here: https://github.com/randyzwitch/LogParser.jl
I'm fairly comfortable with the regex I wrote; it has a 99% match rate on my test files. However, one particularly gnarly string causes the following error:
Here's the man page explanation for error -27, PCRE_ERROR_JIT_STACKLIMIT (-27):
http://www.pcre.org/pcre.txt
This much of the regex works fine:
Any ideas what to do here, or what the problem might be? A try/catch seems like the wrong way to handle this; it looks like a lower-level issue.