erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.38k stars 2.95k forks source link

ERL-207: PCRE is difficult to update from mainline development, should be a NIF #3518

Closed OTP-Maintainer closed 3 years ago

OTP-Maintainer commented 8 years ago

Original reporter: okeuday Affected version: OTP-19.0 Fixed in version: OTP-20.0 Component: stdlib Migrated from: https://bugs.erlang.org/browse/ERL-207


The ERTS has PCRE 8.33 compiled into it with BIFs for integration that are used in the stdlib Erlang application by the re module.  This integration path would be due to the legacy nature of the re module, it appears, since NIF integration should be more well suited for the integration of external library dependencies that must share the same address space as ERTS.

While PCRE's integration may not seem like a problem, since it works, it appears that at least one bug exists with PCRE 8.33 (from 2013-05-29) based on the pull requests that were closed at:
* https://github.com/erlang/otp/pull/1111
* https://github.com/erlang/otp/pull/1108
* https://github.com/erlang/otp/pull/1107

While the bug mentioned there, at http://vcs.pcre.org/pcre?view=revision&sortby=rev&revision=1542 , appears to be a bit obscure, only found through the LLVM fuzzer (and fixed on 2015-04-01), the potential to duplicate the effort already present in mainline PCRE development simply due to the necessity of BIF yielding in ERTS is causing a problem for maintaining PCRE usage (based on the closing of the pull requests and normal difficulty that should be expected with merging a patch after almost 2 years of separate PCRE mainline changes).

The PCRE NIF integration should have two routes for integration:
* NIF yielding with enif_schedule_nif to try and replicate the BIF yielding currently present
* NIF cpu bound dirty scheduler usage (possibly higher latency for re function calls)

Both routes seem like they would be a significant development effort and it may be best to switch to PCRE2 usage (PCRE with version >= 10), since that may reduce the total time spent on PCRE.

The availability of Windows binaries for PCRE should not be a concern as long as PCRE compilation always occurs as part of Erlang/OTP, since Erlang/OTP should already be compiling PCRE.  The current situation likely required many ad-hoc modifications based on the configuration output, but I do not know.

The information here is based on discussion with Lukas Larsson and the assumption was that the stack overflow fix to PCRE (mentioned above) was not an April fool's day joke, despite occurring on April 1st.
OTP-Maintainer commented 7 years ago

kenneth said:

prce 8.40 (the latest version right now) is now incorporated into Erlang/OTP on the master branch. Will be included in OTP 20 which will be released in June 2017. 

The effort of porting in a new version of pcre is not the actual coding but rather the testing to get a reasonable confidence that everything works and is compatible.

The conversion to NIFs would not improve the situation or reduce the job. If using dirty schedulers the regexp operations will always have to be run in another thread which will reduce performance. 

If we should change to a NIF implementation it have to wait until next time and then we will consider using PCRE2 or RE2 instead as PCRE is not further developed. 
OTP-Maintainer commented 7 years ago

okeuday said:

It would be helpful to keep this issue open, since pcre 8.40 isn't being used as a NIF yet.  The current PCRE integration is unlike other external library integration in the Erlang VM, due to it requiring manual modifications to the external library source code to be inserted into ERTS.  Anyone else that attempts to integrate an external library into the Erlang VM would likely never attempt or want to modify ERTS directly to keep things stable and maintainable.  If NIFs make PCRE too slow, then NIFs could be improved, but to avoid using NIF integration would seem to indicate they are inadequate for Erlang integration.
OTP-Maintainer commented 7 years ago

kenneth said:

What I wrote in my previous comment when I marked this issue as closed was that we don't intend to change the integration of PCRE to NIFs since the current integration is quite easy to do. The pcre code is modified on well described and defined spots and only to make it "trapping" which means that it returns to the scheduler after a certain number of iterations. This is required in order to not block the whole scheduler or vm because of a long lasting call to pcre.
The same type of patches needs to be done in case of a NIF integration as well, in order to get the same performance. So as I said there is no point in changing this to a NIF solution until we in that case change to integrate pcre2 or re2 instead.

The use of "Dirty" NIFs would haave to run every 're' function in a separate thread because there is no way to know in advance which calls that will be long lasting and which that will be quick. Of course this is a possible solution but we will not consider it with pcre which is an old library in maintenance mode. That is why I close this issue. 

When time comes for changing library to pcre2 or re2 or something else then of course a NIF solution will be considered. And we always try to avoid modifications in 3:rd party libraries if possible.
OTP-Maintainer commented 7 years ago

okeuday said:

Thank you for clarifying the situation.  I understand this needs to be approached carefully.  One possibility, though it may seem odd, is to add a macro that is not used by default within the pcre2 source code mainline for yielding calls, since there is likely other use of pcre that needs to yield.  That would allow pcre2 to be used without the dirty scheduler (so it should be roughly the same as the situation now) though the frequency of yielding is likely unique to its usage (i.e., it may be hard to place macros in places everyone (that wants yielding) is happy with).  That would also require convincing the pcre2 developers of the need for the change, and it may be hard to ensure the macros stay in the proper places as pcre2 changes.  The approach with dirty NIFs is likely more difficult (more testing, potentially more latency), though it would avoid any modifications to the pcre2 source code in the future.