Closed OTP-Maintainer closed 3 years ago
kenneth
said:
prce 8.40 (the latest version right now) is now incorporated into Erlang/OTP on the master branch. Will be included in OTP 20 which will be released in June 2017.
The effort of porting in a new version of pcre is not the actual coding but rather the testing to get a reasonable confidence that everything works and is compatible.
The conversion to NIFs would not improve the situation or reduce the job. If using dirty schedulers the regexp operations will always have to be run in another thread which will reduce performance.
If we should change to a NIF implementation it have to wait until next time and then we will consider using PCRE2 or RE2 instead as PCRE is not further developed.
okeuday
said:
It would be helpful to keep this issue open, since pcre 8.40 isn't being used as a NIF yet. The current PCRE integration is unlike other external library integration in the Erlang VM, due to it requiring manual modifications to the external library source code to be inserted into ERTS. Anyone else that attempts to integrate an external library into the Erlang VM would likely never attempt or want to modify ERTS directly to keep things stable and maintainable. If NIFs make PCRE too slow, then NIFs could be improved, but to avoid using NIF integration would seem to indicate they are inadequate for Erlang integration.
kenneth
said:
What I wrote in my previous comment when I marked this issue as closed was that we don't intend to change the integration of PCRE to NIFs since the current integration is quite easy to do. The pcre code is modified on well described and defined spots and only to make it "trapping" which means that it returns to the scheduler after a certain number of iterations. This is required in order to not block the whole scheduler or vm because of a long lasting call to pcre.
The same type of patches needs to be done in case of a NIF integration as well, in order to get the same performance. So as I said there is no point in changing this to a NIF solution until we in that case change to integrate pcre2 or re2 instead.
The use of "Dirty" NIFs would haave to run every 're' function in a separate thread because there is no way to know in advance which calls that will be long lasting and which that will be quick. Of course this is a possible solution but we will not consider it with pcre which is an old library in maintenance mode. That is why I close this issue.
When time comes for changing library to pcre2 or re2 or something else then of course a NIF solution will be considered. And we always try to avoid modifications in 3:rd party libraries if possible.
okeuday
said:
Thank you for clarifying the situation. I understand this needs to be approached carefully. One possibility, though it may seem odd, is to add a macro that is not used by default within the pcre2 source code mainline for yielding calls, since there is likely other use of pcre that needs to yield. That would allow pcre2 to be used without the dirty scheduler (so it should be roughly the same as the situation now) though the frequency of yielding is likely unique to its usage (i.e., it may be hard to place macros in places everyone (that wants yielding) is happy with). That would also require convincing the pcre2 developers of the need for the change, and it may be hard to ensure the macros stay in the proper places as pcre2 changes. The approach with dirty NIFs is likely more difficult (more testing, potentially more latency), though it would avoid any modifications to the pcre2 source code in the future.
Original reporter:
okeuday
Affected version:OTP-19.0
Fixed in version:OTP-20.0
Component:stdlib
Migrated from: https://bugs.erlang.org/browse/ERL-207