gbif / name-parser

The core GBIF scientific name parser library
Apache License 2.0

Consider using a predictable regex library #92

Open mdoering opened 1 year ago

mdoering commented 1 year ago

In some problematic cases the name parser takes a very long time to evaluate certain regular expressions. This is a well-known problem with backtracking (NFA-based) regex engines, and with backreferences in particular. See, for example, https://bugs.openjdk.org/browse/JDK-8260688

There are alternative regex implementations based on DFAs, but all of them support fewer features than the name parser currently uses. Maybe those features can be replaced, so that we can switch to a deterministic engine and do away with parsing timeouts?
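For comparison, as long as the parser stays on `java.util.regex`, one way to make the cost of a match bounded without wall-clock timeouts is to cap the number of character reads the engine may perform. The following is only a rough sketch of that idea, not what the name parser does today: `BudgetedCharSequence` and `matchesWithin` are hypothetical names, and the budget value would need tuning against real names.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of name-parser): wraps the input so that a
 * java.util.regex match is aborted after a fixed number of character reads.
 */
final class BudgetedCharSequence implements CharSequence {
    private final CharSequence inner;
    // Single-element array so that subSequence views share the same counter.
    private final long[] remaining;

    BudgetedCharSequence(CharSequence inner, long budget) {
        this(inner, new long[] {budget});
    }

    private BudgetedCharSequence(CharSequence inner, long[] remaining) {
        this.inner = inner;
        this.remaining = remaining;
    }

    @Override
    public char charAt(int index) {
        // Every character the regex engine inspects consumes one unit of budget.
        if (remaining[0]-- <= 0) {
            throw new IllegalStateException("regex step budget exceeded");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new BudgetedCharSequence(inner.subSequence(start, end), remaining);
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    /** Runs a full match, throwing IllegalStateException once the budget is spent. */
    static boolean matchesWithin(Pattern pattern, String input, long budget) {
        return pattern.matcher(new BudgetedCharSequence(input, budget)).matches();
    }
}
```

A call such as `BudgetedCharSequence.matchesWithin(pattern, name, 1_000_000)` would then fail the same way for the same input on every run, instead of depending on machine load. It does not remove the backtracking itself, so a deterministic engine would still be the cleaner fix.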

Alternatives:

Related issues: