Open spbolton opened 5 months ago
This seems to be a concurrency issue in our RegEX class that is causing multiple threads to concurrently access the same Perl5Compiler class compile method. This looks like it could be due to ineffective synchronization in the following. It is synchronizing on a non-final hashmap object but also it is not using correct double checked locking behavior as weill as using two different Object synchronization locks to protect the compiler instance from multiple threads. Therefore if there are two threads one calling getPattern() and another calling getPatternCaseInsensitive() then they can be calling the same compiler instance.
The doc in Perl5Compiler() mention that it is not thread safe, but there is a cost to initializing, it is recommended that a Perl5Compiler instance is created per thread e.g. a threadLocal.
If we make the compiler instance properly thread safe we may find we causing some thread contention if many request threads are trying to use the same instance. Although from this class it looks like once the patterns are loaded into the patterns object it may not need the compiler again, which is why we would only see this issue on startup.
If we use a thread local then there is a cost per thread to initialize but we would be guaranteed to not have contention.
In this case I would recommend fixing the concurrent access to the compiler instance
See note
The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. It generates Perl5Pattern instances upon compilation to be used in conjunction with a Perl5Matcher instance. Please see the user's guide for more information about Perl5 regular expressions.
Perl5Compiler and Perl5Matcher are designed with the intent that you use a separate instance of each per thread to avoid the overhead of both synchronization and concurrent access (e.g., a match that takes a long time in one thread will block the progress of another thread with a shorter match). If you want to use a single instance of each in a concurrent program, you must appropriately protect access to the instances with critical sections. If you want to share Perl5Pattern instances between concurrently executing instances of Perl5Matcher, you must compile the patterns with READ_ONLY_MASK.
private Perl5Compiler compiler;
private Map<String, org.apache.oro.text.regex.Pattern> patternsCaseInsensitive = new HashMap<>();
private Map<String, org.apache.oro.text.regex.Pattern> patterns = new HashMap<>();
private RegEX() {
compiler = new Perl5Compiler();
}
private Pattern getPattern(String regEx) throws MalformedPatternException{
Pattern p = patterns.get(regEx);
if(!UtilMethods.isSet(p)){
synchronized (patterns) {
p = compiler.compile(regEx, Perl5Compiler.READ_ONLY_MASK);
patterns.put(regEx, p);
}
}
return p;
}
private Pattern getPatternCaseInsensitive(final String regEx) throws MalformedPatternException {
Pattern pattern = patternsCaseInsensitive.get(regEx);
if(!UtilMethods.isSet(pattern)){
synchronized (patternsCaseInsensitive) {
pattern = compiler.compile(regEx,
Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.READ_ONLY_MASK);
patternsCaseInsensitive.put(regEx, pattern);
}
}
return pattern;
}
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Even though postman tests will not start until the DotCMS docker container is reporting "Started http Brindged Service" in the logs It seems that occasionally postman may be making a request and web filter interceptors may not be fully initialized or there is some other change that is throwing an exception, the newman process running the tasks is then dying and not running anything. This may be able to be resolved with a delay but this means that our indication here that the server is ready to accept requests may not be valid. It would be good to investigate the real cause of this issue that should feed into improved state management that is also required for k8s readiness checks, or whether this is a concurrency issue that needs resolving. If it cannot be quickly resolved adding a small startup delay for the postman script may prevent us from seeing the issue even though it is just a mask.
This may be hard to replicate due to the flakey nature so we may need to trace back to possible causes of the exception.
This has now been seen at least twice. https://github.com/dotCMS/core/actions/runs/8743129904/job/23993420830?pr=28285
16PR Test Run Postman Tests - page.txt