dotCMS / core

Headless/Hybrid Content Management System for Enterprises
http://dotcms.com
Other
848 stars 466 forks source link

Flakey Test: Server may not be ready when postman makes first request #28299

Open spbolton opened 5 months ago

spbolton commented 5 months ago

Even though postman tests will not start until the DotCMS docker container is reporting "Started http Brindged Service" in the logs It seems that occasionally postman may be making a request and web filter interceptors may not be fully initialized or there is some other change that is throwing an exception, the newman process running the tasks is then dying and not running anything. This may be able to be resolved with a delay but this means that our indication here that the server is ready to accept requests may not be valid. It would be good to investigate the real cause of this issue that should feed into improved state management that is also required for k8s readiness checks, or whether this is a concurrency issue that needs resolving. If it cannot be quickly resolved adding a small startup delay for the postman script may prevent us from seeing the issue even though it is just a mask.

This may be hard to replicate due to the flakey nature so we may need to trace back to possible causes of the exception.

This has now been seen at least twice. https://github.com/dotCMS/core/actions/runs/8743129904/job/23993420830?pr=28285

16PR Test Run Postman Tests - page.txt

DOTCMS:[INFO] Started bridged http service
DOTCMS:19:01:37.485  INFO  framework.OSGIUtil - osgi felix framework started
[INFO] yarn run v1.22.19
[INFO] $ node index.js --serverUrl=http://localhost:8080 --postmanTestsDir=../dotCMS/src/curl-test page
DOTCMS:19:01:38.172  INFO  apps.SecretsKeyStoreHelper - KeyStore loaded successfully after `1` tries.
DOTCMS:19:01:38.174  INFO  apps.SecretsKeyStoreHelper - KeyStore loaded successfully after `1` tries.
DOTCMS:19:01:38.191  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): contenttypes.content.copy=Copy ==> Copy
DOTCMS:19:01:38.194  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): message.category.import=Importing Categories.  This operation cannot be cancelled or undone. ==> IMPORT
DOTCMS:19:01:38.194  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): message.containers.create.content_type_code=Content Type Code ==> Content Type Code
DOTCMS:19:01:38.197  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): Save=Save ==> Save
DOTCMS:19:01:38.199  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): Move=Move ==> Move
DOTCMS:19:01:38.199  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): dot.common.remove=Remove ==> Remove
DOTCMS:19:01:38.200  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): archived=archived ==> Archived
DOTCMS:19:01:38.200  WARN  struts.MultiMessageResources - Duplicate resource property definition (key=was ==> is now): experiments.action.edit=Edit ==> Edit
DOTCMS:Apr 18, 2024 7:01:38 PM org.apache.catalina.core.StandardWrapperValve invoke
DOTCMS:SEVERE: Servlet.service() for servlet [RESTAPI] in context with path [] threw exception
DOTCMS:java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
DOTCMS: at org.apache.oro.text.regex.Perl5Compiler.__emitNode(Unknown Source)
DOTCMS: at org.apache.oro.text.regex.Perl5Compiler.__parseAlternation(Unknown Source)
DOTCMS: at org.apache.oro.text.regex.Perl5Compiler.__parseExpression(Unknown Source)
DOTCMS: at org.apache.oro.text.regex.Perl5Compiler.compile(Unknown Source)
DOTCMS: at org.apache.oro.text.regex.Perl5Compiler.compile(Unknown Source)
DOTCMS: at com.dotmarketing.util.RegEX.getPatternCaseInsensitive(RegEX.java:66)
DOTCMS: at com.dotmarketing.util.RegEX.containsCaseInsensitive(RegEX.java:81)
DOTCMS: at com.dotcms.filters.interceptor.SimpleWebInterceptorDelegateImpl.match(SimpleWebInterceptorDelegateImpl.java:284)
DOTCMS: at com.dotcms.filters.interceptor.SimpleWebInterceptorDelegateImpl.anyMatchFilter(SimpleWebInterceptorDelegateImpl.java:261)
DOTCMS: at com.dotcms.filters.interceptor.SimpleWebInterceptorDelegateImpl.intercept(SimpleWebInterceptorDelegateImpl.java:191)
DOTCMS: at com.dotcms.filters.interceptor.AbstractWebInterceptorSupportFilter.runInterceptors(AbstractWebInterceptorSupportFilter.java:38)
DOTCMS: at com.dotcms.filters.interceptor.AbstractWebInterceptorSupportFilter.doFilter(AbstractWebInterceptorSupportFilter.java:83)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153)
DOTCMS: at org.apache.catalina.filters.HttpHeaderSecurityFilter.doFilter(HttpHeaderSecurityFilter.java:129)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153)
DOTCMS: at com.dotcms.filters.NormalizationFilter.doFilter(NormalizationFilter.java:89)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178)
DOTCMS: at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153)
DOTCMS: at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:168)
DOTCMS: at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
DOTCMS: at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:481)
DOTCMS: at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130)
DOTCMS: at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:670)
DOTCMS: at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
DOTCMS: at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:765)
DOTCMS: at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
DOTCMS: at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
DOTCMS: at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:390)
DOTCMS: at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
DOTCMS: at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:928)
DOTCMS: at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1732)
DOTCMS: at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
DOTCMS: at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1295)
DOTCMS: at org.apache.tomcat.util.net.Nio2Endpoint.setSocketOptions(Nio2Endpoint.java:339)
DOTCMS: at org.apache.tomcat.util.net.Nio2Endpoint$Nio2Acceptor.completed(Nio2Endpoint.java:479)
DOTCMS: at org.apache.tomcat.util.net.Nio2Endpoint$Nio2Acceptor.completed(Nio2Endpoint.java:415)
DOTCMS: at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:127)
DOTCMS: at java.base/sun.nio.ch.Invoker$2.run(Invoker.java:219)
DOTCMS: at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
DOTCMS: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
DOTCMS: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
DOTCMS: at java.base/java.lang.Thread.run(Thread.java:829)
DOTCMS:
ES:[2024-04-18T19:01:39,090][INFO ][o.o.i.i.ManagedIndexCoordinator] [70c6f813d17e] Performing move cluster state metadata.
ES:[2024-04-18T19:01:39,095][INFO ][o.o.i.i.MetadataService  ] [70c6f813d17e] ISM config index not exist, so we cancel the metadata migration job.
ES:[2024-04-18T19:01:39,097][INFO ][o.o.i.i.ManagedIndexCoordinator] [70c6f813d17e] Performing ISM template migration.
ES:[2024-04-18T19:01:39,099][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] Doing ISM template migration 1 time.
ES:[2024-04-18T19:01:39,100][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] Use 2024-04-18T18:00:39.088Z as migrating ISM template last_updated_time
ES:[2024-04-18T19:01:39,102][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] ISM templates: {}
ES:[2024-04-18T19:01:39,103][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] Policies to update: []
ES:[2024-04-18T19:01:39,108][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] Failure experienced when migrating ISM Template and update ISM policies: {}
ES:[2024-04-18T19:01:39,147][INFO ][o.o.c.s.ClusterSettings  ] [70c6f813d17e] updating [plugins.index_state_management.template_migration.control] from [0] to [-1]
ES:[2024-04-18T19:01:39,151][INFO ][o.o.i.i.m.ISMTemplateService] [70c6f813d17e] Successfully update template migration setting
DOTCMS:19:01:39.348  WARN  error.custom_002derror_002dpage_jsp -  on /html/error/custom-error-page.jsp
[INFO] Unhandled error: SyntaxError: Unexpected end of JSON input
[INFO]     at JSON.parse (<anonymous>)
[INFO]     at parseJSONFromBytes (node:internal/deps/undici/undici:4553:19)
[INFO]     at successSteps (node:internal/deps/undici/undici:4527:27)
[INFO]     at fullyReadBody (node:internal/deps/undici/undici:1307:9)
[INFO]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[INFO]     at async specConsumeBody (node:internal/deps/undici/undici:4536:7)
[INFO]     at async fetchJWT (/home/runner/work/core/core/dotcms-postman/index.js:43:18)
[INFO]     at async main (/home/runner/work/core/core/dotcms-postman/index.js:235:17)
[INFO] error Command failed with exit code 1.
[INFO] error Command failed with exit code 1.info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
spbolton commented 5 months ago

This seems to be a concurrency issue in our RegEX class that is causing multiple threads to concurrently access the same Perl5Compiler class compile method. This looks like it could be due to ineffective synchronization in the following. It is synchronizing on a non-final hashmap object but also it is not using correct double checked locking behavior as weill as using two different Object synchronization locks to protect the compiler instance from multiple threads. Therefore if there are two threads one calling getPattern() and another calling getPatternCaseInsensitive() then they can be calling the same compiler instance.

The doc in Perl5Compiler() mention that it is not thread safe, but there is a cost to initializing, it is recommended that a Perl5Compiler instance is created per thread e.g. a threadLocal.

If we make the compiler instance properly thread safe we may find we causing some thread contention if many request threads are trying to use the same instance. Although from this class it looks like once the patterns are loaded into the patterns object it may not need the compiler again, which is why we would only see this issue on startup.
If we use a thread local then there is a cost per thread to initialize but we would be guaranteed to not have contention.

In this case I would recommend fixing the concurrent access to the compiler instance

See note

The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. It generates Perl5Pattern instances upon compilation to be used in conjunction with a Perl5Matcher instance. Please see the user's guide for more information about Perl5 regular expressions.
Perl5Compiler and Perl5Matcher are designed with the intent that you use a separate instance of each per thread to avoid the overhead of both synchronization and concurrent access (e.g., a match that takes a long time in one thread will block the progress of another thread with a shorter match). If you want to use a single instance of each in a concurrent program, you must appropriately protect access to the instances with critical sections. If you want to share Perl5Pattern instances between concurrently executing instances of Perl5Matcher, you must compile the patterns with READ_ONLY_MASK.
private Perl5Compiler compiler;

private Map<String, org.apache.oro.text.regex.Pattern> patternsCaseInsensitive = new HashMap<>();
private Map<String, org.apache.oro.text.regex.Pattern> patterns = new HashMap<>();

private RegEX() {
        compiler = new Perl5Compiler();
    }

    private Pattern getPattern(String regEx) throws MalformedPatternException{
        Pattern p = patterns.get(regEx);
        if(!UtilMethods.isSet(p)){
            synchronized (patterns) {
                p = compiler.compile(regEx, Perl5Compiler.READ_ONLY_MASK);
                patterns.put(regEx, p);
            }
        }
        return p;
    }

private Pattern getPatternCaseInsensitive(final String regEx) throws MalformedPatternException {

        Pattern pattern = patternsCaseInsensitive.get(regEx);
        if(!UtilMethods.isSet(pattern)){
            synchronized (patternsCaseInsensitive) {
                pattern = compiler.compile(regEx,
                        Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.READ_ONLY_MASK);
                patternsCaseInsensitive.put(regEx, pattern);
            }
        }
        return pattern;
    }
spbolton commented 5 months ago

New occurrence in https://github.com/dotCMS/core/actions/runs/8805036373/job/24167373008#step:10:2305

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.