OpenLiberty / liberty-arquillian

Arquillian Liberty Managed and Remote containers
Apache License 2.0
11 stars 29 forks source link

Add support for tests with intentional app deployment failures #137

Open brideck opened 2 months ago

brideck commented 2 months ago

Some TCKs contain tests that are intentionally causing and testing deployment failures. These have no hope of passing currently because any deployed app that fails to start causes an exception to be thrown from the plugin:

[ERROR] com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.WSCClientIT -- Time elapsed: 193.6 s <<< ERROR!
org.jboss.arquillian.container.spi.client.container.DeploymentException: Timeout while waiting for "wsc_negdep_invalidpathparamtype_srv_onclose_web" ApplicationMBean to reach STARTED. Actual state: INSTALLED.
    at io.openliberty.arquillian.managed.WLPManagedContainer.waitForApplicationTargetState(WLPManagedContainer.java:1280)
    at io.openliberty.arquillian.managed.WLPManagedContainer.deploy(WLPManagedContainer.java:539)
    at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:151)
    at org.jboss.arquillian.container.impl.client.container.ContainerDeployController$3.call(ContainerDeployController.java:118)
    at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.executeOperation(ContainerDeployController.java:239)
    at org.jboss.arquillian.container.impl.client.container.ContainerDeployController.deploy(ContainerDeployController.java:118)
        ...

I'm not sure if there's a better way to check for this particular scenario, but we could add a configuration property (e.g. allowAppDeployFailures) that makes it so the expected/desired state of deployed applications to just be INSTALLED instead of STARTED.

Azquelt commented 2 months ago

There are already many TCK tests which intentionally cause deployment failures in the CDI TCK which have no issues, so we need to see what's different with these tests.

Azquelt commented 2 months ago

Assuming we're talking about this test: https://github.com/jakartaee/platform-tck/blob/c4d7da5b46aae3acaf8f9f2eed1ac2b07bb25468/websocket/spec-tests/src/main/java/com/sun/ts/tests/websocket/negdep/malformedpath/WSCClientIT.java#L46-L47

Comparing to LookupAmbiguousTest from the CDI TCK, the CDI test has @ShouldThrowException(DeploymentException.class) to indicate that deployment of the application is expected to fail.

The platform TCK test does not have this annotation so I'm not sure how the arquillian container is supposed to know that that deployment is not expected to succeed.

brideck commented 2 months ago

That test among others, yeah. Thanks for sharing the mechanism for this. I can update all of the failing tests locally and see if that gets OL a cleaner looking result. If so, this becomes an obvious TCK challenge.

Curious how Tomcat might be passing this TCK in its current state, unless they either a) aren't failing the deployments or b) are compensating for it in their own Arquillian plugin.

Azquelt commented 2 months ago

Note that DeploymentException in this case is a CDI spec exception. If the websocket spec doesn't specify a specific exception, I would use @ShouldThrowException(Exception.class).

The arqillian support feature looks for and reports the exceptions thrown during app startup so that we can check the correct exception was thrown so you should have it installed and enabled if you're running tests expecting deployments to fail. (The arquillian container will fall back to searching FFDC logs if the support feature is not installed, but you don't always get FFDCs out if several tests cause similiar exceptions to be thrown).

brideck commented 2 months ago

Noted. There is likewise a jakarta.websocket.DeploymentException that is expected here per the comments in the various tests, so I'll give that a go first. I'll make sure that's an appropriate specific expectation of the spec before pursuing any changes in the TCK.

volosied commented 2 months ago

Hey @Azquelt,

Brian tried the @ShouldThrowException, but said it didn't work. I looked at the implementation and found this code:

https://github.com/OpenLiberty/liberty-arquillian/blob/3aec0590869a690d5150b236826df687bb5e14a9/liberty-managed/src/main/java/io/openliberty/arquillian/managed/WLPManagedContainer.java#L1484

It looks for the CWWKZ0002 error message and then throws a DeploymentException to Arquillian. However, in our websocket scenario, we don't log CWWKZ0002.

We only log:

[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class 

This looks like it might cause issues with how @ShouldThrowException is expected to work?

Azquelt commented 2 months ago

@volosied CWWKZ0002 is logged when the application fails to start, does the application not fail to start in this instance?

Why is CWWKZ0002 not logged?

Does CWWKZ0001I get logged? (Application started successfully)

volosied commented 2 months ago

The app does not start, and these are the only messages from the console.log. I don't know why CWWKZ0002 is not logged.

Launching defaultServer (Open Liberty 24.0.0.10/wlp-1.0.94.202409101606) on OpenJDK 64-Bit Server VM, version 17.0.7+7 (en_US)
[AUDIT   ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT   ] CWWKZ0058I: Monitoring dropins for applications.
[AUDIT   ] CWWKT0016I: Web application available (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[ERROR   ] CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.
[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.
[AUDIT   ] CWWKT0017I: Web application removed (default_host): http://localhost:9080/wsc_negdep_invalidpathparamtype_srv_onclose_web/
[AUDIT   ] CWWKF0012I: The server installed the following features: [servlet-6.1, websocket-2.2].
[AUDIT   ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.842 seconds.
volosied commented 2 months ago

I look a look at one CDI application (LegacyNonEmpty that reports the CWWKZ0002E error), and I found one difference.

CDI has an implementation for applicationStarting: https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.cdi.weld/src/com/ibm/ws/cdi/liberty/CDIRuntimeImpl.java#L453

WebContainer does not: https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainerListener.java#L165

However, I don't know all the various paths the CWWKZ0002 can be reported.

UPDATE:

I was looking at the webcontainer initialization process and found this code:

https://github.com/OpenLiberty/open-liberty/blob/662ce6274ec2cee5476283980fe61d7bd8872f25/dev/com.ibm.ws.webcontainer/src/com/ibm/ws/webcontainer/osgi/WebContainer.java#L987-L1011

If I set <webContainer deferServletLoad="false"/> then the StateChangeException("startWebApplication"); is thrown and the CWWKZ0002E message occurs:

[ERROR   ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web. The exception message was: com.ibm.ws.container.service.state.StateChangeException: com.ibm.ws.container.service.state.StateChangeException: startWebApplication

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

The default code is executed from the Runnable (run doesn't throw any exceptions), so I don't think we can just simply throw an exception here. Likely why it was coded this way.

Azquelt commented 2 months ago

It's not the original websocket deployment error, but it looks like this is where we need to look further. It seems like we need to throw a StateChangeException to get the CWWKZ0002 message?

Possibly? I'm not sure. For CDI it looks like we do throw a StateChangeException with the actual exception as the cause.

Arquillian will check the exception thrown and all of its causes for the expected exception when a deployment fails.

However, I'm worried about updating any code here. It could cause a regression. I'm also not familiar with the deployment code either.

I don't think we should be changing any liberty code here, but we need to understand how the startup failure is being reported.

At the moment, the liberty-arquillian container assumes that an application will either start successfully (and log CWWKZ0001) or fail to start (and log CWWKZ0002).

It then has various ways of getting the exception that caused the failure (see the classes in the exceptions package), but if you have the arquillian-support-feature, it will call that which uses IncidentListener which listens for a StateChangeException being recorded by the FFDC mechanism.

It sounds like in your case, both of those things aren't true?

It looks like the message that reports that the app fails to start is

[AUDIT   ] CWWKZ0012I: The application wsc_negdep_invalidpathparamtype_srv_onclose_web was not started.

so we probably need to look for that as well as CWWKZ0002.

I'm not familiar with the deployment code either, so I'm not sure when you would get CWWKZ0002 and when you would get CWWKZ0012.

We then also need a way of getting hold of the actual exception that occurred (ideally we would enable arquillian support feature to report the exact exception thrown, though we could also make up an exception based on the logs as the fallback logic in CDILogExceptionLocator does). I would suggest you discuss with the liberty kernel team on what they think is the best way to do this.

volosied commented 1 month ago

@Azquelt I was able to get the following message logged ( with the help of <webContainer deferServletLoad="false"/>) :

[ERROR ] CWWKZ0002E: An exception occurred while starting the application wsc_negdep_invalidpathparamtype_srv_onclose_web (2). The exception message was: com.ibm.ws.container.service.state.StateChangeException: jakarta.websocket.DeploymentException: CWWKH0022E: Exception occurred during WebSocket application deployment because @PathParam parameter arg defined in method onClose has invalid type in Annotated endpoint class com.sun.ts.tests.websocket.negdep.invalidpathparamtype.srv.onclose.OnCloseStringHolderServerEndpoint.

From here, arquillian should be able to pick up the original exception -- jakarta.websocket.DeploymentException when we use@ShouldThrowException(DeploymentException.class)?

Azquelt commented 1 month ago

Yes, if you get that message and an FFDC for StateChangeException which has your DeploymentException somewhere in the cause chain, it should work with @ShouldThrowException.

Ideally though, the liberty-arquillian container would pick up the exception however it's thrown.

jhanders34 commented 1 month ago

I thought we did work to make <webContainer deferServletLoad="false"/> be the default, but looking at the metatype I am wrong. I think it is worth having Brian run with that setting in his server.xml files.

volosied commented 1 month ago

Hey @jhanders34

@brideck was able to make some progress with this issue through deferServletLoad=false. (I also updated the webcontainer code slightly to have the StateChangeException log the jakarta.websocket.DeploymentException as seen above).

However, some more issues were discovered in the websocket TCK. It's likely going to be challenge, and we'll see where the conversation goes from there.