OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.15k stars 590 forks source link

Removing servlet feature configuration causes socket bind error #6598

Open ngmr opened 5 years ago

ngmr commented 5 years ago

When the configuration for the servlet feature --- including non-default httpEndpoint ports --- is removed from server.xml, the http channel tries to listen on the default httpEndpoint ports instead.

This happens before the servlet feature is removed (by the same update).

If either of the default ports happens to be unavailable, this results in a server error like this: com.ibm.ws.tcpchannel.internal.TCPPort E CWWKO0221E: TCP Channel defaultHttpEndpoint initialization did not succeed. The socket bind did not succeed for host * and port 8010. The port might already be in use.

The problem has been observed both in servlet-3.1 and servlet-4.0.

I have written a failing FAT to demonstrate the problem, which I have pushed to this branch.

joe-chacko commented 5 years ago

I've looked at this issue with Neil. It looks like something is picking up the change to the defaultHttpEndpoint — it switches from the configured ports back to the defaults — and trying to process it before the servlet feature is removed. This processing is unnecessary, and can be deleterious when the default ports are in use (e.g. by another server).

billw1024 commented 5 years ago

I pulled down the remote branch, successfully did ./gradlew clean, cnf:initializ, releaseNeeded. but, when I tried to run the test doing: ./gradlew com.ibm.ws.webcontainer.servlet.4.0_fat:buildandrun -Dfat.test.mode=FULL the test attempt failed with, among other things:

Could not HEAD 'https://na.artifactory.swg-devops.com/artifactory/wasliberty-open-liberty/net/ossindex/audit/net.ossindex.audit.gradle.plugin/0.3.8-beta/net.ossindex.audit.gradle.plugin-0.3.8-beta.pom'. Received status code 403 from server: Forbidden

Neil was kind enough to get traces and logs.

Here are some trace points I found helpful

12 [20/02/19 14:58:53:768 GMT] 00000001 id=00000000 com.ibm.ws.kernel.launch.internal.FrameworkManager A CWWKE0001I: The server servlet40_servletOnSpecialPort has been launched. 13 [20/02/19 14:58:55:028 GMT] 0000001e id=00000000 com.ibm.ws.config.xml.internal.XMLConfigParser A CWWKG0028A: Processing included configuration resource: /home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestPorts.xml 14 [20/02/19 14:58:55:037 GMT] 0000001e id=00000000 com.ibm.ws.config.xml.internal.XMLConfigParser A CWWKG0028A: Processing included configuration resource: /home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestCommon.xml 60 [20/02/19 14:58:56:792 GMT] 00000027 id=00000000 com.ibm.ws.config.xml.internal.ConfigValidator A CWWKG0102I: Found conflicting settings for defaultHttpEndpoint instance of httpEndpoint configuration. 102 [20/02/19 14:58:57:637 GMT] 0000001e id=932ed187 com.ibm.websphere.channelfw.osgi.CHFWBundle 1 Activating
104 [20/02/19 14:58:57:638 GMT] 0000001e id=932ed187 com.ibm.websphere.channelfw.osgi.CHFWBundle 1 Processing config 119 [20/02/19 14:58:57:668 GMT] 0000001e id=932ed187 com.ibm.websphere.channelfw.osgi.CHFWBundle 3 Add factory provider; com.ibm.ws.http.internal.HttpChannelProvider@5f9f058a 637 [20/02/19 14:58:58:220 GMT] 0000001e id=00000000 com.ibm.ws.app.manager.internal.monitor.DropinMonitor A CWWKZ0058I: Monitoring dropins for applications. 1233 [20/02/19 14:58:58:897 GMT] 00000027 id=932ed187 com.ibm.websphere.channelfw.osgi.CHFWBundle 3 Server Completely Started signal received 1234 [20/02/19 14:58:58:973 GMT] 00000027 id=00000000 com.ibm.ws.kernel.feature.internal.FeatureManager A CWWKF0012I: The server installed the following features: [servlet-4.0, timedexit-1.0]. 1236 [20/02/19 14:58:58:974 GMT] 00000027 id=00000000 com.ibm.ws.kernel.feature.internal.FeatureManager A CWWKF0011I: The server servlet40_servletOnSpecialPort is ready to run a smarter planet. 1257 [20/02/19 14:58:59:618 GMT] 00000029 id=00000000 com.ibm.ws.config.xml.internal.ConfigRefresher A CWWKG0016I: Starting server configuration update. 1258 [20/02/19 14:58:59:619 GMT] 00000029 id=00000000 com.ibm.ws.config.xml.internal.XMLConfigParser A CWWKG0028A: Processing included configuration resource: /home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestPorts.xml 1259 [20/02/19 14:58:59:620 GMT] 00000029 id=00000000 com.ibm.ws.config.xml.internal.XMLConfigParser A CWWKG0028A: Processing included configuration resource: /home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestCommon.xml 1879 [20/02/19 14:58:59:827 GMT] 0000002a id=00000000 com.ibm.ws.tcpchannel.internal.TCPPort E CWWKO0221E: TCP Channel defaultHttpEndpoint initialization did not succeed. The socket bind did not succeed for host * and port 8010. The port might already be in use. 2367 [20/02/19 14:59:00:743 GMT] 00000027 id=00000000 com.ibm.ws.config.xml.internal.ConfigRefresher A CWWKG0017I: The server configuration was successfully updated in 1.135 seconds. 2368 [20/02/19 14:59:00:755 GMT] 00000028 id=00000000 com.ibm.ws.kernel.feature.internal.FeatureManager A CWWKF0013I: The server removed the following features: [servlet-4.0]. 2369 [20/02/19 14:59:00:756 GMT] 00000028 id=00000000 com.ibm.ws.kernel.feature.internal.FeatureManager A CWWKF0008I: Feature update completed in 1.095 seconds. 2370 [20/02/19 14:59:03:317 GMT] 0000001b id=00000000 com.ibm.ws.kernel.launch.internal.ServerCommandListener A CWWKE0055I: Server shutdown requested on Wednesday, 20 February 2019 at 14:59. The server servlet40_servletOnSpecialPort is shutting down. 2371 [20/02/19 14:59:03:323 GMT] 00000036 id=00000000 com.ibm.ws.runtime.update.internal.RuntimeUpdateManagerImpl A CWWKE1100I: Waiting for up to 30 seconds for the server to quiesce. 2373 [20/02/19 14:59:03:521 GMT] 00000001 id=00000000 com.ibm.ws.kernel.launch.internal.FrameworkManager A CWWKE0036I: The server servlet40_servletOnSpecialPort stopped after 10.035 seconds.

One problem, which resolved I think will allow this to run ok, though it doesn't answer all the questions in this issue, is that the default http endpoint appears to be defined in both server.xml and in the included fatTestPorts.xml, and the one in fatTestPorts.xml is defined to use port 8010, which is not accessible during the test run. Here is the comment in the trace about that:

[20/02/19 14:58:56:792 GMT] 00000027 id=00000000 com.ibm.ws.config.xml.internal.ConfigValidator A CWWKG0102I: Found conflicting settings for defaultHttpEndpoint instance of httpEndpoint configuration. Property httpPort has conflicting values: Value ${bvt.prop.HTTP_default} is set in file:/home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestPorts.xml. Value ${bvt.prop.HTTP_secondary} is set in file:/home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/servlet40_servletOnSpecialPort/server.xml. Property httpPort will be set to ${bvt.prop.HTTP_secondary}. Property httpsPort has conflicting values: Value ${bvt.prop.HTTP_default.secure} is set in file:/home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/fatTestPorts.xml. Value ${bvt.prop.HTTP_secondary.secure} is set in file:/home/nrichard/work/libertyGit/open-liberty/dev/build.image/wlp/usr/servers/servlet40_servletOnSpecialPort/server.xml. Property httpsPort will be set to ${bvt.prop.HTTP_secondary.secure}.

So, getting rid of the duplicate definition, which is probably in the best interest of testing anyway, would probably solve this.

Then it seems the config update triggers the channel code/ChannelFramework service to re-activate the default port, choosing the default port definition in the included fatTestPorts.xml, and failing because port 8010 is not accessible (while 8030 was). and this occurs before or during other threads dealing with adding/removing features.

I know there are a couple other questions, to which I don't have very good answers.

Why try to activate the channel upon update, before removing the feature? My guess from looking at the traces is that given the multi-threadedness of the config update logic, and the feature update logic, that the two are not synchronized to each other.

Why is a default endpoint even needed if no features are loaded that need an endpoint? I don't think there is straight forward logic in the product that couples features to requiring endpoints, and again, not sure which threads would have to figure that out, and how they would be synchronized to figure that out.

Is it ok to close this issue, or give the ownership to the people/person to solve the duplicate default endpoint issue?

billw1024 commented 5 years ago

If it is ok with everyone, tomorrow I will drop both the "release bug" and the "team:Sirius" tags.