LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0

docker image should shut down faster when there are no active connections #3566

Status: Open. lmsurpre opened this issue 2 years ago.

lmsurpre commented 2 years ago

Is your feature request related to a problem? Please describe.

When you send SIGINT to Liberty (e.g. by asking the container to stop), it enters a quiesce period that lasts up to 30 seconds by default. But if nothing is in progress, it should shut down sooner than that.

Unfortunately, in our container, this never seems to happen.

[4/6/22, 12:15:24:414 UTC] 0000005a RuntimeUpdate A   CWWKE1100I: Waiting for up to 30 seconds for the server to quiesce.
[4/6/22, 12:15:24:423 UTC] 00000034 TCPChannel    I   CWWKO0220I: TCP Channel defaultHttpEndpoint-ssl has stopped listening for requests on host *  (IPv4) port 9443.
[4/6/22, 12:15:24:425 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/fhir-server/api/v4/
[4/6/22, 12:15:24:426 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/jwt/
[4/6/22, 12:15:24:426 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/fhir-bulkdata-webapp/
[4/6/22, 12:15:24:427 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/openapi/ui/
[4/6/22, 12:15:24:429 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/fhir-openapi/
[4/6/22, 12:15:24:430 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/ibm/api/
[4/6/22, 12:15:24:431 UTC] 00000034 VirtualHostIm A   CWWKT0017I: Web application removed (default_host): https://localhost:9443/openapi/
[4/6/22, 12:15:54:387 UTC] 0000005a RuntimeUpdate W   CWWKE1102W: The quiesce operation did not complete. The server will now stop.
[4/6/22, 12:15:54:389 UTC] 0000005a RuntimeUpdate W   CWWKE1106W: 1 shutdown operations did not complete during the quiesce period. 
[4/6/22, 12:15:54:419 UTC] 00000034 AppMessageHel A   CWWKZ0009I: The application fhir-openapi has stopped successfully.
[4/6/22, 12:15:54:454 UTC] 00000057 AppMessageHel A   CWWKZ0009I: The application fhir-bulkdata-webapp has stopped successfully.
[4/6/22, 12:15:54:532 UTC] 0000002d AppMessageHel A   CWWKZ0009I: The application fhir-server-webapp has stopped successfully.
[4/6/22, 12:15:55:541 UTC] 0000005a MpConfigProxy I   CWWKS5782I: The MicroProfile JWT version 1.2 mpConfigProxy deactivated successfully.
[4/6/22, 12:15:55:543 UTC] 0000005a MpConfigProxy I   CWWKS5777I: The MicroProfile JWT version 1.1 mpConfigProxy deactivated successfully.
[4/6/22, 12:15:55:545 UTC] 0000005a MicroProfileJ I   CWWKS5502I: The MicroProfile JWT configuration [defaultMpJwt] was successfully deactivated.
[4/6/22, 12:15:55:547 UTC] 0000005a MicroProfileJ I   CWWKS5502I: The MicroProfile JWT configuration [MicroProfileJwtService] was successfully deactivated.
[4/6/22, 12:15:55:658 UTC] 0000005a JAASServiceIm I   CWWKS1124I: The collective authentication plugin with class name NullCollectiveAuthenticationPlugin has been deactivated. 
[4/6/22, 12:15:55:663 UTC] 0000005a SecurityReady I   CWWKS0009I: The security service has stopped.
[4/6/22, 12:15:56:821 UTC] 00000001 FrameworkMana A   CWWKE0036I: The server defaultServer stopped after 2 minutes, 29.764 seconds.
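To make the expected behavior concrete, here is a hypothetical model of a bounded quiesce. It is NOT Liberty's implementation; it only assumes the semantics described above: stop taking new work, wait up to a timeout for in-flight work, and return early when nothing is in flight. The class and method names (QuiesceSketch, quiesce) are illustrative, and the timeout is shortened to 3 seconds instead of Liberty's 30-second default so the demo runs quickly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a bounded quiesce; NOT Liberty's implementation.
public class QuiesceSketch {

    // Stops accepting new work, then waits up to timeoutMillis for in-flight work.
    // Returns the elapsed time in milliseconds.
    static long quiesce(ExecutorService workers, long timeoutMillis) {
        long start = System.nanoTime();
        workers.shutdown(); // reject new tasks; tasks already submitted keep running
        try {
            boolean drained = workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!drained) {
                // Timeout reached with work still in flight: stop anyway
                // (compare CWWKE1102W / CWWKE1106W in the log above).
                workers.shutdownNow(); // interrupts the remaining tasks
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long timeoutMs = 3_000; // shortened from Liberty's 30 s default for the demo

        // Case 1: nothing in flight, so the wait returns right away.
        ExecutorService idle = Executors.newFixedThreadPool(2);
        System.out.println("idle quiesce took ~" + quiesce(idle, timeoutMs) + " ms");

        // Case 2: one task that never finishes on its own, so the full timeout elapses.
        ExecutorService busy = Executors.newFixedThreadPool(2);
        busy.submit(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // woken by shutdownNow()
            }
        });
        System.out.println("busy quiesce took ~" + quiesce(busy, timeoutMs) + " ms");
    }
}
```

In this model, the log above looks like the second case: one operation never completes on its own, so the whole quiesce window is spent waiting before the server gives up and stops.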

Describe the solution you'd like

Figure out what is holding up our shutdown and fix it.

Describe alternatives you've considered

Acceptance Criteria

  1. GIVEN a running ibm-fhir-server container (use -i and -t unless we fix #3565) AND it is not processing any requests WHEN you interrupt the process (e.g. via Ctrl+C) THEN it should shut down right away (e.g. within 2 seconds)

Additional context

lmsurpre commented 2 years ago

Plan: send the JVM a signal that makes it write a thread dump (e.g. kill -3, i.e. SIGQUIT), then see which threads are still running.
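For reference, the same information can also be collected from inside a JVM. The sketch below is a hypothetical helper (ThreadDumpSketch and dump are illustrative names, not part of the FHIR server) that uses the standard Thread.getAllStackTraces() to list every live thread with its daemon flag, state, and top stack frames. For an already-running server, the external kill -3 route in the plan above is still the simpler option; this just shows what to look for in the dump.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an in-process thread dump, as an alternative to kill -3.
public class ThreadDumpSketch {

    // Builds one text entry per live thread: quoted name, daemon flag, state,
    // and at most maxFrames stack frames.
    static List<String> dump(int maxFrames) {
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        List<Thread> threads = new ArrayList<>(all.keySet());
        threads.sort(Comparator.comparing(Thread::getName)); // stable, readable order

        List<String> entries = new ArrayList<>();
        for (Thread t : threads) {
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState());
            StackTraceElement[] frames = all.get(t);
            for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
                sb.append("\n    at ").append(frames[i]);
            }
            entries.add(sb.toString());
        }
        return entries;
    }

    public static void main(String[] args) {
        dump(5).forEach(System.out::println);
    }
}
```

When reading a real dump taken during the quiesce window, threads that are RUNNABLE or WAITING inside application or connection-handling code are the likely candidates for the shutdown operation that never completes.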

lmsurpre commented 2 years ago

Note: shutdown happens immediately if the server hasn't serviced any requests yet. But I just ran an experiment where I issued a GET /metadata and then a couple of requests that errored out; after that, it took the full 30 seconds to shut down.