appsody / controller

Appsody controller running in the development container. This repo will be archived soon.
https://appsody.dev
Apache License 2.0
2 stars 12 forks source link

When APPSODY_<RUN/TEST/DEBUG>_KILL is false, and the server process doesn't exist, the server should be restarted #7

Closed kewegner closed 5 years ago

kewegner commented 5 years ago

@kylegc raised this issue. This could happen if Jane has a bug in her code that prevents the server from starting, or the server crashed unexpectedly. The controller would still be watching for Jane to fix the issue with a file change, however the ON_CHANGE action will not be enough to start the server back up. In this scenario, the controller should run the start server command instead of the ON_CHANGE command.

An alternative solution is to watch for the server process to exit, and then exit the controller and container. This would require Jane to fix the code and re-run appsody dev run to restart the container.

kewegner commented 5 years ago

The proposed solution is to check if the server is alive every time an "ONCHANGE" action occurs when the APPSODYKILL setting is false. By sending SIG 0 to the server process, one can determine whether the process is alive without actually changing the state of the process. The call will return an error if there is no process. In this case, the controller will run the command associated with APPSODY<RUN/DEBUG/TEST> action rather than the APPSODY__ON_CHANGE command as specified in the docker file.

tnixa commented 5 years ago

Test to surface the problem...

  1. appsody init java-microprofile
  2. appsody run
  3. changing src/main/java/dev/appsody/starter/health/StarterHealth.java to cause a compile error
  4. appsody stop
  5. appsody run -> shows the compile error
  6. fixing the file to be valid again -> console shows the BUILD SUCCESS but the web app doesn't reload
kewegner commented 5 years ago

This problem goes beyond just server Start (APPSODY_RUN), as Terrance points out, in java-microprofile, it is the INSTALL that fails due to the compile error, not the RUN. So I added prototype code for the on change to detect if APPSODY_INSTALL failed and do a re-run of the INSTALL action prior to running the on-change action. The java-microprofile app did work correctly after the proto-typed change.

Another thing we may want to consider is if the APPSODY_INSTALL failed, should we kill the server, even if KILL is false. (See below for possible issue here). Otherwise how do we know if the environment is really in pristine shape after doing the reinstall.

We could also consider killing the controller process when APPSODY_INSTALL fails and there is no "ON_CHANGE" action for the mode passed into the controller. Currently we would put out a warning message and continue to the "RUN" step.

We did appear to fix the original issue with the experimental fix for the Spring stack as well.

kewegner commented 5 years ago

This comment is not related to the issue removing.

kylegc commented 5 years ago

I think we need to better define and document the purpose of the APPSODY_INSTALL command.

Background

We added the APPSODY_INSTALL option to separate the npm install and npm start commands mainly because killing the npm install command would leave a bad, unrecoverable state. Thus we run npm install in the APPSODY_INSTALL and don't start the watcher until this command finishes.

APPSODY_INSTALL also gave us a cleaner dockerfile and flexibility to reuse this command in the future.

Relation to this Issue

As @kewegner points out above, the install command may impact the ability for the server to recover from errors. We need to determine in this issue if the install command fails, should we retry it later with the ON_CHANGE command?

Discussion Items

kewegner commented 5 years ago

Note: Another potential issue is this if we try to kill the server after a failed install: It appears that stopping the server, there is a 30 second wait in java-microprofile. The server still came up. [Container] [INFO] [AUDIT ] CWWKE0055I: Server shutdown requested on Friday, July 12, 2019 at 5:14 PM. The server defaultServer is shutting down. [Container] [INFO] [ERROR ] CWWKZ0002E: An exception occurred while starting the application starter-app. The exception message was: com.ibm.ws.container.service.state.StateChangeException: java.util.concurrent.RejectedExecutionException: CWWKE1202E: A task cannot be submitted because the executor managedExecutorService[DefaultManagedExecutorService] (concurrencyPolicy[defaultConcurrencyPolicy]) has been shut down.

Kyle has pointed out that the better approach would be if the install fails, don't even start the server. Which would eliminate this issue at least for this instance.

It is interesting if in the future we encountered a APPSODY_RUN_KILL=true where the RUN command had a delay.

kewegner commented 5 years ago

Design Decision: We will exit the controller process if INSTALL fails. We will deprecate the environment variable APPSODY_INSTALL in favor of APPSODY_PREP. APPSODY_INSTALL will continue to be supported in the near term.

kylegc commented 5 years ago

A couple more notes on the design decisions: