enonic / xp

Enonic XP
https://enonic.com
GNU General Public License v3.0

Application fails to update properly under load #7978

Closed rymsha closed 4 years ago

rymsha commented 4 years ago

When an application is updated or reinstalled under load, it may fail to run properly:

Caused by: java.lang.NullPointerException: null
at com.enonic.xp.script.impl.service.ServiceRefImpl.findService(ServiceRefImpl.java:33)
at com.enonic.xp.script.impl.service.ServiceRefImpl.get(ServiceRefImpl.java:22)

ScriptExecutorManager grabs an application bundle that is invalid (for an unknown reason) and uses it to get a BundleContext, which is null for bundles that are not started or are invalid.

Adding a check for BundleContext presence would be an easy fix, but it doesn't actually solve the problem: the application must wait for the bundle to fully start.
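The difference between only validating BundleContext presence and waiting for the bundle to start can be sketched in plain Java. This is a minimal illustration, not Enonic XP or OSGi code: `BundleStartWait`, `AppBundle`, `contextOrFail` and `awaitContext` are hypothetical names, and a plain `Object` stands in for the real BundleContext. The model covers only what the issue describes, namely that the context is null until the bundle has started.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BundleStartWait {
    // Hypothetical stand-in for an application bundle. It models only what the issue
    // describes: the BundleContext is null until the bundle has started.
    static final class AppBundle {
        private final CountDownLatch started = new CountDownLatch(1);
        private volatile Object context; // placeholder for a real BundleContext

        void start() {
            context = new Object();
            started.countDown();
        }

        // Option 1: validate presence only. Fails fast, but a caller that arrives
        // while the bundle is still (re)starting just gets an error.
        Object contextOrFail() {
            Object ctx = context;
            if (ctx == null) {
                throw new IllegalStateException("No BundleContext: bundle not started or invalid");
            }
            return ctx;
        }

        // Option 2: wait, with an upper bound, until the bundle has fully started.
        Object awaitContext(long timeout, TimeUnit unit) {
            try {
                if (!started.await(timeout, unit)) {
                    throw new IllegalStateException("Bundle did not start within " + timeout + " " + unit);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for bundle start", e);
            }
            return context;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AppBundle bundle = new AppBundle();
        try {
            bundle.contextOrFail();
        } catch (IllegalStateException e) {
            System.out.println("null check before start: " + e.getMessage());
        }
        // Start the bundle from another thread after a short delay.
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            bundle.start();
        });
        starter.start();
        Object ctx = bundle.awaitContext(2, TimeUnit.SECONDS);
        System.out.println("context after waiting: " + (ctx != null));
        starter.join();
    }
}
```

The bounded wait matters: an unbounded wait on a bundle that never starts would hang the caller, so the sketch reports a timeout instead.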

rymsha commented 4 years ago

First part:

  1. Reinstalling (or updating) an application calls OSGi Bundle uninstall, which first tries to stop the bundle and sends a STOPPING event.

  2. The STOPPING event triggers removal of the application from the app-registry.

  3. A millisecond later, a request from the frontend recreates the app in the app-registry (unfortunately with the still-stopping bundle), creates a JavaScript engine executor, and waits for configuration (lock L1) before returning it, all while holding the script-engine registry lock (L2).

  4. The bundle is fully uninstalled.

  5. A new app-registry invalidation is issued, which tries to remove the JavaScript engine executor from the script-engine registry. It blocks waiting for lock L2.

  6. After 10 seconds, the configuration lock L1 times out and installation continues normally.

  7. After installation is done, bundle.start() is called. Only then can ConfigAdmin finally issue the configuration change/setup.

This explains why Apache Felix ConfigAdmin sometimes doesn't deliver configuration notifications. See #7967.
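The interplay of steps 3, 5 and 6 can be reproduced with a small plain-Java simulation. This is a sketch, not Enonic XP code: `LockStallDemo`, `createExecutor`, `invalidate` and `simulate` are hypothetical names, a `ReentrantLock` stands in for the script-engine registry lock L2, and a `CountDownLatch` awaited with a timeout stands in for the configuration wait L1 (which nobody releases, as in the scenario). It shows that the invalidation cannot proceed until the configuration wait times out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockStallDemo {
    // Stand-in for the script-engine registry lock (L2).
    private final ReentrantLock registryLock = new ReentrantLock();
    // Stand-in for the configuration wait (L1); never counted down in this scenario.
    private final CountDownLatch configReady = new CountDownLatch(1);

    // Step 3: the frontend request holds L2 while waiting (with a timeout) for configuration.
    private void createExecutor(long configTimeoutMillis, CountDownLatch holdingRegistry) {
        registryLock.lock();
        try {
            holdingRegistry.countDown();
            configReady.await(configTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            registryLock.unlock();
        }
    }

    // Step 5: invalidation needs L2 to remove the executor; returns how long it was blocked.
    private long invalidate() {
        long start = System.nanoTime();
        registryLock.lock();
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            registryLock.unlock();
        }
    }

    // Runs steps 3 and 5 concurrently; returns how long the invalidation waited, in ms.
    public static long simulate(long configTimeoutMillis) {
        LockStallDemo demo = new LockStallDemo();
        CountDownLatch holdingRegistry = new CountDownLatch(1);
        Thread request = new Thread(() -> demo.createExecutor(configTimeoutMillis, holdingRegistry));
        request.start();
        try {
            holdingRegistry.await(); // make sure the request thread holds L2 first
            long blocked = demo.invalidate();
            request.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The issue describes a 10 s timeout; 300 ms keeps the demo short.
        System.out.println("invalidation blocked for ~" + simulate(300) + " ms");
    }
}
```

The invalidation thread is stuck for roughly the whole configuration timeout, which mirrors step 6: nothing progresses until L1 times out, and only after bundle.start() in step 7 could configuration actually arrive.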

rymsha commented 4 years ago

Second Part:

There are many variations of this issue. For instance, the configuration may end up being provided while the application's bundle has been uninstalled. In this case the application never reaches a normal state and keeps throwing a NullPointerException in ServiceRefImpl.findService, because BundleContext.getServiceReference always returns null for an invalid bundle. Alternatively, the bundle may be uninstalled before the JavaScript engine executor is created. In this case the bundle ClassLoader keeps throwing IllegalStateException: The bundle is uninstalled (this one is not permanent, as it can only appear before step 5).
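The permanent NullPointerException comes from dereferencing a null service reference. A defensive lookup that checks the reference and reports the bundle state would at least make this failure mode recognizable, although, as noted in the first comment, such a check does not fix the underlying problem. The sketch below is hypothetical: `ServiceLookupSketch`, `AppBundleView`, the two-value `BundleState` enum and the service name `com.example.MyService` are illustrative assumptions and do not reflect the actual ServiceRefImpl implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ServiceLookupSketch {
    // Hypothetical subset of bundle states relevant to the issue.
    enum BundleState { ACTIVE, UNINSTALLED }

    // Hypothetical stand-in for the service view of an application bundle.
    static final class AppBundleView {
        volatile BundleState state = BundleState.ACTIVE;
        final Map<String, Object> services = new ConcurrentHashMap<>();

        // Models the behaviour described above: an invalid bundle never yields a reference.
        Object getServiceReference(String type) {
            return state == BundleState.UNINSTALLED ? null : services.get(type);
        }
    }

    // Checks the reference instead of dereferencing null, and reports the bundle state
    // so a permanently broken application is distinguishable from a missing service.
    static Object findService(AppBundleView bundle, String type) {
        Object ref = bundle.getServiceReference(type);
        if (ref == null) {
            throw new IllegalStateException(
                "No reference for " + type + " (bundle state: " + bundle.state + ")");
        }
        return ref;
    }

    public static void main(String[] args) {
        AppBundleView bundle = new AppBundleView();
        bundle.services.put("com.example.MyService", "service-instance");
        System.out.println(findService(bundle, "com.example.MyService"));

        bundle.state = BundleState.UNINSTALLED; // the stale bundle from the scenario above
        try {
            findService(bundle, "com.example.MyService");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```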