bstopp / puppet-aem

Puppet module for managing AEM Installations.
https://forge.puppet.com/bstopp/aem
Apache License 2.0
30 stars, 30 forks

Discussion: Wait for installer state to be OK before continuing. #110

Status: Open. zipkid opened this 6 years ago.

zipkid commented 6 years ago

We often see the crx installer fail because AEM is restarting or is otherwise unable to handle the necessary queries/commands. This adds a 'wait for OK-to-install state' check to the crx installer provider. This type of wait may also be needed in the other providers, possibly with a different check than 'Sling+OSGi+Installer.json', or with no check at all. This code is certainly not ready to be merged, but we would like to discuss where and how this could be done to ensure clean Puppet runs.
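
A minimal Ruby sketch of such a 'wait until ready' step. The helper name wait_for_installer_ready, the InstallerNotReadyError class, and the timeout/interval defaults are hypothetical and not part of puppet-aem; the actual readiness check (for example, a request to the installer status) is assumed to be supplied as a block, so the helper only handles polling, sleeping, and giving up.

```ruby
# Hypothetical error raised when the installer does not become idle in time.
class InstallerNotReadyError < StandardError; end

# Hypothetical helper (not part of puppet-aem): calls the given block until it
# returns a truthy value, sleeping `interval` seconds between attempts.
# Raises InstallerNotReadyError once `timeout` seconds have elapsed.
# `sleeper` is injectable so the helper can be exercised without real waits.
# Returns the number of attempts that were needed.
def wait_for_installer_ready(timeout: 300, interval: 5, sleeper: method(:sleep))
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  attempts = 0
  loop do
    attempts += 1
    return attempts if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise InstallerNotReadyError, "installer not ready after #{attempts} attempts"
    end
    sleeper.call(interval)
  end
end

# Example: pretend the installer reports ready on the third poll.
polls = 0
attempts = wait_for_installer_ready(interval: 0, sleeper: ->(_s) {}) do
  polls += 1
  polls >= 3
end
puts attempts # prints 3
```

Using a monotonic clock keeps the timeout correct even if the system clock changes during a long AEM restart; in a provider, the block would wrap whichever status request is chosen.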

Maybe this should be part of https://github.com/bstopp/crx-packmgr-api-client-gem, but that is generated from https://github.com/bstopp/swagger-aem, which I don't know how to work with.

wimsymons commented 6 years ago

This might fix #82 as well.

zipkid commented 6 years ago

@bstopp, I have set 'TargetRubyVersion: 2.2' in .rubocop.yaml. Can you trigger the checks, please?

bstopp commented 6 years ago

Do you have a use case or set of manifests that shows this occurring? What is causing the AEM restart: a Puppet change or a user-initiated change?

What is being experienced right now? A number of subsequent failures?

I am pretty certain I know what the issue is, and this won't solve it; the system already does a check with retries here when Puppet encounters the resource and applies it.

I was pretty sure I opened a ticket somewhere on the underlying issue; if I find it, I'll link it.

stevengssns commented 6 years ago

Hi @bstopp,

An exact example of what we have observed is the following:

In our setup we have a clean AEM 6.3 installation, followed by the installation of the Service Pack 1 and Cumulative Fix Pack 2 packages. On a clean install, we have observed that the CFP is often (but not always) only partially installed: in the package manager, a substantial number of its sub-packages are still in an uninstalled state.

When reproducing the issue on a local workstation, I observed that an install hook in one of the CFP sub-packages threw an exception saying that the Dynamic Class Loader service was no longer available. Further investigation showed that the installation of the CFP package started too soon. When the Service Pack is installed and the package manager API returns, the package manager GUI shows the package as installed, but the installation is actually still in progress. As a result, many OSGi services are still being reloaded by the ongoing installation(s) when the next package installation starts.

To make the package installations more robust, we are trying to add a more reliable check of the installation state. This check is based on the Sling OSGi Installer JMX MBean, which is mentioned in the following AEM Gem:

https://docs.adobe.com/content/ddc/en/gems/AEM-Sustenance---Best-Practices-for-deploying-AEM-Maintenance-Releases/_jcr_content/par/download/file.res/AEM-Sustenance-Best-Practices-Gems.pdf

In the meantime, I have also learned that the following endpoint provides similar information, although it is not documented anywhere and searching for it returns no Adobe results at all:

/crx/packmgr/installstatus.jsp

I've decompiled the code, and it simply checks whether the ActiveResourceCount attribute of the Sling OSGi Installer JMX MBean is 0.
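
The check described above can be sketched in Ruby as follows. This is a hypothetical helper (installer_idle?), not the decompiled code, and it assumes the status response is a JSON object with a top-level ActiveResourceCount key; the real response format of these endpoints may differ. Anything unparseable, such as an error page served while AEM restarts, is treated as "not idle" so that a caller keeps waiting instead of starting the next install.

```ruby
require 'json'

# Hypothetical helper: returns true when the installer reports no active
# resources. ASSUMPTION: `body` is a JSON object with a top-level
# "ActiveResourceCount" key; adapt the lookup to the real response format.
def installer_idle?(body)
  data = JSON.parse(body)
  return false unless data.is_a?(Hash)

  count = data['ActiveResourceCount']
  # A missing or non-integer count is treated as "not idle" rather than guessed.
  count.is_a?(Integer) && count.zero?
rescue JSON::ParserError
  # e.g. an HTML error page returned while AEM is restarting
  false
end

puts installer_idle?('{"ActiveResourceCount": 0}')  # prints true
puts installer_idle?('{"ActiveResourceCount": 12}') # prints false
```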

I hope this clarifies why these changes are necessary; if not, I can provide more info.

FYI, there are still some issues to tackle or think about.

henrykuijpers commented 5 years ago

@bstopp any input on this?

bstopp commented 5 years ago

Can you confirm this wasn't fixed with v3.0.0?