Adobe-Consulting-Services / acs-aem-commons

http://adobe-consulting-services.github.io/acs-aem-commons/
Apache License 2.0

OnDeploy Scripts on Cloud Service #2324

Open kaushalmall opened 4 years ago

kaushalmall commented 4 years ago

Required Information

Although the OnDeploy Scripts show up in the Build Image logs as being executed for Cloud Service deployments, most of the use cases that OnDeploy Scripts would be useful for would be considered CS anti-patterns. Creating this issue to track the work of marking OnDeploy Scripts as incompatible with Cloud Service and to discuss providing an alternative solution that is CS-compatible in the future.

@HitmanInWis @davidjgonzalez @justinedelson @badvision @joerghoh @adamcin

HitmanInWis commented 4 years ago

Looping in @adamcin

HitmanInWis commented 4 years ago

Would it make sense to briefly summarize what is meant by "most of the use cases that OnDeploy Scripts would be used for fall under CS anti-patterns" in order to facilitate the discussion? Mark Adamcin and I were literally discussing the other day whether On-Deploy Scripts can still be a thing on AEMaaCS, but given the newness of the platform it might make sense to ensure we're all on the same page.

kaushalmall commented 4 years ago

Hi Brett, the biggest anti-pattern is that it will try to update content during a code deployment, and since every deployment is a "new" image, some things might not work as expected. For example, consider a script that runs a query to find all nodes of RT1 and changes them to RT2.

I'd recommend using Sling Pipes for the above use case in CS. Repoinit is another alternative for doing the work that OnDeploy Scripts do.

HTH.

HitmanInWis commented 4 years ago

Yep, sounds like we're on the same page then. One of the most common reasons we've used On-Deploy Scripts is to update node structures to match refactored code - one of the common examples being a refactored component with a new property naming structure. Given that both the new instance being started up (with the new code) and the existing instance not yet spun down (with the old code) are pointing to the same content data source during the blue/green deployment process, we really can't handle this case with a simple JCR content update since the update will break the currently running server, which is still running until the deploy completes, and will remain running (with the updated content that breaks it) if the deployment fails.
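To make the failure mode concrete, here is a minimal sketch of it. Plain Python dicts stand in for JCR nodes, and the property names (`title`, `jcr:title`) and function names are made up for illustration; none of this is AEM or ACS Commons API.

```python
# Illustrative only: a dict stands in for the content repository that both
# the old and the new instance point at during the deployment.
shared_content = {"/content/site/teaser": {"title": "Hello"}}


def render_old(node):
    """Old release: only knows the original property name."""
    return node["title"]


def render_new(node):
    """New release: only knows the refactored property name."""
    return node["jcr:title"]


def migrate(content):
    """What an on-deploy script would do: rename the property in place."""
    for node in content.values():
        if "title" in node:
            node["jcr:title"] = node.pop("title")


def old_instance_survives(content):
    """Return True if the old code can still render every node."""
    try:
        for node in content.values():
            render_old(node)
        return True
    except KeyError:
        return False


before = old_instance_survives(shared_content)  # old code, old content
migrate(shared_content)                         # script runs during deploy
after = old_instance_survives(shared_content)   # old code, new content
```

In this sketch `before` is `True` and `after` is `False`: the old instance, which is still serving traffic, can no longer read the content, and it stays that way if the deployment fails and the new instance never takes over.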

HitmanInWis commented 4 years ago

Is there a period of time where both the newly spinning up servers (new code) and the old servers (old code) are serving end users at the same time? Or is it a clean 100% cut from all traffic going to old servers to all traffic going to new servers?

kaushalmall commented 4 years ago

AFAIK, it's a clean 100% cutover. @justinedelson @davidjgonzalez can correct if needed.

kaushalmall commented 4 years ago

Are we OK marking it as incompatible for now, and maybe figuring out later whether we want to update it for a future release?

HitmanInWis commented 4 years ago

probably the safest bet

Throwing a (potentially bad) idea out there. If there is truly a 100% cutover (no period of time where both new/old servers are serving users) could we somehow trigger the jobs to run at the point of cutover? Maybe not a "great" solution, but in an imperfect world it might be fine for many use cases.

royteeuwen commented 4 years ago

Seeing these questions and responses, what is the recommended approach for updating content? Just saying this framework won't work doesn't make the use case invalid. The content might still need to change resource type, so to me OnDeploy Scripts still feel valid.

kaushalmall commented 4 years ago

@royteeuwen we should use Sling Pipes for updating content.

royteeuwen commented 4 years ago

How does this change anything? On-deploy scripts can also use Sling Pipes. Sling Pipes is just another framework for updating content based on JCR queries etc.

kaushalmall commented 4 years ago

The goal is to not change content as part of the code deployment. Sling Pipes are run on the environment by calling the endpoint via cURL or similar, not as part of the code deployment. There are also timeouts set in the build image step for CS deployments; if your OnDeploy scripts take a long time, they will break the build.

shsteimer commented 4 years ago

I believe the suggested approach would be to use a resource decorator to wrap from an old component model to a new one. This can be in place in the interim until you can execute a Sling Pipe to update existing content (or perhaps even use your decorator to trigger your Sling Pipe asynchronously, so you can avoid having to remember to call the Sling Pipe via cURL).

davidjgonzalez commented 4 years ago

As @shsteimer notes, the suggested approach to perform "(otherwise) breaking content changes" is:

0a) [CODE] Code handles "old" content structure
0b) [CONTENT] Content is using "old" structure
1) [CODE] Update code to handle BOTH the old content and new content structures and deploy.

Now, the running code handles BOTH old and new, but is running against the "old" content structure.

2) [CONTENT] Update the old content structure to the new structure (how this happens isn't important for this discussion)
3) [CODE] If needed, remove the old content support from the code, since all content should be transformed into the "new" structure.

... with @shsteimer's resource decorator (RD) being an example of this high-level approach.
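The numbered steps above can be sketched roughly as follows. This is a language-neutral illustration in Python, not AEM code: dicts stand in for content nodes, and `OLD_KEY`, `NEW_KEY`, and all function names are hypothetical.

```python
OLD_KEY = "title"    # hypothetical "old" property name
NEW_KEY = "heading"  # hypothetical "new" property name


def read_heading(node):
    """Step 1 code: prefer the new structure, fall back to the old one."""
    if NEW_KEY in node:
        return node[NEW_KEY]
    return node.get(OLD_KEY)


def migrate(node):
    """Step 2: return a copy with the old property moved to the new name.

    Idempotent, so running it again on migrated content changes nothing.
    """
    migrated = dict(node)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


def read_heading_new_only(node):
    """Step 3 code: old-structure support removed."""
    return node[NEW_KEY]


old_node = {"title": "Welcome"}
new_node = migrate(old_node)
```

The step 1 reader returns the same value for `old_node` and `new_node`, which is what makes it safe for step 2 to happen at any time after step 1 is live, by whatever mechanism.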

kaushalmall commented 4 years ago

created https://github.com/Adobe-Consulting-Services/adobe-consulting-services.github.io/pull/196

HitmanInWis commented 4 years ago

Taking that approach @davidjgonzalez means that On-Deploy Scripts could still be used for Step 2, right? That's assuming it is done in a second release - it needs to be done in a release where the code to support both old and new format is already deployed on the "running" server.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HitmanInWis commented 3 years ago

Thought of a new idea today. Could on-deploy scripts continue to work in Cloud Service if they were triggered to run by a content package? My understanding is that content packages are installed at the point of traffic cut-over from blue to green in the deployment, meaning that code has already fully completed deployment and the server is fully ready to go.

Current Impl: On-Deploy Scripts run based on presence in code and absence of a status node under /var.

New, Cloud-compatible Impl: On-Deploy Scripts run based on presence in code and presence of a status node under /var where the status is "ready". A content package is used to install the status node, which occurs at the point of traffic cut-over.
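A rough sketch of the two trigger conditions, to show the difference. This is not the actual On-Deploy Scripts implementation: a dict stands in for the status nodes under /var, the script name is made up, and the "success" status value is an assumption for illustration (only "ready" comes from the proposal above).

```python
def should_run_current(script, deployed_scripts, status_nodes):
    """Current behaviour: present in code and no status node yet."""
    return script in deployed_scripts and script not in status_nodes


def should_run_proposed(script, deployed_scripts, status_nodes):
    """Proposed behaviour: present in code and status node says "ready"."""
    return script in deployed_scripts and status_nodes.get(script) == "ready"


deployed = {"RenameTitleScript"}  # hypothetical script name

# No status node exists while the new image is being built and started.
during_build = {}
# The content package installs the status node at traffic cut-over.
after_cutover = {"RenameTitleScript": "ready"}
# Assumed: the script overwrites the status once it has finished.
after_run = {"RenameTitleScript": "success"}
```

With these inputs the current condition fires during the build (no status node), while the proposed condition stays off until the content package has installed the "ready" node, and turns off again once the status changes.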

As long as developers follow the principle of both content formats (pre-script and post-script) working until a subsequent release, this should work.

And for those that want to run fast and loose (and don't mind the site having errors for a minute or two), they could skip that precaution and the site will be back in order once the content package installs the status node and the script runs. Obviously not recommended though :).

kaushalmall commented 3 years ago

@HitmanInWis not sure if this is still an anti-pattern. I know for a fact that the usage of /var is frowned upon now, especially if you are going to replicate that path, but adding code execution as part of the mutable content might be a no-no as well. @davidjgonzalez thoughts? Maybe Dominik would know, but I don't know his GitHub handle to tag.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.