awslabs / sbt-aws

SaaS Builder Toolkit for AWS is a developer toolkit to implement SaaS best practices and increase developer velocity.
Apache License 2.0
118 stars 23 forks source link

TenentOnboarding: Unable to handle failure from onboarding script while handling onboardingRequest event #73

Closed junsjang closed 1 month ago

junsjang commented 1 month ago

Describe the bug

Tested from https://github.com/aws-samples/saas-reference-architecture-ecs repository, but it looks like the same bug are present in recent 0.3.5 version also.

When we make a CoreApplicationPlane object with a BashScriptJob that can raise any error (it's reasonable to expect and handle errors while provisioning resource), it is not available to handle errors from the provisioning script.

When I modify and test the scripts/provision-tenant.sh from the saas-reference-architecture-ecs samples repository to raise an error by inserting exit 1 script in the middle, the state machine exeuction of the Step Functions failed with TaskFailed error from the startProvisioningCodeBuild step and it causes a ExecutionFailed event for the state machine execution. It event don't make any events for the EventBridge that communicate between control plane and the application plane.

There are many failure events that is predefined but I wonder how many events are actively used. It would be better to give a way to make a failure event (ex CodeBuild step failure may send provisioningFailure events).

Expected Behavior

It would be better to give a way to make a failure event (ex CodeBuild step failure may send provisioningFailure events).

Current Behavior

the state machine exeuction of the Step Functions failed with TaskFailed error from the startProvisioningCodeBuild step and it causes a ExecutionFailed event for the state machine execution. It event don't make any events for the EventBridge that communicate between control plane and the application plane.

Reproduction Steps

When I modify and test the scripts/provision-tenant.sh from the saas-reference-architecture-ecs samples repository to raise an error by inserting exit 1 script in the middle.

Possible Solution

It would be better to give a way to make a failure event (ex CodeBuild step failure may send provisioningFailure events).

Additional Information/Context

No response

CDK CLI Version

2.147.1

Framework Version

0.1.12

Node.js Version

v20.15.0

OS

Amazon Linux 2023 (Cloud9)

Language

TypeScript

Language Version

3.7.5

Other information

No response

suhussai commented 1 month ago

Hi @junsjang , thanks for raising the issue!

I think this is bit of code you are referring to in the ECS reference architecture:

https://github.com/aws-samples/saas-reference-architecture-ecs/blob/c13ee3dc5e80fed8e080620877af52c59605dceb/server/lib/bootstrap-template/core-appplane-stack.ts#L56-L57

There are different ways we could implement the functionality required.

Option 1: We could make outgoingEvent accept an object such as:

{
  success: DetailType,
  failure: DetailType,
}

That would allow users to specify something like this in the BashJobRunner:

      outgoingEvent: {
        success: DetailType.PROVISION_SUCCESS,
        failure: DetailType.PROVISION_FAILURE,
      },

Option 2: Create a separate argument for the event that should be emitted in the case of a failure:

      outgoingEventSuccess: DetailType.PROVISION_SUCCESS,
      outgoingEventFailure: DetailType.PROVISION_FAILURE,

I think we need to balance flexibility with ease-of-use here. Having too many arguments would make the BashJobRunner flexible, but difficult to use, while having too few would sacrifice flexibility but would make the construct easy to use.

As I write this out, something I'm realizing is that much of this is boilerplate that we can handle within the framework. It might make sense for us to extend the BashJobRunner class to create a ProvisioningBashJobRunner that has these event fields already preset. Same for de-provisioning.

@junsjang , if you have thoughts here that you'd like to share, let us know.

@tobuck-aws , @ujwalbukka , any thoughts on this?

junsjang commented 1 month ago

Thank you for your interest in my issue. From what I understand, you don't currently see any use cases that require more events than success and failure, so using plain outgoingEventSuccess and outgoingEventFailure seems sufficient. However, packing into a single outgoingEvent field would be more open for modification and could have a separate implementation object that covers the result handling mechanism. Both approaches are okay from the user's perspective.

The approach of ProvisioningBashJobRunner seems like a good opinionated pattern that aws-sbt is driving, but it should be some kind of helper class that reduces the boilerplate of the system, in my opinion.

Please let me know if I have understood your feedback correctly or if you need any clarification.

suhussai commented 1 month ago

@junsjang , I just pushed a PR to address these changes. If you have any thoughts on the changes, let me know.

junsjang commented 4 weeks ago

It looks like your PR fixes the issue that I raised. I will try to use it and let you know if there is any issue with the fix.