aws-samples / aws-big-data-blog-dmscdc-walkthrough

MIT No Attribution

Failed in the InitController, any idea? #1

Closed java8964 closed 2 years ago

java8964 commented 4 years ago

I followed your blog post in my AWS account, but I failed to create the DMSCDCSource CF stack. It failed at the "InitController" step, after "InitDMS" and "Trigger" had both been created successfully.

The log in CloudWatch is not very helpful in this case. Here are the log entries. Any idea what the root cause is?

[INFO] 2020-01-10T00:59:21.253Z 5a69a2bb-65f9-404a-8471-fd83596094d9 { "RequestType": "Create", "ServiceToken": "arn:aws:lambda:us-east-1:366397055506:function:DMSCDC_InitController", "ResponseURL": "https://cloudformation-cust...", "StackId": "arn:aws:cloudformation:us-east-1:366397055506:stack/DMSCDCSource-RDS-Mysql-1/1b3c49b0-3344-11ea-878c-0e8805686543", "RequestId": "7859e6ae-6bb4-469a-b1f3-1b0d12a7d198", "LogicalResourceId": "InitController", "ResourceType": "Custom::InitController", "ResourceProperties": { "ServiceToken": "arn:aws:lambda:us-east-1:366397055506:function:DMSCDC_InitController", "dmsBucket": "yzhang-dmscdc-dms", "lakePath": "", "lakeBucket": "yzhang-dmscdc-datalake", "lakeDBName": "cdcpoc", "dmsPath": "cdc_poc/" } }

[INFO] 2020-01-10T00:59:51.722Z 5a69a2bb-65f9-404a-8471-fd83596094d9 Error during Controller execution

https://cloudformation-cust...

{ "Status": "FAILED", "Reason": "See the details in CloudWatch Log Stream: 2020/01/10/[$LATEST]49129bd60fb7480b8bbef36f2bf7c45c", "PhysicalResourceId": "2020/01/10/[$LATEST]49129bd60fb7480b8bbef36f2bf7c45c", "StackId": "arn:aws:cloudformation:us-east-1:366397055506:stack/DMSCDCSource-RDS-Mysql-1/1b3c49b0-3344-11ea-878c-0e8805686543", "RequestId": "7859e6ae-6bb4-469a-b1f3-1b0d12a7d198", "LogicalResourceId": "InitController", "NoEcho": false, "Data": { "Data": "Error during Controller execution" } }
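The FAILED response above only points at a log stream. As a minimal sketch (the helper name `log_stream_from_response` is hypothetical and not part of the sample), the stream name can be pulled out of such a response so it can be looked up under the function's log group; Lambda functions log to `/aws/lambda/<function name>`:

```python
import json
import re


def log_stream_from_response(response_body: str):
    """Return the CloudWatch log stream named in a custom-resource response, or None."""
    reason = json.loads(response_body).get("Reason", "")
    # The Reason text follows the pattern seen in the response above.
    match = re.search(r"CloudWatch Log Stream:\s*(\S+)", reason)
    return match.group(1) if match else None


# Trimmed copy of the response body posted above.
body = json.dumps({
    "Status": "FAILED",
    "Reason": "See the details in CloudWatch Log Stream: "
              "2020/01/10/[$LATEST]49129bd60fb7480b8bbef36f2bf7c45c",
    "LogicalResourceId": "InitController",
})
stream = log_stream_from_response(body)
print(stream)
```

The returned name could then be passed to CloudWatch Logs, for example as `logStreamName` in boto3's `get_log_events` together with `logGroupName="/aws/lambda/DMSCDC_InitController"`.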

sheridan06 commented 4 years ago

@java8964 did you ever get this resolved? I'm having the exact same issue with this CF template. It builds the stack fine up until the InitDMS custom resource. After a while (~15-20 minutes), the stack fails with "Custom Resource failed to stabilize in expected time".

thanks

sheridan06 commented 4 years ago

@java8964 I figured this out yesterday. After the first CF stack builds, you need to check the DMS Replication Instance for the correct VPC Security Groups and confirm that both endpoints can connect through this replication instance. Only then should you build the second CF stack, and it will get past the InitDMS custom resource. HOWEVER, this second CF stack is now failing on InitController and I haven't yet figured out why. I think it might be because the DynamoDB table was never created by the first CF template, but I'm not sure yet.
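The endpoint check described above can also be scripted. This is a rough sketch, assuming boto3's DMS client and its `describe_connections` call; the function names and the sample endpoint identifiers are made up for illustration:

```python
def failed_connections(describe_connections_response: dict) -> list:
    """Return (endpoint, status, message) for every connection that is not 'successful'."""
    problems = []
    for conn in describe_connections_response.get("Connections", []):
        if conn.get("Status") != "successful":
            problems.append((
                conn.get("EndpointIdentifier"),
                conn.get("Status"),
                conn.get("LastFailureMessage", ""),
            ))
    return problems


def check_replication_instance(dms_client, replication_instance_arn: str) -> list:
    """Ask DMS for the connection tests run through one replication instance."""
    response = dms_client.describe_connections(
        Filters=[{"Name": "replication-instance-arn",
                  "Values": [replication_instance_arn]}]
    )
    return failed_connections(response)


# Hand-written sample shaped like a describe_connections response (not real output).
sample = {"Connections": [
    {"EndpointIdentifier": "source-mysql", "Status": "successful"},
    {"EndpointIdentifier": "target-s3", "Status": "failed",
     "LastFailureMessage": "connection timed out"},
]}
print(failed_connections(sample))
```

With real credentials this would be called as `check_replication_instance(boto3.client("dms"), arn)`; an empty list would suggest both endpoints are reachable and the second stack can be built.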

java8964 commented 4 years ago

Hi, Brad:

Thanks for the detailed information.

Our real target for the DMS CDC output is Snowflake rather than the data lake (S3 + Glue). So we decided to use our own solution for S3 -> Snowflake, while still depending on DMS/CDC.

I never got the whole CloudFormation stack working, but we are fine for now.

Thanks

Yong



ashokballolli commented 3 years ago

@sheridan06 @java8964 any luck? Were you able to solve this issue? I'm facing exactly the same issue. The VPC security group configuration looks good, as the endpoint connection status is successful.

sheridan06 commented 3 years ago

@ashokballolli we did indeed solve this issue but can't recall exactly what we did. I even just pinged some of the other engineers on the team and they can't remember either. This was when we were just starting to test out the pipeline and weren't documenting at the time. Sorry.

ashokballolli commented 3 years ago

@sheridan06 thank you very much for the reply. The issue has been resolved. I just waited for the DMS job (on its very first run) to create the required folders and files in the bucket, and then triggered the Glue job to populate the data into the data lake.
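The "wait for DMS output, then trigger Glue" step can be sketched as below. This assumes boto3's `list_objects_v2` (S3) and `start_job_run` (Glue) calls; the function names, bucket, prefix and Glue job name are placeholders, not values confirmed by this thread:

```python
import time


def prefix_has_objects(s3_client, bucket: str, prefix: str) -> bool:
    """True once at least one object exists under the DMS output prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


def start_glue_when_ready(s3_client, glue_client, bucket, prefix, job_name,
                          attempts=20, delay_seconds=30, sleep=time.sleep):
    """Poll the DMS bucket; start the Glue job once DMS has written its first files.

    Returns the Glue JobRunId, or None if nothing appeared within `attempts` polls.
    """
    for attempt in range(attempts):
        if prefix_has_objects(s3_client, bucket, prefix):
            return glue_client.start_job_run(JobName=job_name)["JobRunId"]
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the first DMS full load time to land
    return None
```

A call might look like `start_glue_when_ready(boto3.client("s3"), boto3.client("glue"), "my-dms-bucket", "my_dms_path/", "my-glue-job")`, with all three names replaced by the ones from your own stack.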

sheridan06 commented 3 years ago

That's great to hear @ashokballolli !!

java8964 commented 3 years ago

Sorry, I just found this email chain.

Our pipeline goes from DMS to Snowflake, so I stopped using this sample and we developed our own solution to get the DMS output into Snowflake.

In the end, our experience was that, for relational database source data, Snowflake met our data integration requirements much better than the data lake did.

Just curious, what was the issue with this CloudFormation setup?

Yong



sheridan06 commented 3 years ago

@java8964 the issue was a failure of InitDMS in one of the CF templates, but I don't recall exactly what the solution was

thanks, Brad

jeffgardnerdev commented 3 years ago

@java8964 There are some kinks in the original Lambda function that cause the CloudFormation custom resource to time out when there is an error with the Glue job, because the error handling in the Lambda isn't configured to return the error properly to CloudFormation. This makes the stack creation spin for about an hour before it times out, and then the issue can be hard to diagnose. The error I encountered was that I didn't supply the DMSFolder and LakeFolder parameters (which are marked as optional in the DMSCDC_CloudTemplate_Source.yaml CloudFormation template), and the Glue jobs throw an error when they are not supplied.

I created a PR to address this and a couple of other issues I found when going through the steps in the blog post. Not sure if/when it will get merged but you can try using my fork in the meantime if you'd like.
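For readers hitting the same timeout, here is a minimal sketch of the general pattern for a Lambda-backed custom resource: catch every error and always PUT a SUCCESS or FAILED body to the `ResponseURL`. It is not the code from the PR or the repository. The function names are hypothetical, and the mapping of the DMSFolder and LakeFolder template parameters to the `dmsPath` and `lakePath` properties is an assumption based on the log at the top of this thread:

```python
import json
import urllib.request


def build_response(event, log_stream, status, reason=""):
    """Assemble the body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See the details in CloudWatch Log Stream: {log_stream}",
        "PhysicalResourceId": event.get("PhysicalResourceId", log_stream),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": {},
    }


def send_response(event, body):
    """PUT the body to the pre-signed URL; without this the stack waits until it times out."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"], data=data, method="PUT",
        headers={"Content-Type": ""},  # empty, as the cfn-response helper does
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status


def validate_properties(props):
    """Fail fast on empty paths (assumed property names, see the note above)."""
    missing = [name for name in ("dmsPath", "lakePath") if not props.get(name)]
    if missing:
        raise ValueError("Missing required properties: " + ", ".join(missing))


def handler(event, context, send=send_response):
    log_stream = getattr(context, "log_stream_name", "unknown")
    try:
        validate_properties(event.get("ResourceProperties", {}))
        # ... the real work (for example starting the Glue job) would go here ...
        body = build_response(event, log_stream, "SUCCESS")
    except Exception as exc:  # report every failure instead of letting the stack hang
        body = build_response(event, log_stream, "FAILED", reason=str(exc))
    send(event, body)
    return body
```

Putting the exception text in `Reason` means the cause shows up directly in the stack events, rather than only in a log stream.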

rjvgupta commented 2 years ago

FYI, I just released an update to the code which has better error handling. In addition, I added VPC parameters to the reusable CFN template, allowing users to specify where the replication instance is deployed.

nk7983 commented 2 years ago

I already have a DMS replication task that's writing to an S3 bucket, so I tried to modify the template DMSCDC_CloudTemplate_Source.yaml by removing all references to DMS tasks, but it still keeps failing at the controller execution. Why is the controller executed during the CloudFormation creation process?

The error is below. Any help is highly appreciated. The response shows 200.

{ "Status": "FAILED", "Reason": "See the details in CloudWatch Log Stream: 2022/05/30/[$LATEST]339bedc6306f4c1ea80d59a1e7638447", "PhysicalResourceId": "2022/05/30/[$LATEST]339bedc6306f4c1ea80d59a1e7638447", "StackId": "arn:aws:cloudformation:us-east-1:725337377563:stack/DMSCDCReusable/a6c1a430-dfd1-11ec-b195-0acc1b8a7a61", "RequestId": "bbe6e972-b101-45e7-b7b5-8a687d5039a0", "LogicalResourceId": "InitController", "NoEcho": false, "Data": { "Data": "Error during Controller execution" } }

rjvgupta commented 2 years ago

Please see the glue log. You might get some more details there.