Closed johnsontroye1 closed 4 years ago
Hello,
This seems to be an internal issue related to AWS Glue. Could you reliably reproduce this error on subsequent runs of the ETL state machine? If so, please open a support case.
On Wed, Jul 18, 2018 at 7:01 PM Troy Johnson notifications@github.com wrote:
I have pulled down this repo and have it working until the last step (Join Marketing and Sales Data). I have tried to get past this unsuccessfully. Here's the error logged in Gluerunner CloudWatch logs:
[ERROR] 2018-07-18T15:17:26.792Z 88fb4fc4-8a9d-11e8-bec7-f7119107e998 Glue job "JoinMarketingAndSalesData" run with Run Id "jr_bebcc..." failed. Last state: FAILED. Error message: AnalysisException: u'Path does not exist: hdfs://ip-172-31-74-135.ec2.internal:8020/user/root/aa.etl-output-path/tmp/sales;'
Yes, I cleaned out everything several times and reran to the same point of failure. The only differences I see in the logs are the run ID and the IP address of the EC2 host. Can you please tell me where to go to open a support case for this? Thank you.
Sure, check out the instructions here:
https://docs.aws.amazon.com/awssupport/latest/user/getting-started.html#case-management
Also, I've just re-run the ETL state machine again just to be sure. The state machine completed successfully. This leaves us with either a possible internal issue with AWS Glue or a project configuration issue.
Hope this helps.
There were 5 .json files in the repo that needed config changes.
Would you mind sending me your .json files so I can compare them against what I have? Maybe I did mess up a configuration.
troy.johnson@changepoint.com
Thank you very much,
Troy
Hey Troy, I don’t mind at all. I’m out of office until 8/6, so I’ll share them as soon as I return.
On Wed, Jul 18, 2018 at 8:51 PM Troy Johnson notifications@github.com wrote:
There were 5 .json files in the repo that needed config changes.
- cloudformation/gluerunner-lambda-params.json
- lambda/s3-deployment-descriptor.json
- cloudformation/glue-resources-params.json
- lambda/gluerunner/gluerunner-config.json
- cloudformation/step-functions-resources-params.json
The cause could be a wrong parameter value in glue-resources-params.json:
{
    "ParameterKey": "S3ETLOutputPath",
    "ParameterValue": "<NO-DEFAULT>"
}
Please make sure ParameterValue is set to a full S3 path, like:
s3://<bucket_name>/output
Not simply:
output
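The latter has no s3:// scheme, so the job writes its results to the cluster's HDFS file system instead of S3. That matches the hdfs://.../user/root/... path in the error message, and it is why the Join Marketing and Sales Data step couldn't find the file.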
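A quick way to catch this before deploying is to check the parameter file for values that are not S3 URIs. The following is only a minimal sketch, not part of the repo: it assumes the parameter file is a JSON list of ParameterKey/ParameterValue objects like the fragment above, and the function name find_bad_s3_params and the S3_PATH_KEYS set are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse

# Parameters expected to hold S3 URIs. Only S3ETLOutputPath is named in this
# thread; extend the set if other parameters also take S3 paths (assumption).
S3_PATH_KEYS = {"S3ETLOutputPath"}


def find_bad_s3_params(params, keys=S3_PATH_KEYS):
    """Return (key, value) pairs whose value is not an s3://bucket/... URI.

    `params` is assumed to be a list of dicts with "ParameterKey" and
    "ParameterValue" entries, as loaded from the parameter file with json.load.
    """
    bad = []
    for entry in params:
        key = entry.get("ParameterKey")
        if key not in keys:
            continue
        value = str(entry.get("ParameterValue", ""))
        parsed = urlparse(value)
        # A usable value needs the s3 scheme and a bucket name.
        if parsed.scheme != "s3" or not parsed.netloc:
            bad.append((key, value))
    return bad


# Example with inline data instead of a real file:
sample = [
    {"ParameterKey": "S3ETLOutputPath", "ParameterValue": "output"},
]
for key, value in find_bad_s3_params(sample):
    print(f"{key}: {value!r} is not an s3:// path")
```

To check a real file, load it with json.load and pass the result to the function. A leftover "<NO-DEFAULT>" placeholder is flagged the same way as a bare relative path such as "output".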
Config parameters and docs were updated to simplify the configuration process and make it less error prone.