aws-samples / aws-etl-orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
MIT No Attribution

Process Marketing Job not writing parquet file to S3 #4

Closed · enr1c091 closed this issue 4 years ago

enr1c091 commented 5 years ago

Hi,

I am running this sample and, for a reason I can't figure out, process_marketing_data.py isn't writing its output file to S3, and the Count: line in the CloudWatch Logs shows 0. As a result, the Join step fails because it can't infer a schema for the Parquet file.
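For context, here is a minimal PySpark sketch of the kind of read/count/write sequence a job like process_marketing_data.py performs. This is an illustration only, not the project's actual code, and the output bucket name is hypothetical; an empty or missing marketing/ source prefix is consistent with the Count: 0 log line and the missing Parquet output described above.

```python
# Hedged sketch, not the repo's actual job code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ProcessMarketingData").getOrCreate()

# Read the raw marketing CSV from the raw-data bucket's marketing/ prefix.
marketing_df = spark.read.csv(
    "s3://aws-etl-orchestrator-demo-raw-data/marketing/",
    header=True,
    inferSchema=True,
)

# The job logs the row count; the issue reporter saw "Count: 0" at this
# point, i.e. no marketing rows were read from the source prefix.
print("Count: {}".format(marketing_df.count()))

# With zero input rows there is no meaningful Parquet output, so the
# downstream Join step cannot infer a schema from it.
marketing_df.write.mode("overwrite").parquet(
    "s3://example-etl-output-bucket/marketing/"  # hypothetical output location
)
```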

liangruibupt commented 4 years ago

You should upload the sales sample data to aws-etl-orchestrator-demo-raw-data/sales and the marketing sample data to aws-etl-orchestrator-demo-raw-data/marketing.

For example:

aws s3 ls s3://aws-etl-orchestrator-demo-raw-data --region ap-northeast-1 --profile us-east-1 --recursive
2019-12-26 17:39:42          0 marketing/
2019-12-26 17:43:36     151746 marketing/MarketingData_QuickSightSample.csv
2019-12-26 17:42:55          0 sales/
2019-12-26 17:43:51    2002910 sales/SalesPipeline_QuickSightSample.csv
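In other words, the fix is simply to get the two sample CSVs under those prefixes before starting the state machine. Here is a hedged sketch of doing that with boto3 (the local file names are assumed to match the objects shown in the listing above; copying with the AWS CLI's aws s3 cp works equally well):

```python
# Hedged sketch: upload the sample datasets so the Glue jobs find their
# input under the expected prefixes. Local paths are assumptions.
import boto3

s3 = boto3.client("s3")
bucket = "aws-etl-orchestrator-demo-raw-data"

s3.upload_file("MarketingData_QuickSightSample.csv", bucket,
               "marketing/MarketingData_QuickSightSample.csv")
s3.upload_file("SalesPipeline_QuickSightSample.csv", bucket,
               "sales/SalesPipeline_QuickSightSample.csv")
```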

moanany commented 4 years ago

As @liangruibupt pointed out, the sample datasets need to be uploaded first. The project README has been updated with instructions for copying the datasets.