Closed MacHu-GWU closed 5 years ago
@MacHu-GWU if we move to kinesis, will we still need this?
I think we would still need it, just in a different way. If the Kinesis ETL pipeline breaks, we still need something to trigger it for the missing date-time range after we fix it.
If an ETL system is not 100% stable, we need this REDO capability anyway.
ok, that's helpful
For example, every time the current ETL pipeline breaks, devops folks like Andy need to spend lots of time redoing the missing data. It should be automated.
migrated to jira
User story
I would like to improve the data quality in Redshift.
Currently the ETL pipeline is triggered by the S3 object creation event. If the parser fails on a data file, nothing is written to the Hot Bucket. We need another worker to trigger the parser for the undone data files.
The reason I don't recommend adding retry to the parser is that sometimes a retry simply cannot fix the failure.
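A minimal sketch of what such a redo worker could look like, in Python. This is only an illustration, not the actual design: the names `find_undone_keys`, `redo`, and `trigger_parser` are hypothetical, and it assumes the parser output in the Hot Bucket can be matched to the source file by the same key. In practice the key listings would come from listing the two S3 buckets.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Set, Tuple


def find_undone_keys(
    landed: Iterable[Tuple[str, datetime]],
    processed_keys: Set[str],
    start: datetime,
    end: datetime,
) -> List[str]:
    """Return keys of data files created in [start, end) that have no parser output.

    `landed` is a listing of (key, creation time) for the source data files.
    `processed_keys` is the set of keys found in the Hot Bucket
    (assumption: output keys match source keys).
    """
    return sorted(
        key
        for key, created_at in landed
        if start <= created_at < end and key not in processed_keys
    )


def redo(undone_keys: Iterable[str], trigger_parser: Callable[[str], None]) -> List[str]:
    """Trigger the parser once per undone key; return the keys that still fail."""
    still_failing = []
    for key in undone_keys:
        try:
            trigger_parser(key)  # hypothetical hook, e.g. invoking the parser function
        except Exception:
            # No blind retry here: a file that fails again is reported, not looped on.
            still_failing.append(key)
    return still_failing
```

The worker can be run for any date-time range after the pipeline is fixed, and the returned list shows which files still need manual attention.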
Notes
What is the value to the user in this story?
Avoid missing data in Redshift.
This worker gives us the flexibility to reprocess undone data files whenever we need to.
Acceptance Criteria
Tasks to complete the story
Definition of Done