GoogleCloudPlatform / healthcare-data-harmonization-dataflow

Apache License 2.0

ErrorEntry does not contain sufficient context and may lead to data loss #2

jaketf closed this issue 4 years ago

jaketf commented 4 years ago

The ErrorEntry model should contain the original data element, not just the error message. That would make debugging from the DLQ (dead-letter queue) easier, because each entry would carry the full context: "this element failed this step with this error at this time" rather than "this step failed with this error at this time".

Right now, if there is an error, there will be data loss because the failing element is not kept with the error. Consider either reusing or drawing inspiration from HealthcareIOError.
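To make the request concrete, here is a minimal sketch of an error record that keeps the failing element next to the error details. It is only an illustration, not the repository's ErrorEntry and not Beam's HealthcareIOError: the class name `ErrorEntrySketch`, its fields, and the `of` factory are all hypothetical, and it uses only the Java standard library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;

/** Hypothetical dead-letter record: the failing element travels with the error. */
final class ErrorEntrySketch<T> {
  private final T element;            // original input that failed (assumed serializable downstream)
  private final String step;          // name of the pipeline step that failed
  private final String errorMessage;  // message of the exception that was caught
  private final String stackTrace;    // full stack trace, rendered as text
  private final Instant observedTime; // when the failure was recorded

  private ErrorEntrySketch(
      T element, String step, String errorMessage, String stackTrace, Instant observedTime) {
    this.element = element;
    this.step = step;
    this.errorMessage = errorMessage;
    this.stackTrace = stackTrace;
    this.observedTime = observedTime;
  }

  /** Builds an entry from the failing element, the step name, and the caught exception. */
  static <T> ErrorEntrySketch<T> of(T element, String step, Exception error) {
    StringWriter buffer = new StringWriter();
    error.printStackTrace(new PrintWriter(buffer, true));
    return new ErrorEntrySketch<>(
        element, step, String.valueOf(error.getMessage()), buffer.toString(), Instant.now());
  }

  T getElement() { return element; }
  String getStep() { return step; }
  String getErrorMessage() { return errorMessage; }
  String getStackTrace() { return stackTrace; }
  Instant getObservedTime() { return observedTime; }
}
```

With a record like this, a DLQ row answers "which element, which step, which error, when" without having to go back to the source data.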

This also opens the door for an "error catch-up" pipeline that reads the DLQ and takes the appropriate action to successfully handle those data elements.
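As a rough sketch of what such a catch-up pass could look like once the DLQ entries carry the original element: replay each element through the (presumably fixed) step, keep the successes, and re-queue whatever still fails. Everything here is hypothetical and in-memory (`ErrorCatchUpSketch`, `DeadLetter`, `Result`, `replay`); a real version would read from and write to the actual DLQ sink inside a pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical in-memory "error catch-up" pass over dead-letter entries. */
final class ErrorCatchUpSketch {

  /** Minimal stand-in for a DLQ entry that carries the original element. */
  static final class DeadLetter<T> {
    final T element;
    final String errorMessage;

    DeadLetter(T element, String errorMessage) {
      this.element = element;
      this.errorMessage = errorMessage;
    }
  }

  /** Outcome of one replay pass: recovered outputs and entries that failed again. */
  static final class Result<T, R> {
    final List<R> recovered = new ArrayList<>();
    final List<DeadLetter<T>> stillFailing = new ArrayList<>();
  }

  /** Re-applies the step to every dead-lettered element and splits the outcomes. */
  static <T, R> Result<T, R> replay(List<DeadLetter<T>> dlq, Function<T, R> step) {
    Result<T, R> result = new Result<>();
    for (DeadLetter<T> entry : dlq) {
      try {
        result.recovered.add(step.apply(entry.element));
      } catch (RuntimeException e) {
        // Keep the element so a later pass can try again; record the newest error.
        result.stillFailing.add(new DeadLetter<>(entry.element, String.valueOf(e.getMessage())));
      }
    }
    return result;
  }
}
```

The point of the sketch is that the replay is only possible because each entry holds the element itself; with an error message alone there would be nothing to feed back into the step.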

lastomato commented 4 years ago

The original message names are recorded as part of commit a77ce20362949bd975febd79e1b649021545a377. The reason I think we need another abstraction is that HealthcareIOError is HCLS-specific, while MappingFn is general-purpose and can be used for other types of mapping as well.