Closed sherifnada closed 2 years ago
@sherifnada what are the cases we want to fail on? Are the lists below right?
fail on these:
don't fail on these:
@sherifnada is SAT already testing for this? is our main goal to detect drift? or just catch bugs?
@cgardens SAT makes a best effort to test this. However, in cases where sandbox accounts aren’t comprehensive, it’s not possible to test this for all fields in all streams. So this is to catch both bugs and drift from the API.
Your above list is correct in that it should only fail if a field is present and doesn’t match its declared type. No fields should be considered required.
If this validation fails, the failure should be attributed to the source.
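To make the rule above concrete, here is a minimal sketch of a check that flags a field only when it is present and its value does not match the declared type, and never treats a missing field as an error. The function name `find_type_mismatches`, the simplified type table, and the choice to skip undeclared fields are all assumptions for illustration, not the worker's actual implementation.

```python
from typing import Any, Dict, List

# Simplified mapping from JSON Schema type names to Python types (assumption:
# only top-level primitive types are checked in this sketch).
_JSON_TYPES = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
    "null": (type(None),),
}


def _matches(value: Any, type_name: str) -> bool:
    python_types = _JSON_TYPES.get(type_name)
    if python_types is None:
        return True  # unknown type names are not checked in this sketch
    # bool is a subclass of int in Python, so reject it for numeric types.
    if isinstance(value, bool) and type_name in ("number", "integer"):
        return False
    return isinstance(value, python_types)


def find_type_mismatches(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return one message per present field whose value violates its declared type."""
    mismatches = []
    properties = schema.get("properties", {})
    for field, value in record.items():
        declared = properties.get(field, {}).get("type")
        if declared is None:
            continue  # undeclared or untyped field: skipped here (assumption)
        allowed = declared if isinstance(declared, list) else [declared]
        if not any(_matches(value, t) for t in allowed):
            mismatches.append(
                f"{field}: expected {allowed}, got {type(value).__name__}"
            )
    return mismatches
```

Because the loop iterates over the record rather than the schema, a field that is declared but absent from the record is never reported, which matches "No fields should be considered required."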
perfect. thanks!
It would also be incredibly helpful for debugging if, upon failure, we log:
Grooming notes:
Are there implementation details that need to be spec'd out as well as the performance considerations?
@sherifnada can you give us context into how much customer pain and OC time this is causing? For context, we are trying to reason about what performance tradeoffs we are willing to take in this implementation. The more pain it is causing, the more tolerant we are of a performance hit.
Note from most recent backlog grooming:
@lmossman that sounds like a great path forward
Example bug I saw: https://github.com/airbytehq/airbyte/issues/9775
Tell us about the problem you're trying to solve
If a source incorrectly declares its schema (e.g., it says the "ID" column is a number when it's really a string), then we only find out about that when the destination fails upon encountering one such record. This has two problems:
Potentially related to https://github.com/airbytehq/airbyte-internal-issues/issues/2507
Describe the solution you’d like
I would like the Airbyte worker to validate all record schemas before passing them to the destination. If a record does not match the schema, fail the sync and attribute the failure to the source.
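As a rough illustration of where such a check could sit, the sketch below wraps the stream of records flowing from source to destination and stops at the first record that fails validation. Everything here is hypothetical: `SourceSchemaMismatchError`, its `failure_origin` attribute, and `validate_before_destination` are names invented for this example, and the `check` callable stands in for whatever schema validator the worker would actually use.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


class SourceSchemaMismatchError(Exception):
    """Hypothetical error type that marks the failure as caused by the source."""

    failure_origin = "source"


def validate_before_destination(
    records: Iterable[Record],
    check: Callable[[Record], List[str]],
) -> Iterator[Record]:
    """Yield records unchanged, raising on the first one that fails `check`.

    `check` returns a list of problem descriptions; an empty list means the
    record is valid.
    """
    for record in records:
        problems = check(record)
        if problems:
            # The invalid record is never yielded, so the destination never sees it.
            raise SourceSchemaMismatchError("; ".join(problems))
        yield record
```

Since the generator raises before yielding the offending record, the destination only ever receives records that passed the check, and the exception type carries the attribution to the source.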
Describe the alternative you’ve considered or used
Steps