airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte
Other
216 stars 31 forks source link

Fix: Resolve issue where `read()` would fail if it received unexpected/undeclared top-level properties in a stream #131

Closed aaronsteers closed 6 months ago

aaronsteers commented 6 months ago

The primary cause for failure in the case of PokeAPI source was a JSON field called cries was included in the delivered data, but not in the source's schema.

The core fix of this PR is to ensure that our base implementation does not attempt to write fields which are undeclared in the source schema.

Notes:

aaronsteers commented 6 months ago

It looks like at least one root cause of failure is that source-pokeapi retrieves a stream property called cries which is not contained in the catalog file.

Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: cries.

https://github.com/airbytehq/airbyte/blob/44f784e200fd66d43e0f948858aa2a2ee184fef7/docs/connector-development/tutorials/cdk-speedrun-assets/pokemon.json

aaronsteers commented 6 months ago

I did more digging and found that CI is failing the pokemon integration test, but inexplicably so.

In CI, it uses the incorrect endpoint and gets a 404:

https://pokeapi.co/api/v2/pikachu

Instead of the correct endpoint:

https://pokeapi.co/api/v2/pokemon/pikachu

image

It's not worth debugging at this point since the fix for the core issue is already contained in this PR. I'll open a low-pri issue to resolve. In the meanwhile, we'll just skip this particular test when run in CI. (And if it continues to act up, we can just remove that test entirely.)

cc @bindipankhudi

Update: Logged as: https://github.com/airbytehq/PyAirbyte/issues/146