MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0
8 stars 25 forks source link

When `inclusion: available` properties are not selected, target fails because properties are not nullable according to the schema #47

Closed pnadolny13 closed 2 years ago

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 10:59

As seen in https://gitlab.com/meltano/meltano/-/jobs/1019512436

commits.short_id is not selected, but since the field is not nullable per https://gitlab.com/meltano/tap-gitlab/-/blob/master/tap_gitlab/schemas/commits.json#L11, target-postgres fails. Other streams likely have the same problem.

We should either exclude unselected properties from the SCHEMA message, or update the schema to allow non-automatic fields to be null.

This ~bug was introduced by https://gitlab.com/meltano/tap-gitlab/-/merge_requests/34, released as v0.9.12.

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 11:07

changed the description

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 11:09

mentioned in merge request !35

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 11:09

mentioned in commit 602e89fba0d4c3f31307ea5fe03a2d3f37dbab3b

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 12:39

This actually doesn't break on meltano's target-postgres which is what I have been testing on. But I switch to the datamill variant and recreated the issue.

I'm looking for an example of how other taps solve this problem. I can't find anywhere in singer-python where the schema is filtered by the metadata

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 12:46

meltano's target-postgres seems to create the short_id column and then set it to an empty string. Perhaps that counts as a bug on the target

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 13:01

I looked at the way a lot of other taps deal with this and it seems like the behavior is to always specify a column as optional.

This is a kind of breaking change to the tap, but targets are expected to deal with schema changes, and making a non-nullable column nullable is a straightforward, backwards-compatible schema change for databases

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 13:18

I went ahead and made all fields nullable. We could possibly get away with keeping automatic (key property) fields as non-nullable, but official singer taps seem to make even key properties nullable

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 13:19

mentioned in merge request !36

pnadolny13 commented 2 years ago

In GitLab by @rabidaudio on Feb 10, 2021, 13:44

This behavior is somewhat unexpected to me, I would normally expect that if I unselected a column that it wouldn't appear in my database, instead of appearing as a null. But either we should request a Singer spec change or make filtering the schema messages to only the selected columns a feature of Meltano

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 16:08

I went ahead and made all fields nullable.

Sounds good.

make filtering the schema messages to only the selected columns a feature of Meltano

I agree: https://gitlab.com/meltano/meltano/-/issues/2475

In the new Singer SDK we'll also handle this "correctly" from the start.

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 11, 2021, 12:14

mentioned in commit 95103ec271ac9372815b8c99c474e62d64b0708b