MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0

`KeyError: 'updated_at'` on `merge_requests` and `pipelines` #96

Closed (francispotter closed 1 year ago)

francispotter commented 1 year ago

With groups, projects, branches, and project_members in the select section of meltano.yml, everything works fine. But when I add merge_requests or pipelines fields, the tap fails with an error on some projects.

Here's a clip from meltano.yml:

        - merge_requests.title
        - pipelines.status

Here's a clip from the output with the error message:

2023-03-07T22:14:25.280853Z [info     ]   File "/Users/francispotter/git/fpotter/labcrawler/.meltano/extractors/tap-gitlab/venv/lib/python3.10/site-packages/tap_gitlab/__init__.py", line 463, in sync_merge_requests cmd_type=elb consumer=False name=tap-gitlab producer=True stdio=stderr string_id=tap-gitlab
2023-03-07T22:14:25.281084Z [info     ]     utils.update_state(STATE, state_key, row['updated_at']) cmd_type=elb consumer=False name=tap-gitlab producer=True stdio=stderr string_id=tap-gitlab
2023-03-07T22:14:25.281445Z [info     ] KeyError: 'updated_at'         cmd_type=elb consumer=False name=tap-gitlab producer=True stdio=stderr string_id=tap-gitlab
2023-03-07T22:14:25.395743Z [error    ] Extractor failed
2023-03-07T22:14:25.396429Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Extractor failed') exit_codes={<PluginType.EXTRACTORS: 'extractors'>: 1} set_number=0 success=False

I can see in REQUESTS (the "catalog"?) that merge_requests and pipelines contain this line:

        'replication_keys': ['updated_at'],

... which makes sense to me, and I expect it explains why the tap is querying for those values anyway. But why the failure? When I try one of the queries in Postman, the records all seem to return values for updated_at:

Here's the query:

https://gitlab.com/api/v4/projects/28179884/pipelines?updated_after=2000-01-01T00:00:00Z

The response includes:

        "updated_at": "2021-07-15T21:57:55.970Z",

... which looks like a valid ISO date to me.
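
As a quick check that the value really parses, here is a minimal Python sketch using only the standard library. The format string is an assumption chosen to match the sample value above, not something taken from the tap:

```python
from datetime import datetime, timezone

raw = "2021-07-15T21:57:55.970Z"

# The trailing "Z" marks UTC. strptime matches it as a literal character,
# so the timezone is attached explicitly afterwards.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

print(parsed.isoformat())
```

If the string were malformed, `strptime` would raise `ValueError`, so a clean parse suggests the timestamp format itself is not the problem.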

With apologies if this belongs in Slack or reflects (newbie) user error, is there a bug here?

francispotter commented 1 year ago

Oops, never mind! I now see that if I include updated_at in meltano.yml for those tables, it works.
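
For anyone hitting the same error, the failure mode can be sketched roughly as below. This is a simplified illustration, not the tap's actual code: `filter_selected` and `sync_rows` are hypothetical helpers, and the sketch assumes each record is reduced to the selected fields before the bookmark is read from `row['updated_at']`, which is consistent with the traceback and with the fix above.

```python
# Hypothetical sketch: names and flow are illustrative, not taken from tap-gitlab.

def filter_selected(record, selected_fields):
    """Keep only the fields that were selected (assumed behaviour)."""
    return {k: v for k, v in record.items() if k in selected_fields}


def sync_rows(records, selected_fields, replication_key="updated_at"):
    """Return the latest replication-key value seen across the rows."""
    bookmark = None
    for record in records:
        row = filter_selected(record, selected_fields)
        value = row[replication_key]  # KeyError if the key was not selected
        if bookmark is None or value > bookmark:
            bookmark = value
    return bookmark


# Sample record: only the updated_at value comes from the thread.
api_records = [
    {"id": 1, "status": "success", "updated_at": "2021-07-15T21:57:55.970Z"},
]

# Selecting only `status` drops `updated_at`, so the bookmark update fails.
try:
    sync_rows(api_records, {"status"})
    failed = False
except KeyError as exc:
    failed = True
    missing_key = exc.args[0]

# Selecting the replication key as well lets the bookmark update succeed.
bookmark = sync_rows(api_records, {"status", "updated_at"})
```

In meltano.yml terms, this corresponds to listing `merge_requests.updated_at` and `pipelines.updated_at` in `select` alongside the other fields for those streams.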