MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0

tap-gitlab - sync_pipelines_extended is failing due to difference in schema #79

Status: Open. amalkumarCurve opened this issue 2 years ago.

amalkumarCurve commented 2 years ago

Error:
Traceback (most recent call last):
  File "/Users/amalkumar/venv/bin/tap-gitlab", line 11, in <module>
    load_entry_point('tap-gitlab==0.9.15', 'console_scripts', 'tap-gitlab')()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 959, in main
    raise exc
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 956, in main
    main_impl()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 951, in main_impl
    do_sync()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 904, in do_sync
    sync_group(gid, pids)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 679, in sync_group
    sync_project(pid)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 834, in sync_project
    sync_pipelines(data)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 723, in sync_pipelines
    sync_pipelines_extended(project, transformed_row)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/__init__.py", line 744, in sync_pipelines_extended
    transformed_row = transformer.transform(row, RESOURCES[entity]["schema"], mdata)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer/transform.py", line 152, in transform
    raise SchemaMismatch(self.errors)
singer.transform.SchemaMismatch: Errors during transform
    user: data does not match {'type': 'object', 'properties': {'name': {'type': 'string'}, 'username': {'type': 'string'}, 'id': {'type': 'integer'}, 'state': {'type': 'string'}}}
    committed_at: data does not match {'type': 'string', 'format': 'date-time'}
    coverage: data does not match {'type': 'number'}
    : data does not match {'type': 'object', 'properties': {'project_id': {'type': ['integer', 'null']}, 'id': {'type': ['integer', 'null']}, 'status': {'type': ['string', 'null']}, 'ref': {'type': ['string', 'null']}, 'sha': {'type': ['string', 'null']}, 'before_sha': {'type': ['string', 'null']}, 'tag': {'type': ['boolean', 'null']}, 'yaml_errors': {'type': ['string', 'null']}, 'user': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'username': {'type': 'string'}, 'id': {'type': 'integer'}, 'state': {'type': 'string'}}}, 'created_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'updated_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'started_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'finished_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'committed_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'duration': {'anyOf': [{'type': 'integer'}, {'type': 'null'}]}, 'coverage': {'anyOf': [{'type': 'number'}, {'type': 'null'}]}, 'web_url': {'type': ['string', 'null']}}}

Errors during transform: [user: data does not match {'type': 'object', 'properties': {'name': {'type': 'string'}, 'username': {'type': 'string'}, 'id': {'type': 'integer'}, 'state': {'type': 'string'}}}, committed_at: data does not match {'type': 'string', 'format': 'date-time'}, coverage: data does not match {'type': 'number'}, : data does not match {'type': 'object', 'properties': {'project_id': {'type': ['integer', 'null']}, 'id': {'type': ['integer', 'null']}, 'status': {'type': ['string', 'null']}, 'ref': {'type': ['string', 'null']}, 'sha': {'type': ['string', 'null']}, 'before_sha': {'type': ['string', 'null']}, 'tag': {'type': ['boolean', 'null']}, 'yaml_errors': {'type': ['string', 'null']}, 'user': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'username': {'type': 'string'}, 'id': {'type': 'integer'}, 'state': {'type': 'string'}}}, 'created_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'updated_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'started_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'finished_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'committed_at': {'anyOf': [{'type': 'string', 'format': 'date-time'}, {'type': 'null'}]}, 'duration': {'anyOf': [{'type': 'integer'}, {'type': 'null'}]}, 'coverage': {'anyOf': [{'type': 'number'}, {'type': 'null'}]}, 'web_url': {'type': ['string', 'null']}}}]
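A note on reading this error: the user entry quotes a schema of type "object" with no null option, while the committed_at and coverage entries quote only the first branch of anyOf schemas whose second branch does allow null. One plausible reading is that the API returned user: null for some pipelines, and that the committed_at/coverage lines record anyOf branches that were tried before the null branch matched. The log never prints the offending row, so this is an assumption. The sketch below uses a small, hypothetical stdlib-only checker (matches, which handles only "type", "anyOf" and "properties", ignores "format", and is not singer-python's Transformer) plus a hypothetical make_nullable helper. Together they show that a row with a null user fails the quoted schema but passes once user is nullable.

```python
from copy import deepcopy

# Python types accepted for each JSON Schema type name (simplified).
PY_TYPES = {
    "object": dict,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def matches(value, schema: dict) -> bool:
    """Tiny subset of JSON Schema checking: 'type', 'anyOf' and 'properties' only.

    Hypothetical helper; it ignores 'format' and is not singer-python's Transformer.
    """
    if "anyOf" in schema:
        return any(matches(value, branch) for branch in schema["anyOf"])
    declared = schema.get("type")
    if declared is not None:
        names = declared if isinstance(declared, list) else [declared]
        # bool is a subclass of int in Python, so keep it out of integer/number.
        if not any(
            isinstance(value, PY_TYPES[name])
            and not (name in ("integer", "number") and isinstance(value, bool))
            for name in names
        ):
            return False
    if isinstance(value, dict):
        props = schema.get("properties", {})
        return all(matches(value[key], sub) for key, sub in props.items() if key in value)
    return True


def make_nullable(schema: dict) -> dict:
    """Return a copy in which every declared 'type' also allows null (hypothetical helper)."""
    result = deepcopy(schema)
    declared = result.get("type")
    if isinstance(declared, str) and declared != "null":
        result["type"] = [declared, "null"]
    elif isinstance(declared, list) and "null" not in declared:
        result["type"] = declared + ["null"]
    for key in list(result.get("properties", {})):
        result["properties"][key] = make_nullable(result["properties"][key])
    return result


# Excerpt of the pipeline schema quoted in the SchemaMismatch message.
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "username": {"type": "string"},
        "id": {"type": "integer"},
        "state": {"type": "string"},
    },
}
PIPELINE_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer", "null"]},
        "user": USER_SCHEMA,
        "committed_at": {"anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]},
        "coverage": {"anyOf": [{"type": "number"}, {"type": "null"}]},
    },
}

# Hypothetical row: assumes the API returned null for user, committed_at and coverage.
ROW = {"id": 1, "user": None, "committed_at": None, "coverage": None}

print(matches(ROW, PIPELINE_SCHEMA))                 # False: user must be an object
print(matches(ROW, make_nullable(PIPELINE_SCHEMA)))  # True
```

Under this reading, null committed_at and coverage values are already accepted by their anyOf schemas, and only the non-nullable user field rejects the row.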

Steps to reproduce:
Tap version: 0.9.15 & 0.10.0
Python: 3.7.3

Config:
{
  "api_url": "https://",
  "private_token": "",
  "groups": "",
  "projects": "",
  "start_date": "",
  "ultimate_license": true,
  "fetch_merge_request_commits": true,
  "fetch_pipelines_extended": true
}

Command: tap-gitlab --config tap-gitlab-config

laurentS commented 2 years ago

Hi @amalkumarCurve, it looks like you're using version 0.9.15 of the tap, which comes from the legacy-stable branch and is no longer actively maintained. Is that correct?

If so, can you try switching to version 2 or directly using code from the main branch?

amalkumarCurve commented 2 years ago

Hi @laurentS,

Thanks for your response.

I tried version 2 (https://github.com/MeltanoLabs/tap-gitlab/releases/tag/v2.0.0-alpha4), but I'm now getting a different error: 403 Client Error: Forbidden for path: /groups/{group_id}/variables.

Since I disabled the fetch_group_variables flag in my config, shouldn't the /groups/{group_id}/variables path be skipped?

Config:
{
  "api_url": "api/v4",
  "private_token": "",
  "groups": "",
  "projects": "",
  "start_date": "2022-04-23T00:00:00Z",
  "ultimate_license": true,
  "fetch_merge_request_commits": false,
  "fetch_pipelines_extended": false,
  "fetch_group_variables": false,
  "fetch_project_variables": false
}

Installation logs:
~ ❯ pip install git+https://github.com/MeltanoLabs/tap-gitlab.git@v2.0.0-alpha4
Collecting git+https://github.com/MeltanoLabs/tap-gitlab.git@v2.0.0-alpha4
  Cloning https://github.com/MeltanoLabs/tap-gitlab.git (to revision v2.0.0-alpha4) to /private/var/folders/gw/j33zqy31447crctz4n5ct4nm0000gp/T/pip-req-build-h_ylqe3v
  Running command git clone --filter=blob:none --quiet https://github.com/MeltanoLabs/tap-gitlab.git /private/var/folders/gw/j33zqy31447crctz4n5ct4nm0000gp/T/pip-req-build-h_ylqe3v
  Running command git checkout -q 7d285b80617ef85b0e0a5c0c32000252bbbc9962
  Resolved https://github.com/MeltanoLabs/tap-gitlab.git to commit 7d285b80617ef85b0e0a5c0c32000252bbbc9962
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting requests<3.0.0,>=2.25.1
  Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting requests-cache<0.10.0,>=0.9.3
  Downloading requests_cache-0.9.4-py3-none-any.whl (47 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.2/47.2 kB 1.5 MB/s eta 0:00:00
Collecting singer-sdk<0.5.0,>=0.4.4
  Downloading singer_sdk-0.4.9-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.3/97.3 kB 3.2 MB/s eta 0:00:00
Collecting PyYAML<7.0,>=6.0
  Using cached PyYAML-6.0-cp37-cp37m-macosx_10_9_x86_64.whl (189 kB)
Collecting charset-normalizer~=2.0.0
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.5.18.1-py3-none-any.whl (155 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Collecting attrs<22.0,>=21.2
  Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB)
Collecting appdirs<2.0.0,>=1.4.4
  Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting url-normalize<2.0,>=1.4
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)
Collecting cattrs<2.0,>=1.8
  Using cached cattrs-1.10.0-py3-none-any.whl (29 kB)
Collecting memoization<0.4.0,>=0.3.2
  Downloading memoization-0.3.2-py3-none-any.whl (38 kB)
Collecting importlib-metadata
  Using cached importlib_metadata-4.11.4-py3-none-any.whl (18 kB)
Collecting inflection<0.6.0,>=0.5.1
  Using cached inflection-0.5.1-py2.py3-none-any.whl (9.5 kB)
Collecting cryptography<4.0.0,>=3.4.6
  Using cached cryptography-3.4.8-cp36-abi3-macosx_10_10_x86_64.whl (2.0 MB)
Collecting PyJWT<3.0,>=2.3
  Using cached PyJWT-2.4.0-py3-none-any.whl (18 kB)
Collecting joblib<2.0.0,>=1.0.1
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting pendulum<3.0.0,>=2.1.0
  Using cached pendulum-2.1.2-cp37-cp37m-macosx_10_15_x86_64.whl (124 kB)
Collecting backoff<2.0,>=1.8.0
  Downloading backoff-1.11.1-py2.py3-none-any.whl (13 kB)
Collecting sqlalchemy<2.0,>=1.4
  Downloading SQLAlchemy-1.4.36-cp37-cp37m-macosx_10_14_x86_64.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 6.7 MB/s eta 0:00:00
Collecting jsonpath-ng<2.0.0,>=1.5.3
  Downloading jsonpath_ng-1.5.3-py3-none-any.whl (29 kB)
Collecting pipelinewise-singer-python==1.2.0
  Downloading pipelinewise_singer_python-1.2.0-py3-none-any.whl (24 kB)
Collecting click<9.0,>=8.0
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting pytz<2021.0
  Downloading pytz-2020.5-py2.py3-none-any.whl (510 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 510.8/510.8 kB 7.2 MB/s eta 0:00:00
Collecting simplejson==3.11.1
  Using cached simplejson-3.11.1.tar.gz (78 kB)
  Preparing metadata (setup.py) ... done
Collecting python-dateutil>=2.6.0
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting backoff<2.0,>=1.8.0
  Using cached backoff-1.8.0-py2.py3-none-any.whl (45 kB)
Collecting jsonschema==3.2.0
  Using cached jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)
Collecting ciso8601
  Using cached ciso8601-2.2.0.tar.gz (18 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting six>=1.11.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.7/site-packages (from jsonschema==3.2.0->pipelinewise-singer-python==1.2.0->singer-sdk<0.5.0,>=0.4.4->tap-gitlab==2.0.0a3) (47.1.0)
Collecting pyrsistent>=0.14.0
  Using cached pyrsistent-0.18.1-cp37-cp37m-macosx_10_9_x86_64.whl (68 kB)
Collecting typing_extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting cffi>=1.12
  Using cached cffi-1.15.0-cp37-cp37m-macosx_10_9_x86_64.whl (178 kB)
Collecting decorator
  Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting ply
  Using cached ply-3.11-py2.py3-none-any.whl (49 kB)
Collecting pytzdata>=2020.1
  Using cached pytzdata-2020.1-py2.py3-none-any.whl (489 kB)
Collecting greenlet!=0.4.17
  Using cached greenlet-1.1.2-cp37-cp37m-macosx_10_14_x86_64.whl (92 kB)
Collecting zipp>=0.5
  Using cached zipp-3.8.0-py3-none-any.whl (5.4 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Using legacy 'setup.py install' for simplejson, since package 'wheel' is not installed.
Building wheels for collected packages: tap-gitlab, ciso8601
  Building wheel for tap-gitlab (pyproject.toml) ... done
  Created wheel for tap-gitlab: filename=tap_gitlab-2.0.0a3-py3-none-any.whl size=21146 sha256=6410d523f7367cf3786c4b206c5a64f2ede319ed4b03d5b909887d6a98ad0144
  Stored in directory: /private/var/folders/gw/j33zqy31447crctz4n5ct4nm0000gp/T/pip-ephem-wheel-cache-a4uroqmw/wheels/5b/ca/99/3b7c339fc8f1f786201eff9b6308a99f4bb0ad88125e223682
  Building wheel for ciso8601 (pyproject.toml) ... done
  Created wheel for ciso8601: filename=ciso8601-2.2.0-cp37-cp37m-macosx_10_9_x86_64.whl size=13177 sha256=65b5f29ee3a084dd01bd451cd8b890160c18c487e931e97904160e369ea09d05
  Stored in directory: /Users/amalkumar/Library/Caches/pip/wheels/ad/25/8f/3b0a82303191efe3c1204f3741c42d8eb2b0236567e22485de
Successfully built tap-gitlab ciso8601
Installing collected packages: simplejson, pytz, ply, ciso8601, appdirs, zipp, urllib3, typing_extensions, six, PyYAML, pytzdata, pyrsistent, PyJWT, pycparser, memoization, joblib, inflection, idna, greenlet, decorator, charset-normalizer, certifi, backoff, attrs, url-normalize, requests, python-dateutil, jsonpath-ng, importlib-metadata, cffi, cattrs, sqlalchemy, requests-cache, pendulum, jsonschema, cryptography, click, pipelinewise-singer-python, singer-sdk, tap-gitlab
  Running setup.py install for simplejson ... done
Successfully installed PyJWT-2.4.0 PyYAML-6.0 appdirs-1.4.4 attrs-21.4.0 backoff-1.8.0 cattrs-1.10.0 certifi-2022.5.18.1 cffi-1.15.0 charset-normalizer-2.0.12 ciso8601-2.2.0 click-8.1.3 cryptography-3.4.8 decorator-5.1.1 greenlet-1.1.2 idna-3.3 importlib-metadata-4.11.4 inflection-0.5.1 joblib-1.1.0 jsonpath-ng-1.5.3 jsonschema-3.2.0 memoization-0.3.2 pendulum-2.1.2 pipelinewise-singer-python-1.2.0 ply-3.11 pycparser-2.21 pyrsistent-0.18.1 python-dateutil-2.8.2 pytz-2020.5 pytzdata-2020.1 requests-2.27.1 requests-cache-0.9.4 simplejson-3.11.1 singer-sdk-0.4.9 six-1.16.0 sqlalchemy-1.4.36 tap-gitlab-2.0.0a3 typing_extensions-4.2.0 url-normalize-1.4.3 urllib3-1.26.9 zipp-3.8.0

Error:
time=2022-05-30 11:11:38 name=tap-gitlab level=INFO message=Tap has custom mapper. Using 1 provided map(s).
{"type": "SCHEMA", "stream": "group_variables", "schema": {"properties": {"group_id": {"type": ["null", "integer"]}, "variable_type": {"type": ["null", "string"]}, "key": {"type": ["null", "string"]}, "value": {"type": ["null", "string"]}, "protected": {"type": ["null", "boolean"]}, "masked": {"type": ["null", "boolean"]}, "environment_scope": {"type": ["null", "string"]}}, "type": "object"}, "key_properties": ["project_id", "key"]}
time=2022-05-30 11:11:38 name=tap-gitlab level=INFO message=INFO METRIC: {'type': 'timer', 'metric': 'http_request_duration', 'value': 0.096695, 'tags': {'endpoint': '/groups/{group_id}/variables', 'http_status_code': 403, 'status': 'failed', 'url': '/api/v4/groups/71/variables', 'context': {'group_path': 'tech', 'group_id': 71}}}
Traceback (most recent call last):
  File "/Users/amalkumar/venv/bin/tap-gitlab", line 8, in <module>
    sys.exit(TapGitLab.cli())
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/tap_base.py", line 499, in cli
    tap.sync_all()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/tap_base.py", line 379, in sync_all
    stream.sync()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 1020, in sync
    self._sync_records(context)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 962, in _sync_records
    self._sync_children(child_context)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 1025, in _sync_children
    child_stream.sync(context=child_context)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 1020, in sync
    self._sync_records(context)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 946, in _sync_records
    for record_result in self.get_records(current_context):
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/rest.py", line 424, in get_records
    for record in self.request_records(context):
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/rest.py", line 322, in request_records
    resp = decorated_request(prepared_request, context)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/backoff/_sync.py", line 94, in retry
    ret = target(*args, **kwargs)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/rest.py", line 235, in _request
    self.validate_response(response)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/rest.py", line 165, in validate_response
    raise FatalAPIError(msg)
singer_sdk.exceptions.FatalAPIError: 403 Client Error: Forbidden for path: /groups/{group_id}/variables
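For context on why this 403 aborts the whole run instead of being retried: the traceback passes through backoff's retry wrapper yet still ends in FatalAPIError. The sketch below is a simplified model, assuming the singer-sdk version in this log maps 4xx responses to a fatal error and 5xx responses to a retriable one, using the message format printed in the log. FatalAPIError and RetriableAPIError here are local stand-in classes, not imports from singer_sdk, and validate_status is a hypothetical function.

```python
class FatalAPIError(Exception):
    """Local stand-in for singer_sdk.exceptions.FatalAPIError (assumed not retried)."""


class RetriableAPIError(Exception):
    """Local stand-in for singer_sdk.exceptions.RetriableAPIError (assumed retried)."""


def validate_status(status_code: int, reason: str, path: str) -> None:
    """Raise for error responses, mirroring the message format seen in the log."""
    if 400 <= status_code < 500:
        # Client errors such as 403 Forbidden are treated as fatal.
        raise FatalAPIError(f"{status_code} Client Error: {reason} for path: {path}")
    if 500 <= status_code < 600:
        # Server errors are treated as transient and eligible for retries.
        raise RetriableAPIError(f"{status_code} Server Error: {reason} for path: {path}")


try:
    validate_status(403, "Forbidden", "/groups/{group_id}/variables")
except FatalAPIError as exc:
    print(exc)  # 403 Client Error: Forbidden for path: /groups/{group_id}/variables
```

Under that assumption, a stream the token is not allowed to read stops the sync unless the stream is deselected or its 403 is handled explicitly.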

laurentS commented 2 years ago

Indeed! I believe this line https://github.com/MeltanoLabs/tap-gitlab/blob/7d285b80617ef85b0e0a5c0c32000252bbbc9962/tap_gitlab/tap.py#L163 should read as follows (note the added "not"):

if stream_name in OPTIN_STREAM_NAMES and not self.config.get( 

Can you try this out and let me know if it solves your problem?
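The effect of the missing "not" can be illustrated with a sketch of an opt-in stream filter. The OPTIN_FLAGS mapping and every stream name except group_variables are hypothetical; the real OPTIN_STREAM_NAMES check in tap_gitlab/tap.py, and the argument it passes to self.config.get(, are not shown in this thread. Without the "not", an opt-in stream whose flag is false is kept and one whose flag is true is skipped, which matches the behavior reported above.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from opt-in stream names to the config flags that enable them.
# Only group_variables appears in this thread's logs; the real list lives in tap_gitlab/tap.py.
OPTIN_FLAGS: Dict[str, str] = {
    "group_variables": "fetch_group_variables",
    "project_variables": "fetch_project_variables",
}


def enabled_streams(stream_names: Iterable[str], config: dict) -> List[str]:
    """Keep regular streams; keep opt-in streams only when their flag is truthy."""
    enabled = []
    for name in stream_names:
        flag = OPTIN_FLAGS.get(name)
        if flag is not None and not config.get(flag, False):
            continue  # opt-in stream whose flag is off or missing: skip it
        enabled.append(name)
    return enabled


def enabled_streams_without_not(stream_names: Iterable[str], config: dict) -> List[str]:
    """Same filter with the 'not' missing: the opt-in logic is inverted."""
    enabled = []
    for name in stream_names:
        flag = OPTIN_FLAGS.get(name)
        if flag is not None and config.get(flag, False):
            continue  # wrongly skips opt-in streams that were enabled
        enabled.append(name)
    return enabled


# Flags as in the config posted above; "projects" is a hypothetical regular stream.
config = {"fetch_group_variables": False, "fetch_project_variables": False}
streams = ["projects", "group_variables", "project_variables"]
print(enabled_streams(streams, config))              # ['projects']
print(enabled_streams_without_not(streams, config))  # ['projects', 'group_variables', 'project_variables']
```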

amalkumarCurve commented 2 years ago

@laurentS I tried that and am now getting this error:

Traceback (most recent call last):
  File "/Users/amalkumar/venv/bin/tap-gitlab", line 8, in <module>
    sys.exit(TapGitLab.cli())
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/tap_base.py", line 499, in cli
    tap.sync_all()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/tap_base.py", line 380, in sync_all
    stream.finalize_state_progress_markers()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 907, in finalize_state_progress_markers
    child_stream.finalize_state_progress_markers()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 907, in finalize_state_progress_markers
    child_stream.finalize_state_progress_markers()
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/singer_sdk/streams/core.py", line 910, in finalize_state_progress_markers
    for context in self.partitions or [{}]:
  File "/Users/amalkumar/venv/lib/python3.7/site-packages/tap_gitlab/client.py", line 171, in partitions
    "Could not detect partition type for Gitlab stream "
ValueError: Could not detect partition type for Gitlab stream 'epic_issues' (/groups/{group_id}/epics/{epic_iid}/issues). Expected a URL path containing '{project_path}' or '{group_path}'.
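Judging only from its message, the partitions property in tap_gitlab/client.py expects a '{project_path}' or '{group_path}' placeholder in the stream's URL path, while the epic_issues path uses {group_id} and {epic_iid} instead. The following is a hypothetical reconstruction of that check, written from the error message alone; detect_partition_type is an illustrative name, and the actual code in client.py may be structured differently.

```python
def detect_partition_type(stream_name: str, path: str) -> str:
    """Classify a stream by the placeholder in its URL path (hypothetical reconstruction)."""
    if "{project_path}" in path:
        return "project"
    if "{group_path}" in path:
        return "group"
    # Neither placeholder is present, e.g. a path keyed on {group_id}.
    raise ValueError(
        f"Could not detect partition type for Gitlab stream '{stream_name}' ({path}). "
        "Expected a URL path containing '{project_path}' or '{group_path}'."
    )


try:
    detect_partition_type("epic_issues", "/groups/{group_id}/epics/{epic_iid}/issues")
except ValueError as exc:
    print(exc)
```

Running it prints the same ValueError message as the traceback, which suggests the check matches placeholder names literally and rejects group streams keyed on {group_id}.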

aaronsteers commented 2 years ago

Related: I've found that the legacy version of this tap failed silently when access was denied on a number of stream types. I've started #78, which would give the new 2.x edition the ability to ignore access-denied errors when they occur.
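One way a 2.x stream could tolerate access-denied responses, instead of aborting the run or failing silently as the legacy tap did, is to catch the error, log a warning, and emit no further records for that stream. This is a sketch only: AccessDeniedError and records_ignoring_access_denied are hypothetical names, and this is not the implementation proposed in #78, which this thread does not show.

```python
import logging
from typing import Callable, Iterable, Iterator

LOGGER = logging.getLogger("tap_gitlab_sketch")


class AccessDeniedError(Exception):
    """Hypothetical error raised by the request layer for an HTTP 403 response."""


def records_ignoring_access_denied(
    stream_name: str,
    fetch: Callable[[], Iterable[dict]],
    ignore_access_denied: bool = True,
) -> Iterator[dict]:
    """Yield records from fetch(); on AccessDeniedError, warn and stop instead of failing."""
    try:
        yield from fetch()
    except AccessDeniedError as exc:
        if not ignore_access_denied:
            raise
        # Log a warning rather than failing silently, so the missing data is visible.
        LOGGER.warning("Skipping stream %s: access denied (%s)", stream_name, exc)


def fetch_group_variables() -> Iterable[dict]:
    # Simulates the 403 for /groups/{group_id}/variables reported earlier in this thread.
    raise AccessDeniedError("403 Client Error: Forbidden for path: /groups/{group_id}/variables")


print(list(records_ignoring_access_denied("group_variables", fetch_group_variables)))  # []
```

Records yielded before a 403 are kept, so a stream that loses access partway through a run emits partial data plus a warning. With ignore_access_denied=False the original fail-fast behavior is preserved.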