MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0

What do you think about adding an option to `tap-gitlab` to disable extracting Job resources? #41

Closed pnadolny13 closed 2 years ago

pnadolny13 commented 2 years ago

In GitLab by @stephane-klein on Sep 30, 2020, 10:01

What do you think about adding an option to tap-gitlab to disable extracting Job resources?

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Sep 30, 2020, 13:02

@stephane-klein For sure, I'll gladly accept a contribution to add a setting like fetch_jobs with a default value of True (for backward compatibility), similar to the fetch_merge_request_commits and fetch_pipelines_extended settings that are already supported.
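
A minimal sketch of how a default-true setting keeps existing configs working, assuming the tap reads its options from a plain config dictionary. `fetch_jobs` is the name proposed in the comment above; `should_fetch_jobs` is a hypothetical helper for illustration, not part of tap-gitlab.

```python
def should_fetch_jobs(config: dict) -> bool:
    """Return whether job resources should be extracted.

    Hypothetical helper: a missing "fetch_jobs" key falls back to True,
    so configs written before the setting existed behave as before.
    """
    return bool(config.get("fetch_jobs", True))


# An existing config that never mentions the setting still fetches jobs.
print(should_fetch_jobs({"api_url": "https://gitlab.example.com"}))
# Opting out requires setting the key explicitly.
print(should_fetch_jobs({"fetch_jobs": False}))
```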

Ideally though, we'd implement https://gitlab.com/meltano/tap-gitlab/-/issues/25 instead, so that we can support Meltano's native entity selection functionality: https://meltano.com/docs/integration.html#selecting-entities-and-attributes-for-extraction, which is built on Singer's Discovery Mode: https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md.

Is that something you'd be interested in looking into? :)

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Sep 30, 2020, 13:03

marked this issue as related to #25

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Feb 10, 2021, 10:44

Closed in https://gitlab.com/meltano/tap-gitlab/-/issues/25!

pnadolny13 commented 2 years ago

In GitLab by @stephane-klein on Apr 24, 2021, 12:42

@DouweM I'm trying to use tap-gitlab with job fetching disabled. First, I generate the catalog:

$ /src/.venv/tap-gitlab/bin/tap-gitlab -c /config/tap_gitlab_config.json --discover > /config/catalog.json

Next, I execute:

/src/.venv/tap-gitlab/bin/tap-gitlab -c /config/tap_gitlab_config.json --catalog /config/catalog.json
INFO Starting sync
INFO Skipping stream: projects
INFO Skipping stream: branches
INFO Skipping stream: commits
INFO Skipping stream: issues
INFO Skipping stream: jobs
INFO Skipping stream: merge_requests
INFO Skipping stream: merge_request_commits
INFO Skipping stream: project_milestones
INFO Skipping stream: group_milestones
INFO Skipping stream: users
INFO Skipping stream: groups
INFO Skipping stream: project_members
INFO Skipping stream: group_members
INFO Skipping stream: releases
INFO Skipping stream: tags
INFO Skipping stream: project_labels
INFO Skipping stream: group_labels
INFO Skipping stream: epics
INFO Skipping stream: epic_issues
INFO Skipping stream: pipelines
INFO Skipping stream: pipelines_extended
INFO GET https://gitlab.spacefill.fr/api/v4/groups/spacefill
INFO GET https://gitlab.spacefill.fr/api/v4/projects/54?statistics=1
INFO GET https://gitlab.spacefill.fr/api/v4/projects/10?statistics=1
INFO GET https://gitlab.spacefill.fr/api/v4/projects/1?statistics=1
{"type": "STATE", "value": {"project_54": "2018-01-01T00:00:00Z", "project_10": "2018-01-01T00:00:00Z", "project_1": "2018-01-01T00:00:00Z"}}
INFO Sync complete

@DouweM do you know why all streams are skipped?

I install tap-gitlab with:

pip install git+https://gitlab.com/meltano/tap-gitlab.git@v0.9.15

I read this documentation and I think "selected": true is missing.

This is my catalog.json with some "selected": true entries added.

Do you know why there is no "selected-by-default": true property in catalog.json?

Best regards,
Stéphane

pnadolny13 commented 2 years ago

In GitLab by @stephane-klein on Apr 24, 2021, 14:44

I read this documentation and I think "selected": true is missing.

Yes, that's it.

$ /src/.venv/tap-gitlab/bin/tap-gitlab -c /config/tap_gitlab_config.json --discover > /config/catalog-orig.json
$ cp /config/catalog-orig.json /config/catalog.json

Next I updated /config/catalog.json and this is the diff:

@@ -2,7 +2,6 @@
   "streams": [
     {
       "tap_stream_id": "projects",
-      "selected": true,
       "replication_key": [
         "last_activity_at"
       ],
@@ -423,7 +422,6 @@
       "metadata": [
         {
           "breadcrumb": [],
-          "selected": true,
           "metadata": {
             "table-key-properties": [
               "id"
@@ -432,8 +430,7 @@
             "valid-replication-keys": [
               "last_activity_at"
             ],
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         },
         {
@@ -883,8 +880,7 @@
               "name"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "inclusion": "available",
-            "selected": false
+            "inclusion": "available"
           }
         },
         {
@@ -1128,8 +1124,7 @@
             "valid-replication-keys": [
               "created_at"
             ],
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         },
         {
@@ -1499,7 +1494,6 @@
               "id"
             ],
             "forced-replication-method": "INCREMENTAL",
-            "selected": true,
             "valid-replication-keys": [
               "updated_at"
             ],
@@ -1954,8 +1948,7 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "inclusion": "available",
-            "selected": false
+            "inclusion": "available"
           }
         },
         {
@@ -2393,7 +2386,6 @@
             "valid-replication-keys": [
               "updated_at"
             ],
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -2789,7 +2781,6 @@
               "commit_id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": false,
             "inclusion": "unsupported"
           }
         },
@@ -2927,8 +2918,7 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         },
         {
@@ -3128,7 +3118,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -3289,7 +3278,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -3351,7 +3339,6 @@
     },
     {
       "tap_stream_id": "groups",
-      "selected": true,
       "replication_method": "FULL_TABLE",
       "key_properties": [
         "id"
@@ -3457,7 +3444,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -3566,8 +3552,7 @@
             "projects"
           ],
           "metadata": {
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         }
       ]
@@ -3618,7 +3603,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -3730,7 +3714,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -3890,7 +3873,6 @@
               "tag_name"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -4022,7 +4004,6 @@
               "name"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -4167,7 +4148,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -4363,7 +4343,6 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "selected": true,
             "inclusion": "available"
           }
         },
@@ -4596,8 +4575,7 @@
             "valid-replication-keys": [
               "updated_at"
             ],
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         },
         {
@@ -4748,7 +4726,6 @@
     },
     {
       "tap_stream_id": "epic_issues",
-      "selected": true,
       "replication_method": "FULL_TABLE",
       "key_properties": [
         "group_id",
@@ -4813,8 +4790,7 @@
               "epic_issue_id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "inclusion": "available",
-            "selected": true
+            "inclusion": "available"
           }
         },
         {
@@ -4960,8 +4936,7 @@
             "valid-replication-keys": [
               "updated_at"
             ],
-            "inclusion": "available",
-            "selected": false
+            "inclusion": "available"
           }
         },
         {
@@ -5195,8 +5170,7 @@
               "id"
             ],
             "forced-replication-method": "FULL_TABLE",
-            "inclusion": "unsupported",
-            "selected": true
+            "inclusion": "unsupported"
           }
         },
         {
@@ -5355,4 +5329,4 @@
       ]
     }
   ]
-}
+}

This is the complete catalog.json

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Apr 26, 2021, 11:28

@stephane-klein Correct, selected: true needs to be set explicitly on any streams you'd like to import. If you're using Meltano, it can handle the catalog generation and stream selection for you automatically: https://meltano.com/docs/integration.html#selecting-entities-and-attributes-for-extraction
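
For anyone editing the catalog by hand, here is a rough sketch of setting the flag programmatically. It assumes the Singer catalog layout, in which each stream carries a "metadata" list and the stream-level entry is the one whose "breadcrumb" is an empty array; `select_streams` is a hypothetical helper, not something tap-gitlab ships.

```python
import json


def select_streams(catalog: dict, wanted: set) -> dict:
    """Mark streams named in `wanted` as selected and all others as not selected.

    Assumption: the flag lives in the nested "metadata" object of the
    stream-level entry (the one whose "breadcrumb" is []).
    The catalog is modified in place and also returned.
    """
    for stream in catalog.get("streams", []):
        entries = stream.setdefault("metadata", [])
        # Find the stream-level metadata entry, or create it if it is missing.
        root = next((e for e in entries if e.get("breadcrumb") == []), None)
        if root is None:
            root = {"breadcrumb": [], "metadata": {}}
            entries.append(root)
        root.setdefault("metadata", {})["selected"] = (
            stream.get("tap_stream_id") in wanted
        )
    return catalog


if __name__ == "__main__":
    # Tiny in-memory catalog standing in for the output of --discover.
    catalog = {
        "streams": [
            {"tap_stream_id": "projects",
             "metadata": [{"breadcrumb": [], "metadata": {"inclusion": "available"}}]},
            {"tap_stream_id": "jobs",
             "metadata": [{"breadcrumb": [], "metadata": {"inclusion": "available"}}]},
        ]
    }
    print(json.dumps(select_streams(catalog, {"projects"}), indent=2))
```

In practice the catalog would be loaded from the file written by --discover with `json.load` and written back with `json.dump` before being passed to --catalog.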

pnadolny13 commented 2 years ago

In GitLab by @stephane-klein on Apr 26, 2021, 13:32

Correct, selected: true needs to be set explicitly on any streams you'd like to import.

@DouweM thanks, can we add a note in the README about that? An example?

If you're using Meltano, it can handle the catalog generation and stream selection for you automatically: https://meltano.com/docs/integration.html#selecting-entities-and-attributes-for-extraction

Ok 🙂

pnadolny13 commented 2 years ago

In GitLab by @DouweM on Apr 30, 2021, 12:50

can we add a note in the README about that? An example?

@stephane-klein I think that's a good idea and I've created an issue for it: https://gitlab.com/meltano/tap-gitlab/-/issues/30#note_560784934, but I'm afraid I don't currently have time to do it myself.

Since you just went through it (running --discover, adding selected: true where appropriate, passing the result in with --catalog), please feel free to submit a merge request to add this to README.md!

Its-Alex commented 2 years ago

For information: selected: true must be put into metadata, in the object where breadcrumb is an empty array.
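
To make that placement concrete, here is a small sketch that checks whether a stream counts as selected under this reading. It assumes the flag belongs in the nested "metadata" object of the entry whose "breadcrumb" is an empty array; `is_selected` and the sample streams are hypothetical and only illustrate the layout.

```python
def is_selected(stream: dict) -> bool:
    """True only if "selected": true sits in the stream-level metadata entry."""
    for entry in stream.get("metadata", []):
        if entry.get("breadcrumb") == []:
            return entry.get("metadata", {}).get("selected") is True
    return False


# Flag inside the nested "metadata" object of the empty-breadcrumb entry.
good = {
    "tap_stream_id": "projects",
    "metadata": [{"breadcrumb": [], "metadata": {"selected": True}}],
}
# Flag at the top level of the stream.
top_level = {
    "tap_stream_id": "groups",
    "selected": True,
    "metadata": [{"breadcrumb": [], "metadata": {}}],
}
# Flag next to "breadcrumb" instead of inside the nested "metadata" object.
beside_breadcrumb = {
    "tap_stream_id": "epics",
    "metadata": [{"breadcrumb": [], "selected": True, "metadata": {}}],
}

print([is_selected(s) for s in (good, top_level, beside_breadcrumb)])
# prints [True, False, False]
```

The second and third shapes resemble some of the hand edits in the diff earlier in this thread; under the placement described here they would not count as selected.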