MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from GitLab's API
GNU Affero General Public License v3.0

Tap downloads all global gitlab users #91

Status: Open. HeinzBenjamin opened this issue 2 years ago.

HeinzBenjamin commented 2 years ago

Hi there!

When I try to extract a single project, tap-gitlab attempts to download ALL users on GitLab.

In my case, I'm piping the output to target-gsheet to fill a Google Sheet with the project info. This is a free GitLab account (no Pro license, no custom URL). I am running the following command:

tap-gitlab -c config_gitlab.json | target-gsheet -c config_gsheet.json

My config_gitlab.json looks like this:

{
    "api_url": "https://gitlab.com",
    "private_token": "abcdefgandsoon",
    "groups": "mygroup",
    "projects": "mygroup/myproject",
    "start_date": "2018-01-01T00:00:00Z",
    "ultimate_license": false,
    "fetch_merge_request_commits": false,
    "fetch_pipelines_extended": false,
    "fetch_group_variables": false,
    "fetch_project_variables": false
}

When I run the command, I get this output:

INFO Starting sync
INFO Skipping stream: merge_request_commits
INFO Skipping stream: epics
INFO Skipping stream: epic_issues
INFO Skipping stream: pipelines_extended
INFO GET https://gitlab.com/api/v4/users
INFO GET https://gitlab.com/api/v4/users

...after which a 'site_users' tab appears in the Google Sheet, filled with seemingly endless rows of users.

Am I doing something wrong that stops tap-gitlab from filtering users by group?

Or is there an option to skip users altogether? (I don't actually need them.)

Best,
Benjamin
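If the goal is only to keep users out of the sheet, one workaround is to drop the site_users messages between the tap and the target. This is a minimal sketch assuming the standard Singer message format (one JSON object per line, with a "stream" key on SCHEMA, RECORD and ACTIVATE_VERSION messages); drop_stream and main are hypothetical helpers, not part of tap-gitlab or target-gsheet. Note that the tap still requests every page of /api/v4/users, so this does not save any download time.

```python
import json
import sys
from typing import Iterable, Iterator


def drop_stream(lines: Iterable[str], stream_name: str) -> Iterator[str]:
    """Yield Singer message lines unchanged, skipping any message for `stream_name`.

    Assumption: input follows the Singer spec, one JSON message per line.
    SCHEMA, RECORD and ACTIVATE_VERSION messages carry a "stream" key;
    STATE messages do not, so they always pass through.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        message = json.loads(stripped)
        if message.get("stream") == stream_name:
            continue  # drop this stream's schema/records
        yield stripped + "\n"


def main(stream_name: str = "site_users") -> None:
    # Hypothetical entry point: read tap output on stdin, write filtered output to stdout.
    sys.stdout.writelines(drop_stream(sys.stdin, stream_name))
```

Saved as a hypothetical drop_site_users.py that ends with a call to main(), it would sit in the pipe as tap-gitlab -c config_gitlab.json | python drop_site_users.py | target-gsheet -c config_gsheet.json. Deselecting the stream in the catalog, as suggested later in the thread, avoids the download itself.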

HeinzBenjamin commented 2 years ago

Okay, I figured out that tap-gitlab pulls all site_users by design, since site_users is part of RESOURCES; see https://github.com/MeltanoLabs/tap-gitlab/blob/legacy-stable/tap_gitlab/__init__.py#L110

However, for users of the public gitlab.com instance, this essentially makes the tap unusable. Or am I missing something?

Best

laurentS commented 2 years ago

Hi @HeinzBenjamin, which branch of this repo are you using? There was a recent v2 release from the main branch, which changes the behaviour substantially. If you're starting from scratch, I'd recommend moving to that version, as the code in legacy-stable is no longer supported.

Also, if you want to skip a stream, you probably need to modify your catalog to deselect that stream.
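A minimal sketch of deselecting one stream in a catalog file, assuming the catalog follows the Singer layout (a top-level "streams" list whose entries carry tap_stream_id and a metadata list). It sets selected to false on the stream-level metadata entry (breadcrumb []) and, for older taps that read the flag from the schema, on the schema as well; which of the two the legacy-stable branch of tap-gitlab actually checks is an assumption here. deselect_stream and deselect_stream_in_file are hypothetical names.

```python
import json


def deselect_stream(catalog: dict, stream_name: str) -> dict:
    """Mark one stream as not selected in a Singer catalog dict (edited in place)."""
    for entry in catalog.get("streams", []):
        name = entry.get("tap_stream_id") or entry.get("stream")
        if name != stream_name:
            continue
        # Stream-level selection lives in the metadata entry whose breadcrumb is [].
        metadata = entry.setdefault("metadata", [])
        stream_md = next((m for m in metadata if m.get("breadcrumb") == []), None)
        if stream_md is None:
            stream_md = {"breadcrumb": [], "metadata": {}}
            metadata.append(stream_md)
        stream_md.setdefault("metadata", {})["selected"] = False
        # Assumption: some older taps read "selected" from the schema instead,
        # so the flag is set there as well.
        entry.setdefault("schema", {})["selected"] = False
    return catalog


def deselect_stream_in_file(src: str, dst: str, stream_name: str) -> None:
    """Read a catalog from `src`, deselect `stream_name`, and write it to `dst`."""
    with open(src) as f:
        catalog = json.load(f)
    with open(dst, "w") as f:
        json.dump(deselect_stream(catalog, stream_name), f, indent=2)
```

For example, deselect_stream_in_file("catalog.json", "catalog.edited.json", "site_users") writes an edited copy. The starting catalog can usually be generated with tap-gitlab -c config_gitlab.json --discover > catalog.json, and the edited file then has to be passed back to the tap through its catalog option (the SDK-based v2 tap accepts --catalog; check tap-gitlab --help on the legacy branch).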

tldev commented 2 years ago

@laurentS, can you point to documentation that demonstrates deselecting a stream?

laurentS commented 2 years ago

@tldev You need to modify the catalog for this. The Singer spec docs at https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md#example-2 show an example; the key to look for is selected: true|false. If you're using the legacy-stable branch, you need to do this by hand; with the new main branch, you can use the SDK's features instead. The method at https://github.com/meltano/sdk/blob/861dfc327aadfc4f57557b81a95063d7e6628b1c/singer_sdk/helpers/_catalog.py#L93 should help with this. I don't think there are any good docs on this. Maybe an opportunity for a small PR :)
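The SDK helper linked above is not reproduced here. As a library-independent sketch of the same idea, the functions below operate on the plain catalog JSON: select_only selects the streams you name and deselects everything else by setting selected on each stream's breadcrumb [] metadata entry. select_only, selected_streams and _stream_metadata are hypothetical helpers, not SDK functions, and this assumes the tap honours stream-level metadata selection.

```python
from typing import Iterable


def _stream_metadata(entry: dict) -> dict:
    """Return (creating it if needed) the stream-level metadata dict (breadcrumb [])."""
    metadata = entry.setdefault("metadata", [])
    for m in metadata:
        if m.get("breadcrumb") == []:
            return m.setdefault("metadata", {})
    new_entry = {"breadcrumb": [], "metadata": {}}
    metadata.append(new_entry)
    return new_entry["metadata"]


def select_only(catalog: dict, wanted: Iterable[str]) -> dict:
    """Select the streams named in `wanted` and deselect every other stream."""
    wanted = set(wanted)
    for entry in catalog.get("streams", []):
        name = entry.get("tap_stream_id") or entry.get("stream")
        _stream_metadata(entry)["selected"] = name in wanted
    return catalog


def selected_streams(catalog: dict) -> list:
    """List the names of streams whose stream-level metadata says selected: true."""
    names = []
    for entry in catalog.get("streams", []):
        for m in entry.get("metadata", []):
            if m.get("breadcrumb") == [] and m.get("metadata", {}).get("selected"):
                names.append(entry.get("tap_stream_id") or entry.get("stream"))
    return names
```

For the case in this thread, select_only(catalog, {"projects"}) would leave site_users deselected, and selected_streams(catalog) then lists what the tap should sync. Whether "projects" is the exact stream id in tap-gitlab's catalog is an assumption; check the discovered catalog first.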