MeltanoLabs / tap-gitlab

Singer.io Tap for extracting data from Gitlab's API
GNU Affero General Public License v3.0

Extract Pipelines and Jobs for a project #20

Closed pnadolny13 closed 2 years ago

pnadolny13 commented 2 years ago

In GitLab by @iroussos on Nov 12, 2019, 07:42

We are currently not extracting a project's Pipelines and Jobs.

  1. Pipelines can be fetched through the Pipelines API endpoint
  2. Jobs can be fetched through the Jobs API endpoint

Jobs reference the pipeline they belong to, so we get the relation for free.
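A minimal sketch of how that relation could be surfaced in the extracted records, assuming each job returned by the Jobs API carries a nested `pipeline` object with an `id`. The `flatten_job` helper and the `pipeline_id` column name are hypothetical, not part of the tap:

```python
def flatten_job(job: dict) -> dict:
    """Copy a job record and expose its parent pipeline id as a top-level key.

    Assumes the job payload nests the pipeline as {"pipeline": {"id": ...}}.
    """
    record = dict(job)
    pipeline = record.get("pipeline") or {}
    # Hypothetical column name; None when the payload has no pipeline object.
    record["pipeline_id"] = pipeline.get("id")
    return record
```

With a top-level `pipeline_id`, the jobs table can be joined to the pipelines table downstream without unpacking nested JSON.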

What we should investigate is how to incrementally extract the data vs fetching all pipelines and jobs since the start of a project (which could get really costly).

Pipelines provide an updated_at attribute that could be used as a cutoff point, but there is no option in the API call similar to other API end points (e.g. updated_after) or a sorting option.

We should check for those options and whether the pipelines are returned in updated_at descending order. That would allow us to first fetch the pipelines that have been updated after the provided start_date (or after the timestamp stored in the STATE file for that project's pipelines) and then fetch the jobs only for those pipelines by using the List pipeline jobs call.
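The client-side cutoff described above could look roughly like this. It is only a sketch: `pipelines_updated_after` and `parse_ts` are hypothetical helpers, and the early exit is valid only if the API really does return pipelines in updated_at descending order, which is exactly what still needs to be checked:

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pipelines_updated_after(pipelines, bookmark, sorted_desc=False):
    """Return the pipelines whose updated_at is later than the bookmark.

    bookmark is start_date or the timestamp from the STATE file.
    sorted_desc=True assumes updated_at descending order (unverified) and
    stops at the first pipeline that is not newer than the bookmark.
    """
    cutoff = parse_ts(bookmark)
    selected = []
    for pipeline in pipelines:
        if parse_ts(pipeline["updated_at"]) > cutoff:
            selected.append(pipeline)
        elif sorted_desc:
            break  # everything after this point is older, given the assumption
    return selected
```

Jobs would then be requested only for the pipelines this returns. Without the ordering guarantee, every page of pipelines still has to be read, so the saving is limited to the job requests.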

Otherwise, if there is no way to incrementally fetch Pipelines and Jobs, we should make this Entity optional, as we already do for the Commits of a Merge Request: add a configuration parameter, similar to fetch_merge_request_commits, that lets users explicitly enable it for specific extractions.
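As a sketch of what such an opt-in could look like: `fetch_merge_request_commits` is the existing parameter mentioned above, while `fetch_pipelines`, the stream names, and the `streams_to_sync` helper are hypothetical and only illustrate the idea:

```python
def streams_to_sync(config: dict) -> list:
    """Build the list of streams to extract from the tap configuration."""
    # Hypothetical base streams; the real tap extracts more entities.
    streams = ["projects", "merge_requests"]
    if config.get("fetch_merge_request_commits", False):
        streams.append("merge_request_commits")
    # Hypothetical flag: off by default, so the costly full fetch of
    # pipelines and jobs only runs when a user asks for it.
    if config.get("fetch_pipelines", False):
        streams.extend(["pipelines", "jobs"])
    return streams
```

Defaulting the flag to false keeps existing extractions unchanged for users who do not need the new tables.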

Useful Resources when working with Gitlab's API:

  1. Gitlab API Documentation
  2. Gitlab API entities
  3. Entity Schemas

pnadolny13 commented 2 years ago

In GitLab by @iroussos on Nov 12, 2019, 08:21

As a follow-up once this issue is completed, we can add the Transformations needed for those two new tables.

This can be done in the dbt package with the Transformations for gitlab data.

pnadolny13 commented 2 years ago

In GitLab by @tomekzbrozek on May 6, 2020, 09:38

"there is no option in the API call similar to other API end points (e.g. updated_after) or a sorting option"

@iroussos it seems that the endpoint reference has changed and updated_after is now available for pipelines! https://docs.gitlab.com/ee/api/pipelines.html#list-project-pipelines
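For illustration, the two requests could be built roughly like this, assuming the gitlab.com v4 API root and the standard `per_page` pagination parameter. `pipelines_url` and `pipeline_jobs_url` are hypothetical helpers, not functions from the tap:

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://gitlab.com/api/v4"  # assumption: gitlab.com, API v4


def pipelines_url(project_id, updated_after, per_page=100):
    """URL for List project pipelines, filtered by updated_after."""
    # Project paths such as "group/project" must be URL-encoded.
    project = quote(str(project_id), safe="")
    query = urlencode({"updated_after": updated_after, "per_page": per_page})
    return f"{API_ROOT}/projects/{project}/pipelines?{query}"


def pipeline_jobs_url(project_id, pipeline_id, per_page=100):
    """URL for List pipeline jobs, for one pipeline of a project."""
    project = quote(str(project_id), safe="")
    query = urlencode({"per_page": per_page})
    return f"{API_ROOT}/projects/{project}/pipelines/{pipeline_id}/jobs?{query}"
```

The value passed as `updated_after` would be start_date or the bookmark from the STATE file, and the jobs URL would be requested only for the pipelines that come back.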

pnadolny13 commented 2 years ago

In GitLab by @iroussos on May 6, 2020, 16:51

That's great news and implementing this feature will be way simpler :-)

pnadolny13 commented 2 years ago

In GitLab by @tomekzbrozek on May 7, 2020, 06:47

just opened an MR for that: https://gitlab.com/meltano/tap-gitlab/-/merge_requests/23

pnadolny13 commented 2 years ago

In GitLab by @DouweM on May 15, 2020, 12:11

closed