Closed wersly closed 2 years ago
Hi @wersly and thanks for submitting this PR!
I feel like users wouldn't want these streams enabled by default as they might inadvertently land secrets in their data warehouse.
So what do you think about making them opt-in in the tap configuration with:
CONFIG = {
    'api_url': "https://gitlab.com/api/v4",
    'private_token': None,
    'start_date': None,
    'groups': '',
    'ultimate_license': False,
    'fetch_merge_request_commits': False,
    'fetch_pipelines_extended': False,
    'fetch_group_variables': False,
    'fetch_project_variables': False,
}
...
STREAM_CONFIG_SWITCHES = (
    'merge_request_commits',
    'pipelines_extended',
    'group_variables',
    'project_variables',
)
...
CONFIG['ultimate_license'] = truthy(CONFIG['ultimate_license'])
CONFIG['fetch_merge_request_commits'] = truthy(CONFIG['fetch_merge_request_commits'])
CONFIG['fetch_pipelines_extended'] = truthy(CONFIG['fetch_pipelines_extended'])
CONFIG['fetch_group_variables'] = truthy(CONFIG['fetch_group_variables'])
CONFIG['fetch_project_variables'] = truthy(CONFIG['fetch_project_variables'])
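For illustration, here is a minimal sketch of what the coercion step above might do when config values arrive as strings, booleans, or None. The `TRUTHY` tuple and the body of `truthy` are assumptions made for this example, not necessarily the tap's actual helper.

```python
# Hypothetical stand-in for the tap's `truthy` helper; the accepted
# spellings below are an assumption, not the confirmed implementation.
TRUTHY = ("true", "1", "yes", "on")


def truthy(val) -> bool:
    """Return True when `val` looks like an affirmative flag."""
    return str(val).strip().lower() in TRUTHY


# Config values may arrive as strings (e.g. from environment variables),
# booleans, or None when the setting is omitted.
CONFIG = {
    "fetch_group_variables": "True",
    "fetch_project_variables": None,
}

for key in ("fetch_group_variables", "fetch_project_variables"):
    CONFIG[key] = truthy(CONFIG[key])

print(CONFIG)
```

With this sketch, an omitted setting (None) coerces to False, which is what keeps the variable streams opt-in.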
Hi @edgarrmondragon - wonderful idea, thanks for catching that!
Your suggested config/code looks good to me. I'll get around to implementing and testing this all for you soon.
Alright @edgarrmondragon , got around to implementing and testing your suggestions. It all looks good to me!
I did the following tests:
Ran meltano elt tap-gitlab target-jsonl with the following meltano.yml config snippet:
extractors:
- name: tap-gitlab
  pip_url: git+https://github.com/wersly/tap-gitlab.git@load-variables
  config:
    api_url: ***
    private_token: ***
    groups: some/group
    projects: some/group/project
    start_date: '1970-01-01T00:00:00Z'
    ultimate_license: true
    fetch_merge_request_commits: false
    fetch_pipelines_extended: false
Result: default false values for fetch_group_variables and fetch_project_variables were assumed; no group or project variables were extracted / these streams were skipped.
Ran meltano elt tap-gitlab target-jsonl with the following meltano.yml config snippet:
extractors:
- name: tap-gitlab
  pip_url: git+https://github.com/wersly/tap-gitlab.git@load-variables
  config:
    api_url: ***
    private_token: ***
    groups: some/group
    projects: some/group/project
    start_date: '1970-01-01T00:00:00Z'
    ultimate_license: true
    fetch_merge_request_commits: false
    fetch_pipelines_extended: false
    fetch_group_variables: false
    fetch_project_variables: false
Result: specified configuration is applied; no group or project variables were extracted / these streams were skipped.
Ran meltano elt tap-gitlab target-jsonl with the following meltano.yml config snippet:
extractors:
- name: tap-gitlab
  pip_url: git+https://github.com/wersly/tap-gitlab.git@load-variables
  config:
    api_url: ***
    private_token: ***
    groups: some/group
    projects: some/group/project
    start_date: '1970-01-01T00:00:00Z'
    ultimate_license: true
    fetch_merge_request_commits: false
    fetch_pipelines_extended: false
    fetch_group_variables: true
    fetch_project_variables: true
Result: specified configuration is applied; group and project variables were successfully extracted.
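The three runs above can be summarized in a small sketch. It assumes the tap merges the user's config over the defaults and skips any stream whose `fetch_<stream>` flag is false; `enabled_switch_streams` is a hypothetical helper name used only for this illustration, not a function from the tap.

```python
# Defaults mirror the opt-in switches suggested earlier in this thread.
DEFAULTS = {
    "fetch_merge_request_commits": False,
    "fetch_pipelines_extended": False,
    "fetch_group_variables": False,
    "fetch_project_variables": False,
}

STREAM_CONFIG_SWITCHES = (
    "merge_request_commits",
    "pipelines_extended",
    "group_variables",
    "project_variables",
)


def enabled_switch_streams(user_config: dict) -> list:
    """Hypothetical helper: list the opt-in streams that `user_config` enables."""
    merged = {**DEFAULTS, **user_config}  # user values override defaults
    return [s for s in STREAM_CONFIG_SWITCHES if merged[f"fetch_{s}"]]


# Run 1: variable flags omitted, so the defaults (False) apply.
run_1 = enabled_switch_streams({})
# Run 2: variable flags explicitly set to false.
run_2 = enabled_switch_streams(
    {"fetch_group_variables": False, "fetch_project_variables": False}
)
# Run 3: variable flags explicitly set to true.
run_3 = enabled_switch_streams(
    {"fetch_group_variables": True, "fetch_project_variables": True}
)

print(run_1, run_2, run_3)
```

In this sketch, runs 1 and 2 enable no opt-in streams, and run 3 enables only the two variable streams, matching the results reported above.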
I also updated the README describing these new pieces of configuration and their motivations. Let me know if the language I've used there is sufficient, or if you would prefer something else be written.
Thanks @edgarrmondragon !
How was this code tested?
This code was tested locally with a meltano.yml file to the effect of: (note, some fields are redacted or replaced with meaningless values for privacy)
And the following meltano operations were performed:
In all cases, the project_variables and group_variables were loaded to their targets with the provided schemas. In the case of interactions with databases (sqlite, bigquery) tables were appropriately truncated per the default replication methods for project_variables and group_variables when meltano operations were performed multiple times. Likewise, dbt was able to run on the data without regression.
Please let me know if there are any additional tests or modifications you'd like me to run on this.
Risks, Tradeoffs, Backwards Compatibility Issues
None that I can really see. The sync_variables function is essentially a copy-paste from the sync_labels function (very similar pattern in the GitLab API between Group/Project labels and Group/Project variables), so any risks assumed there are also assumed here.
While it is not a tradeoff, I would like to point out the key_properties I've selected for the group and project variables - the GitLab API does not assign any sort of id field to these data. So instead, the project/group id (assigned by the sync_variables function) and the key (from GitLab) are taken together to form a compound key. Variable keys must be unique within GitLab CI/CD Variables for a single Project or Group, but of course they can be duplicated across Projects/Groups. So the combination of Group/Project ID and variable key seems like the correct natural key for this data to me.
See: