apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.5k stars 493 forks

[Question][CI/CD] How can we stop auto collection of CI/CD metrics? #7375

Open thiDucTran opened 2 months ago

thiDucTran commented 2 months ago

Question

Hi, what is the correct config to use if we do not want to automatically collect deployment, build, and similar data? Referencing https://devlake.apache.org/docs/DataModels/DevLakeDomainLayerSchema#data-models, I have tried not selecting CI/CD in my data scope config, but deployment data is still being collected. I am using v1beta5 with the Azure DevOps Go connection.

FYI: I also asked this in Slack. See https://devlake-io.slack.com/archives/C03APJ20VM4/p1714025708040129

Screenshots

image image

klesh commented 2 months ago

@thiDucTran Hi, by design, deselecting an entity does not delete its previously collected data; it simply skips the related subtasks. Please check whether the collectApiBuilds subtask appears in the pipeline plan JSON for the plugin.

image
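The check suggested above can also be done programmatically. The following is a minimal sketch, not part of DevLake: `iter_tasks` and `find_subtasks` are hypothetical helper names, and it assumes the plan is JSON made of task objects with `plugin` and `subtasks` keys, possibly nested inside lists. The keyword `"Builds"` is only a guess based on the subtask names mentioned in this thread; add other keywords as needed.

```python
import json


def iter_tasks(node):
    """Yield every task object in a plan, however deeply the lists are nested."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_tasks(child)


def find_subtasks(plan, keywords, plugin="azuredevops_go"):
    """Return the plugin's subtask names that contain any keyword (case-insensitive)."""
    hits = []
    for task in iter_tasks(plan):
        if task.get("plugin") != plugin:
            continue
        for name in task.get("subtasks", []):
            if any(k.lower() in name.lower() for k in keywords):
                hits.append(name)
    return hits


# Abbreviated example plan; the subtask names are copied from this thread.
plan_json = """
[
  {
    "plugin": "azuredevops_go",
    "subtasks": [
      "collectAccounts",
      "collectApiPullRequests",
      "convertApiBuilds",
      "convertApiPullRequests"
    ]
  }
]
"""

plan = json.loads(plan_json)
print(find_subtasks(plan, ["Builds"]))  # prints ['convertApiBuilds']
```

An empty result would mean that no build-related subtask is scheduled for the plugin in that plan.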

thiDucTran commented 2 months ago

I don't see collectApiBuilds:

    [
      {
        "plugin": "azuredevops_go",
        "subtasks": [
          "collectAccounts",
          "collectApiPullRequests",
          "convertRepo",
          "extractAccounts",
          "convertAccounts",
          "extractApiPullRequests",
          "collectApiPullRequestCommits",
          "convertApiBuilds",
          "convertApiPullRequests",
          "convertPrLabels",
          "extractApiPullRequestCommits",
          "convertApiPullRequestsCommits",
          "convertApiTimelineRecords"
        ],
thiDucTran commented 2 months ago

I don't know whether this is an issue, or whether it is a separate issue that needs its own GitHub issue, but I am sharing it again from the Slack thread.

Whenever I do a new pipeline run, I see that updated_at changes to the same time for all of the pipeline runs (see the before and after pictures).

Also, what is this data used for? I do not think it is used to calculate DORA, because when I go to the DORA - Deployment frequency dashboard, it is empty (as expected). So there seems to be a difference between the pipeline runs that you see in the Azure DevOps dashboard and the deployments that you would see in the DORA dashboards.

image image

klesh commented 2 months ago

Weird, why is there a convertApiBuilds in the subtasks list? It does look like a bug. Would you like to file it as a separate issue so that we can look into it?

abeizn commented 2 months ago

@thiDucTran This is already in effect in v1.0.0-beta6.

thiDucTran commented 2 months ago

The issue is not resolved for me in v1.0.0-beta6. CI/CD metrics still get collected. I even deleted the project, purged the scope's data, re-created the project, made sure CI/CD is not in my scope config, and collected data, and I still see CI/CD metrics in the Azure DevOps dashboard.

        "plugin": "azuredevops_go",
        "subtasks": [
          "collectAccounts",
          "collectApiPullRequests",
          "convertRepo",
          "extractAccounts",
          "convertAccounts",
          "extractApiPullRequests",
          "collectApiPullRequestCommits",
          "convertApiPullRequests",
          "convertPrLabels",
          "extractApiPullRequestCommits",
          "convertApiPullRequestsCommits",
          "convertApiTimelineRecords"
        ],

I think part of the issue is that the MySQL data is not really removed. I still see records like this after deleting the project, clearing the data scope's historical data, and even removing the data scope:

    SELECT
      *
    FROM
      cicd_pipelines

image

abeizn commented 2 months ago

@thiDucTran What is the value of your environment variable ENABLE_SUBTASKS_BY_DEFAULT?

klesh commented 2 months ago

@thiDucTran That is weird, I don't see any related subtasks in the list. Can you check the raw tables and see if the data gets purged?

thiDucTran commented 2 months ago

I edited my previous comment. It seems the data is not purged.

klesh commented 2 months ago

To be investigated.

thiDucTran commented 1 month ago

Hi, would I need to create another issue for the table cicd_deployment_commits? I have deleted the project and the data scope, but the deleted project's data is still there in cicd_deployment_commits. I am using 1.0.0-beta9.

Edit: I am also seeing unpurged data in the cicd_deployments table. There could be other tables with unpurged data as well.
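To check several tables at once for leftover rows, something like the sketch below could help. It is an illustration only: `count_rows` is a hypothetical helper, the table list covers just the tables named in this thread, and the demo runs against an in-memory SQLite database with a placeholder `id` column as a stand-in for the real MySQL database. Against a real deployment, the same function should work with any Python DB-API connection.

```python
import sqlite3

# Tables named in this thread; other tables may need checking too.
TABLES = ["cicd_pipelines", "cicd_deployments", "cicd_deployment_commits"]


def count_rows(conn, tables=TABLES):
    """Return {table_name: row_count} using plain COUNT(*) queries."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed list above, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


# Stand-in demo: empty tables with a placeholder schema, plus one leftover row.
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id TEXT)")
conn.execute("INSERT INTO cicd_deployments VALUES ('example-row')")

print(count_rows(conn))
# prints {'cicd_pipelines': 0, 'cicd_deployments': 1, 'cicd_deployment_commits': 0}
```

A non-zero count after the project and data scope have been deleted would point to data that was not purged from that table.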