apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Question][AzureDevOps] Pipeline run retention #7788

Closed benjaminrmoss closed 1 month ago

benjaminrmoss commented 4 months ago

Hi - The ADO project I'm working with has a pipeline retention policy of 3 previous runs. When running data collection in DevLake, any runs that have expired or been deleted from ADO are removed, which leaves the DORA dashboard with little or no data.

Is there any way for me to configure DevLake to retain any pipeline metrics even if the actual pipeline run in ADO has been removed? Thanks for your help.

dosubot[bot] commented 4 months ago

Hey @benjaminrmoss! I'm a bot here to help you out with bugs, questions, and more. Let's tackle this issue together!

Yes, DevLake allows for the retention of pipeline run data independently of the source system's retention policies through its synchronization policy settings. You can configure the sync frequency, time range, and the skip-on-fail option for your data, which helps in managing how and when data is collected and retained [1][2].

Additionally, DevLake supports different data collection modes, including "Collect Data (Default)," "Collect Data in Full Refresh Mode," and "Re-transform Data," which provide flexibility in how data is managed and retained [1][2].

For ensuring that historical pipeline run data is preserved, you can follow these best practices:

  1. Set Up a Blueprint with a Regular Sync Schedule: Configure your blueprint for Azure DevOps data collection with a regular sync schedule to automatically trigger historical runs at specified intervals.
  2. Manual Execution of Historical Runs: Manually trigger historical runs via the Configuration UI or API for ad-hoc data collection or backfilling data.
  3. Monitor and Manage Historical Runs: Regularly monitor historical runs to ensure successful execution and address any errors promptly.
  4. Use the API for Advanced Control: Utilize the DevLake API for more granular control over data collection and transformation processes [3].

By following these practices, you can ensure that historical pipeline run data is consistently preserved in DevLake, even if the actual pipeline runs in Azure DevOps have been removed.

d4x1 commented 4 months ago

@benjaminrmoss You can run your project in incremental mode. In theory, it collects only the data that has changed since the last successful sync.
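
To illustrate why the collection mode matters here, the following is a minimal conceptual sketch, not DevLake's actual implementation. It assumes a full refresh replaces local data with whatever the source currently returns, while an incremental sync only upserts records changed since the last successful sync. The function names and the sample runs are hypothetical.

```python
from datetime import datetime


def full_refresh(store, source_runs):
    """Replace the local store with whatever the source returns right now."""
    store.clear()
    for run in source_runs:
        store[run["id"]] = run


def incremental_sync(store, source_runs, since):
    """Upsert only runs updated after `since`; local rows are never deleted."""
    for run in source_runs:
        if since is None or run["updated"] > since:
            store[run["id"]] = run


# Hypothetical data: the source keeps only the 3 most recent runs.
first_fetch = [
    {"id": 1, "updated": datetime(2024, 7, 1)},
    {"id": 2, "updated": datetime(2024, 7, 2)},
    {"id": 3, "updated": datetime(2024, 7, 3)},
]
# Later, run 1 has expired at the source and run 4 has appeared.
second_fetch = [
    {"id": 2, "updated": datetime(2024, 7, 2)},
    {"id": 3, "updated": datetime(2024, 7, 3)},
    {"id": 4, "updated": datetime(2024, 7, 4)},
]

full_store = {}
full_refresh(full_store, first_fetch)
full_refresh(full_store, second_fetch)

inc_store = {}
incremental_sync(inc_store, first_fetch, since=None)
incremental_sync(inc_store, second_fetch, since=datetime(2024, 7, 3))

print(sorted(full_store))  # [2, 3, 4]
print(sorted(inc_store))   # [1, 2, 3, 4]
```

Under these assumptions, the expired run 1 survives only in the incrementally synced store, which is the behavior the original question is after.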

benjaminrmoss commented 4 months ago

@d4x1 Thanks for the reply. Would I set this in the project configuration settings? I can't see any option for it there (screenshot omitted). Maybe this feature isn't supported for ADO yet? https://devlake.apache.org/docs/Overview/SupportedDataSources/#azure-devops

d4x1 commented 4 months ago

@benjaminrmoss Yes, the azuredevops plugin is written in Python and doesn't support incremental mode. You can try azuredevops_go instead; it can collect data incrementally.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 1 month ago

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.