[Bug][DORA] Change failure rate regression

yoramshai commented 4 months ago

Search before asking

[X] I had searched in the issues and found no similar issues.

What happened

Data of Pagerduty is missing from the DORA board

What do you expect to happen

See the data and the correct score of "Change Failure Rate" and "Failed Deployment Recovery Time"

How to reproduce

On Dora dashboard, in some of the projects the data is missing even I'm able to see it in pagerduty dashboard.

Anything else

Looks like a regression, When I'm checking the same thing on the latest stable release "v0.21.0" the data is there for the same project while it is missing on "v1.0.0-beta11" release.

see attachments, you can to see the missing data on the latest version compare to latest stable version (same data in the data base, created from a snapshot).

Version

v1.0.0-beta11

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[x] I agree to follow this project's Code of Conduct

dosubot[bot] commented 4 months ago

The data from PagerDuty is missing from the DORA board in the v1.0.0-beta11 release because PagerDuty is not listed as a supported data source for incidents in the provided documentation. The supported sources for incidents are Jira boards, GitHub issues, TAPD workspaces, and Zentao issues. In contrast, the v0.21.0 release might have had support for PagerDuty, which is not mentioned in the provided documentation ^[1].

For setting up PagerDuty as a data source, the steps and configurations are the same for both v0.21.0 and v1.0.0-beta11. You can follow these steps:

Add Data Connections
- Visit the Config UI at http://localhost:4000.
- Give your connection a unique name to help you identify it in the future.
- Paste your PagerDuty personal access token (PAT) in the Token field. You may make it a Read-Only token for the plugin's purposes.
- Click Test Connection. If the connection is successful, click Save Connection to add the connection.
Add Data Connection to a Project
- Create a project for PagerDuty by adding the connection created above to it.
Setting Data Scope
- Select the services you want to collect data from. PagerDuty supports the following data entities:
  - Issue Tracking: These map to PagerDuty incidents.
Adding Transformation Rules (Optional)
- Currently, this plugin does not support transformation rules, so skip this page by clicking Next Step.
Set Sync Policy
- Set the sync policy as you see fit. Note that PagerDuty can only collect data from up to 6 months prior to the present time ^[2]^[3]^[4]^[5].

To debug and trace the data flow from PagerDuty to the DORA dashboard in v1.0.0-beta11, you can follow these detailed steps:

Check Deployment Frequency:
- Ensure that the data sources for deployments are correctly configured. This includes Jenkins, GitLab CI, GitHub Action, BitBucket Pipelines, or Webhook.
- Verify the transformation definitions for deployments in the data transformations configuration.

Inspect the Data in cicd_deployment_commits:

Use the following SQL query to inspect the data in the cicd_deployment_commits table:

SELECT
pm.project_name,
IF(pm.project_name in ($project), 'This project is selected', 'Not Selected') as select_status,
IF(cdc._raw_data_table != '', cdc._raw_data_table, cdc.cicd_scope_id) as _raw_data_table,
result,
environment,
count(distinct cdc.id) as deployment_commit_count, 
count(distinct cdc.cicd_deployment_id) as deployment_count
FROM cicd_deployment_commits cdc
LEFT join project_mapping pm on cdc.cicd_scope_id = pm.row_id and pm.`table` = 'cicd_scopes'
WHERE $__timeFilter(cdc.finished_date)
GROUP BY pm.project_name, select_status, _raw_data_table, result, environment

Analyze Deployment Frequency:

Use the following SQL query to calculate the deployment frequency:

-- Metric 1: Deployment Frequency
with last_few_calendar_months as(
SELECT CAST(($__timeTo()-INTERVAL (H+T+U) DAY) AS date) day
FROM ( SELECT 0 H
        UNION ALL SELECT 100 UNION ALL SELECT 200 UNION ALL SELECT 300
    ) H CROSS JOIN ( SELECT 0 T
        UNION ALL SELECT  10 UNION ALL SELECT  20 UNION ALL SELECT  30
        UNION ALL SELECT  40 UNION ALL SELECT  50 UNION ALL SELECT  60
        UNION ALL SELECT  70 UNION ALL SELECT  80 UNION ALL SELECT  90
    ) T CROSS JOIN ( SELECT 0 U
        UNION ALL SELECT   1 UNION ALL SELECT   2 UNION ALL SELECT   3
        UNION ALL SELECT   4 UNION ALL SELECT   5 UNION ALL SELECT   6
        UNION ALL SELECT   7 UNION ALL SELECT   8 UNION ALL SELECT   9
    ) U
WHERE
    ($__timeTo()-INTERVAL (H+T+U) DAY) > $__timeFrom()
),
_production_deployment_days as(
SELECT
    cdc.cicd_deployment_id as deployment_id,
    max(DATE(cdc.finished_date)) as day
FROM cicd_deployment_commits cdc
JOIN project_mapping pm on cdc.cicd_scope_id = pm.row_id and pm.`table` = 'cicd_scopes'
WHERE
    pm.project_name in (${project})
    and cdc.result = 'SUCCESS'
    and cdc.environment = 'PRODUCTION'
GROUP BY 1
),
_days_weekly_deploy as(
SELECT
    date(DATE_ADD(last_few_calendar_months.day, INTERVAL -WEEKDAY(last_few_calendar_months.day) DAY)) as week,
    MAX(if(_production_deployment_days.day is not null, 1, null)) as weeks_deployed,
    COUNT(distinct _production_deployment_days.day) as days_deployed
FROM 
    last_few_calendar_months
    LEFT JOIN _production_deployment_days ON _production_deployment_days.day = last_few_calendar_months.day
GROUP BY week
),
_days_monthly_deploy as(
SELECT
    date(DATE_ADD(last_few_calendar_months.day, INTERVAL -DAY(last_few_calendar_months.day)+1 DAY)) as month,
    MAX(if(_production_deployment_days.day is not null, 1, null)) as months_deployed,
    COUNT(distinct _production_deployment_days.day) as days_deployed
FROM 
    last_few_calendar_months
    LEFT JOIN _production_deployment_days ON _production_deployment_days.day = last_few_calendar_months.day
GROUP BY month
),
_days_six_months_deploy AS (
SELECT
month,
SUM(days_deployed) OVER (
  ORDER BY month
  ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) AS days_deployed_per_six_months,
COUNT(months_deployed) OVER (
  ORDER BY month
  ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
) AS months_deployed_count,
ROW_NUMBER() OVER (
  PARTITION BY DATE_FORMAT(month, '%Y-%m') DIV 6
  ORDER BY month DESC
) AS rn
FROM _days_monthly_deploy
),
_median_number_of_deployment_days_per_week_ranks as(
SELECT *, percent_rank() over(order by days_deployed) as ranks
FROM _days_weekly_deploy
),
_median_number_of_deployment_days_per_week as(
SELECT max(days_deployed) as median_number_of_deployment_days_per_week
FROM _median_number_of_deployment_days_per_week_ranks
WHERE ranks <= 0.5
),
_median_number_of_deployment_days_per_month_ranks as(
SELECT *, percent_rank() over(order by days_deployed) as ranks
FROM _days_monthly_deploy
),
_median_number_of_deployment_days_per_month as(
SELECT max(days_deployed) as median_number_of_deployment_days_per_month
FROM _median_number_of_deployment_days_per_month_ranks
WHERE ranks <= 0.5
),
_days_per_six_months_deploy_by_filter AS (
SELECT
month,
days_deployed_per_six_months,
months_deployed_count
FROM _days_six_months_deploy
WHERE rn%6 = 1
)

Sequence of Deployments and Incidents:

Use the following SQL query to get the sequence of deployments and incidents:

SELECT
pm.project_name,
cdc.finished_date,
cdc.result,
cdc.environment,
pd.incident_id,
pd.incident_status,
pd.incident_created_at,
pd.incident_resolved_at
FROM cicd_deployment_commits cdc
LEFT JOIN project_mapping pm ON cdc.cicd_scope_id = pm.row_id AND pm.`table` = 'cicd_scopes'
LEFT JOIN pagerduty_incidents pd ON cdc.cicd_scope_id = pd.cicd_scope_id
WHERE $__timeFilter(cdc.finished_date)
ORDER BY cdc.finished_date, pd.incident_created_at

Use the DORA Validation Dashboard:
- Starting from DevLake v0.18, the DORA Validation Dashboard can be found near the DORA dashboard. It can also be accessed via a direct link in the Dashboard Instruction panel in the DORA dashboard.
- This dashboard provides a step-by-step guide to check which step went wrong for all 4 of the DORA metrics: Deployment Frequency, Median Lead Time for Changes, Change Failure Rate, and Median Time to Restore Service.
Check SQL Scripts Behind Grafana Charts:
- Each chart in the Grafana dashboard has a hidden button in the top-right corner to access the context menu. Click Edit to open a more detailed view with the script that tells how exactly the data is queried.
Verify Project Mapping:
- The project_mapping table is responsible for mapping deployments, pull requests from the source code, or issues. Ensure it is filtered using either table = 'cicd_scopes', table = 'repos', or table = 'boards' when connecting to another table.
Webhooks Configuration:
- Ensure each project has its own webhook. A webhook used by multiple projects means that all the issues or deployments published by that webhook will be replicated among those projects.
Configure and Collect Data:
- Create data connections to retrieve the data from various tools such as Jira, GitHub, Jenkins, etc.
- Configure the DORA-related scope config to define deployments and incidents.
- Create a DevLake project, and associate the data connections with the project. Collect data to see DORA metrics.
View and Customize DevLake's DORA Dashboard:
- With all the data collected, DevLake's DORA dashboard is ready to deliver your DORA metrics and benchmarks.
- Click the Dashboards on the top right corner. You can find the DORA dashboard within the Grafana instance shipped with DevLake, ready for you to put into action.
- You can customize the DORA dashboard by editing the underlying SQL query of each panel ^[6]^[7]^[8]^[9]^[10].

To continue talking to Dosu, mention @dosu.

klesh commented 4 months ago

It appears to me that it is the deployments created by the webhook were missing and the incidents from pagerduty were there just fine.

yoramshai commented 4 months ago

Thanks @klesh; I will check this. Is it supposed to affect the historical data (i.e. May) ?

klesh commented 4 months ago

it should, data is recalculated on each run

apache / incubator-devlake