apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.61k stars 530 forks source link

[Bug][Grafana] pr created_date is compared with incorrect date, sometimes with the future #8144

Closed vshulga-sift closed 1 month ago

vshulga-sift commented 1 month ago

Search before asking

What happened

We found the issue on Grafana DORA Details - Lead Time for Changes Dashboard. We see the list of PRs on PRs Detail list but do not see any Average metrics calculated. This happened only when we chose date range less or equal for Last 7 days.

After some investigation we found that Average PR Cycle Time has 2 conditions for created_date:

  1. $__timeFilter(pr.created_date) AND
  2. created_date >= DATE_ADD(DATE_ADD($timeFrom(), INTERVAL -DAY($timeFrom())+1 DAY), INTERVAL +1 MONTH)

The first condition works as expected, however, it seems that second condition is wrong. It gets the first day of current month and add to it interval in 1 month. As result we do comparison of pr created_date with the beginning of the next month if we choose date range less than 1 month. For 1month+ intervals we have some data as for case of the last month we compare created_date with the beginning of current month. But this calculation is also inaccurate.

Example of results for different date time range selection and query SELECT DATE_ADD(DATE_ADD($__timeFrom(), INTERVAL -DAY($__timeFrom())+1 DAY), INTERVAL +1 MONTH);

Query/Date range Result
SELECT NOW(); 2024-10-15 13:25:47
Last 5 minutes 2024-11-01 12:25:14
Last 15 minutes 2024-11-01 12:16:14
Last 30 minutes 2024-11-01 12:01:36
Last 1 hour 2024-11-01 11:32:15
Last 3 hour 2024-11-01 09:32:37
Last 6 hour 2024-11-01 06:33:02
Last 12 hour 2024-11-02 00:33:26
Last 24 hour 2024-11-01 12:33:46
Last 2 days 2024-11-01 12:34:09
Last 7 days 2024-11-01 12:34:33
Last 30 days 2024-10-01 13:35:08
Last 90 days 2024-08-01 13:35:32
Last 6 month 2024-05-01 13:35:55
Last 1 year 2023-11-01 12:36:20
Last 2 year 2022-11-01 12:36:42
Last 5 year 2019-11-01 12:37:12
Yesterday 2024-11-01 23:00:00
Day before yesterday 2024-11-01 23:00:00
This day last week 2024-11-01 23:00:00
Previous week 2024-11-01 23:00:00
Previous month 2024-09-02 00:00:00
Previous fiscal quarter 2024-07-02 00:00:00
Previous year 2023-01-02 00:00:00
Previous fiscal year 2023-01-02 00:00:00
Today 2024-11-01 23:00:00
Today so far 2024-11-01 23:00:00
This week 2024-11-01 23:00:00
This week so far 2024-11-01 23:00:00
This month 2024-10-02 00:00:00
This month so far 2024-10-02 00:00:00
This year 2024-01-02 00:00:00
This year so far 2024-01-02 00:00:00
This fiscal quarter 2024-10-02 00:00:00
This fiscal quarter so far 2024-10-02 00:00:00
This fiscal year 2024-01-02 00:00:00
This fiscal year so far 2024-01-02 00:00:00

Is second condition needed at all?

What do you expect to happen

created_date is filtered by correct data range selection in Grafana

How to reproduce

Anything else

No response

Version

release-v1.0

Are you willing to submit PR?

Code of Conduct

dosubot[bot] commented 1 month ago

Hey @vshulga-sift, I'm here to help you with any bugs, questions, or contributions! Let's work together to solve this issue.

The issue you're encountering is related to the second condition in the SQL query for filtering created_date in the Grafana dashboard. This condition is designed to ensure that only pull requests created after the start of the month following the __timeFrom() date are considered. However, as you've observed, this can lead to incorrect filtering, especially for date ranges less than one month, as it effectively compares created_date with a future date.

The second condition is:

created_date >= DATE_ADD(DATE_ADD($__timeFrom(), INTERVAL -DAY($__timeFrom())+1 DAY), INTERVAL +1 MONTH)

This logic is intended to filter out pull requests that were created before the start of the next month from the __timeFrom() date. However, it seems to be causing issues with shorter date ranges, as it excludes PRs that should be included based on the selected time range.

To address this, you might consider revising or removing this condition to ensure that the created_date is filtered correctly according to the selected date range in Grafana. This would help in accurately calculating the average metrics for the specified period.

To continue talking to Dosu, mention @dosu.