Closed wouldd closed 7 months ago
Hi @wouldd , I'm trying to understand what you have encountered. Correct me if I'm wrong:
commits_diff
table were incorrect. All commits rather than new commits were associated with the latest deployments.@Startrekzky pretty much. 1) yes usig azure devops go plugin which is what is pulling commits/pr information. and using teh 'webhook' to post deployment information that includes the shas of the commits being deployed. 2) yes, change lead time had been calculated for all commits in a repo and the time calculated for all of them was essentiaially from their moment of commit until the date of the recent deployment (which in reality only deployed 3 commits) 3) the commits_diffs table seems to just be a relationship table between commits, I'm not super clear on what it represents, but yes the 'new' commit that was included in the deployment had an entyr in here linking it back to the first commit ever in the repository 4) as mentioned if I modify the query in the lead time calculator to filter on 'new commit sha' then I appear to get sensible values restricted to those commits that were actually part of my deploy.. The odd thing is that if hte query is just wrong and it really should be filtering on new_commit_sha then I would expect this bug to impact all users of any datasource and dora is just broken. But since I would assume it would be pretty obvious if it wasn't working for everyone, I must conclude that the shape of this table is not being populated by the azure plugin as expected by the system at large.
Hi @wouldd This seems to be a problem related to the refdiff
. I'm not sure if the refdiff calling logics are the same among all plugins. I'll forward it to @klesh to take a look.
I suspect that the merge_commit_sha
might be wrongly picked.
https://learn.microsoft.com/en-us/rest/api/azure/devops/git/pull-requests/get-pull-requests?view=azure-devops-rest-7.1&tabs=HTTP
Currently, both azuredevops plugins are using lastMergeCommit
as the merge commit sha
, but maybe it should be lastMergeTargetCommit
?
For Change Lead Time to be calculated correctly, the following requirements must be met:
pr.merge_commit_sha
must be found in the Linear Historyloop in @mr-ks
@klesh that certainly sounds plausible. I'm trying to wrap my head around your point 1 about all deployment commits forming a linear history 'newer deployment commit should contain all older deployment commits' I'm not sure I understand why that would be desired? I would assume that for change cycle time you need 1) a deployment id with it's date of deployment, result, environment. 2) a set of commit shas that are linked to that deployment id as the vehichle that deployed them, 3) a lookup of all commits with their commit date and a link to the PR that merged them.
So you would process a deployment row, link it's deployment commits and join to their commit records and pr record. then calculate the various time deltas for commit->pr create->pr-merge->deployment to production and record that cycle tyime against those commits. So any commit that has no associated deployment would have no cycle time values (the vast majority of our commit history was deployed long ago and we won't be back porting that information just starting from now) dora calcs /boards would work on only those commits that have non-null values for cycle time I'm not sure I see in this why a deployment commit would be linked to older deployments at all?
@wouldd It is needed to determine which deployment a PR belongs to.
If I understand your assumption correctly, you are connecting the PR to deployment by commit dates? no, it won't work, because the deployment happened after a commit got committed not necessarily including the commit. For example, one may start committing to a feature branch for the next version while a deployment on the release branch for the current version is taking place.
@klesh I don't think we need to relate things by date at all. I think you have a deployment record provided by the webhook where that record says 'these are teh commits that were deployed in this deployment' I think you have a PR record provided by the azure devops api that says 'these are the commits that were part of this PR' so from ad eployment you can select the commits it deployed and join them to the pr commits to find the pr(s) that they were merged in.
So between these you can link all the data required to caluclate cycle time. you know exactly which commits where in a deployment and can look up which PR those commits were associated with to reference those timings and look up what date that commit was originally made to calculate how long it took from that point to the point it was deployed.
Why would a deployment commit be linked to all previous deployment commits?
@klesh I don't think we need to relate things by date at all. I think you have a deployment record provided by the webhook where that record says 'these are teh commits that were deployed in this deployment' I think you have a PR
No, this is not the case. The webhook, or any other data sources would have only 1 commit sha for each repo per deployment stored in the cicd_deployment_commits
table, so we have to calculate the commits between deployments, which is covered by https://github.com/apache/incubator-devlake/blob/49f276f558a2ea7124dfdc680bccbef27bf0b581/backend/plugins/refdiff/tasks/deployment_commit_diff_calculator.go
Let's assume that you have a release
branch, you merge PR to it then deploy, at any point of time, you pick the latest commit of the branch, and search all its ancestors, you can find every single deployment commits up to the point, then you are guaranteed to have a correct PRs and deployments relationship. This is important, because one may merge a couple of PRs before a deployment.
Sorry, I shouldn't use the term Linear Commit History
, it is not accurate, we just need all the deployment commits gets included in the history.
record provided by the azure devops api that says 'these are the commits that were part of this PR' so from ad eployment you can select the commits it deployed and join them to the pr commits to find the pr(s) that they were merged in.
That is the idea but not the whole story, in fact, commits
of a PR morally don't end up in the deployment commit history., especially, when most projects tend to use Squash and Merge
which no PR.commits
would ever end up in the release
branch. To address this, we use merge_commit_sha
( or squash_commit_sha
or head_commit_sha
) which would definitely appear in the commits history to represent if a PR is included in a deployment.
So between these you can link all the data required to caluclate cycle time. you know exactly which commits where in a deployment and can look up which PR those commits were associated with to reference those timings and look up what date that commit was originally made to calculate how long it took from that point to the point it was deployed.
Why would a deployment commit be linked to all previous deployment commits?
Let me know if I missed anything.
@klesh Sorry for the slow response, I was on vacation for the last two weeks then getting ontop of other work stuff. I saw that you had put in a fix related to this issue and I tried applying that fix. however it does not help. the dora reports still calculates every commit ever agfainst my respository against the only deployment it has been told about. I suspect there is a fundamental presumption in this that this system will always have full information about every deployment from the start of the repository? however that is not going to be the case for us (or I assume most users). so we might have a 3 year old rep owith thousands of commits against it that were deployed historicaly hoever we have no way and no intention to try to tell devlake about those historical deployments. We just want information as it pertains to new deployments and the commits there. I saw you added a build url to the cicdbuild converter in your fix, which I assume was hoped would limit the scope of the queries to just things that had a populated build url? The other issue here is that our deployments are not happening from build pipelines. they're handled by a completely separate tool.(hence the use of the webhook)
unfortunately as it stands this makes the dora metrics unusable for us, and those would be the main thing we're interested in.
I might be encountering the same bug. If not, please feel free to delete this comment as I don't want to stray the discussion.
I created a project with a single Github repo, using the GitHub connection for commits and PRs. A webhook is used for deployments.
The DORA Details - Lead Time for Changes
board shows the correct PR averages but the 2. PR details
table shows a ridiculously high change_lead_time
(see screenshots).
It appears to be associating all the PRs in the project to the same and currently only webhook commit (notice how the deployment_id
is the same for all PRs).
List of PRs in board:
Correct averages:
This is also visible in the project_pr_metric
table, where all PRs match the same deployment_commit_id (in this case the webhook id).
I'm on 1.0.0b4.
Here's the sanitized CURL for the webhook:
"deploymentCommits":[
{
"commit_sha":"<sha>",
"repo_url":"https://github.com/<org>/<repo>"
}
],
"create_time":"2024-04-11T11:39:33+00:00",
"start_time":"2024-04-11T11:39:33+00:00",
"end_time":"2024-04-16T11:51:35+00:00",
"result": "SUCCESS"
}'
Thanks!
@Startrekzky I'm curious about the characterisation of the bug priority you set being 'does not affect the functionality or isn't evident' from my point of view this renders one of the core dora metrics unusable. I would consider that to be a very evident impact on function and a key metric we would be interested in.
Thank you for pointing this out, @wouldd . I mis-operated it. It's updated now.
I might be encountering the same bug. If not, please feel free to delete this comment as I don't want to stray the discussion.
I created a project with a single Github repo, using the GitHub connection for commits and PRs. A webhook is used for deployments. The
DORA Details - Lead Time for Changes
board shows the correct PR averages but the2. PR details
table shows a ridiculously highchange_lead_time
(see screenshots). It appears to be associating all the PRs in the project to the same and currently only webhook commit (notice how thedeployment_id
is the same for all PRs).List of PRs in board:
Correct averages:
This is also visible in the
project_pr_metric
table, where all PRs match the same deployment_commit_id (in this case the webhook id).I'm on 1.0.0b4.
Here's the sanitized CURL for the webhook:
"deploymentCommits":[ { "commit_sha":"<sha>", "repo_url":"https://github.com/<org>/<repo>" } ], "create_time":"2024-04-11T11:39:33+00:00", "start_time":"2024-04-11T11:39:33+00:00", "end_time":"2024-04-16T11:51:35+00:00", "result": "SUCCESS" }'
Thanks!
Was the deployment the earliest one? If so, it is expected.
I think the core of the problem is that the time ranges of deployments and PRs are not aligned. Maybe we can solve the problem by either of the following methods:
@wouldd @nicolavolpini
@klesh the first option here is a complete non-starter. we have >1000 repos which in some cases have existed for over 10 years. we have no way to recreate the deployment history and no reason to do so. we're not intereested in looking back that far at metrics. I think this would be common to most real-world users, they won't be starting out with devlake in place, they will be adding it to probably quite mature teams with long histories which are incomplete. the system needs to be able to handle picking up from a postiion of only being told about new deployments even though there is a long commit history already.
I would say that the dora metrics needs to be restrcted to starting at the earliest commit that any deployment webhook call has explicitly linked to a deployment. This is why I though the sql in the linking logic query was simply wrong in my initial bug description.
I would assume @nicolavolpini is in a similar boat since those numbers showed 6 month lead times.
I think this would be common to most real-world users, they won't be starting out with devlake in place, they will be adding it to probably quite mature teams with long histories which are incomplete. the system needs to be able to handle picking up from a postiion of only being told about new deployments even though there is a long commit history already.
This is precisely the situation we are in.
In our case, we had to rely on Webhooks for deployments since our pipeline structure - although it uses Jenkins - makes it impossible to correlate Jenkins deployment SHAs to GitHub PR SHAs (we have an intermediate step where Jenkins is picking artifacts from a central repo, thus a different SHA is shown in the job).
On a side note: what I expected was for a deployment webhook to be bound only to the PR it relates to. This is at least what I assumed as a user approaching DevLake and DORA metrics for the first time. I think it's a natural assumption.
Hi @nicolavolpini and @wouldd , @klesh and I have reached an agreement on the solution to this issue, which is not linking the old PRs to the first deployment pushed from webhook
.
This solution will fix the issue you have faced. The con is that the very first deployment will not be linked to any PRs. So, there might be a very small number of PR's cycle time that can not be matched, but I think it's acceptable.
Does it make sense?
Hello @Startrekzky , thanks that would be a good compromise for me. Better than the wrong data for sure. I do not understand why the first PR would be missed but I trust your judgment on this :)
@Startrekzky @nicolavolpini @klesh I assume this is just the easiest thing to do to scope the date range to be the first deployment notification for a repo thorough to 'now' but the first deployment notification would cut off all the actual commits/prs for that deployment which happen before it.
I agree with Nicola that it's certainly better than the current situation and would soon become irrelevant in the stats. so if that's the easiest route then i'm all for it. though I do wonder if you could just look up the date of the commits linked to the first deployment and use those to capture the first one?
though I do wonder if you could just look up the date of the commits linked to the first deployment and use those to capture the first one?
Hi @wouldd , do you mean that even though we can't link all PRs merged (e.g. PR-1, PR-2, PR-3. PR-3 is the one that linked to the merge_commit that triggers deploy-1) between the first-pushed deployment (deploy-1) to DevLake
and the earlier deployment that hasn't been pushed to DevLake
, at least we can link PR-3 to deploy-1?
right I guess I meant that where you said 'The con is that the very first deployment will not be linked to any PRs.' I thought that might be avoidable - in my case the deployment webhook is still being told which commits are associated and those commits can be associated to the pr they went through etc.
It's still not entirely clear to me what I am losing/missing by just updating that dal filter in the change_leadtime_calculator in the getDeploymentscommit method to be: dal.Where("pm.project_name = ? AND cd.new_commit_sha = ? AND dc.RESULT = ?", projectName, mergeSha, devops.RESULT_SUCCESS),
instead of cd.commit_sha.
it certainly appears to me that the stats start to make sense when I run that way
I thought that might be avoidable - in my case the deployment webhook is still being told which commits are associated and those commits can be associated to the pr they went through etc.
@wouldd I don't think it's entirely avoidable, but I agree that your proposal could associate the first deployment (pushed to DevLake) with ONE of the PRs merged between the first deployment and its prior deployment.
We link deployments to PR via the medium 'newly-added-commits-by-d1', but we don't know all of these commits, we only know the merge commit sha and what you were trying to propose was to find the PR of the merge commit sha.
If your use case is to trigger a deployment every time a PR gets merged, it will work, because every deployment only gets one PR deployed. However, not all users use this practice. For those who do not, d1 will only be linked to part of the PRs, and this is the cons I said.
I hope it makes sense.
Already effective in v1.0.0-beta6. @nicolavolpini @wouldd
Thanks! This looks much better.
One thing: while the metrics look correct, the DORA Details - Lead Time for Changes
board still shows several PRs associated to the same deployment webhook:
How come? Thanks
Hi @nicolavolpini , is this the first deployment webhook you posted via webhook? Did this deployment deploy all the commits merged by those associated PRs? If so, the logic might be correct.
Also, please mention me when commenting, otherwise I won't be notified.
Search before asking
What happened
I assume that the issue I'm seeing is somehow unique to me using the azuredevops_go plugin and the webhook for reporting deployments from octopus. However the 'bug' I've identified looks to me like it would effect everyone. So probably the real issue is somewhere else but all I can say is that for me the DORA metrics are all wildly out because it relates every commit that has ever been made against a repo to the most recent deploy. This means the cycle time calculation for each commit is showing as just the age of that commit until the most recent deploy. I believe the offending line is this one: https://github.com/apache/incubator-devlake/blob/cb69f35b5319f57dd3d6c8534121a6ca069cd175/backend/plugins/dora/tasks/change_lead_time_calculator.go#L215C2-L215C3 the commits_diffs table has 3 columns, commit_sha, old_commit_sha and new_commit_sha. For me I need the filter in this query to filter where the mergeSha = cd.new_commit_sha in order to work correctly. I've tested this in my setup and that change has the expected behaviour. However, I would obviously expect this to be broken for everyone doing DORA metrics at all, and it seems unlikely no one else noticed this. So I guess maybe the intention of that commits_diffs table is to have information in it differently and perhaps the azure plugin is populating it incorrectly?
What do you expect to happen
In my case I have lots of quite old 4+years minimum repos. but I've only just started feeding deploy events in. so I would expect only to have data for the relatively few new commits since this went into place. and I would expect the cycle time calculations for those to be right.
How to reproduce
configure a long standing repo with lots of historical commtis against it. create a webhook call for the most recent commit/commtis in teh repo and expect to see dora calculations for those commits only. However you should see that it links all of the old commits to the new deployment. Indeed when I run with a version in which I changed the above filter check then I get the behavior I expect. all the numbers make sense and the pr_cycle_time calculations show up numbers for only commits that have been reported as part of deployments
Anything else
Happens every time for all commits.
Version
master
Are you willing to submit PR?
Code of Conduct