Closed andriipatsula closed 11 months ago
I created synthetic test:
At first, the idea was that the reason we have different records is because of the way the customer re-ran the failed stage/job. I tried both options "Re-run failed jobs" & "Re-run all jobs", and was not able to simulate the problem.
And the only reasonable explanation is that when the build failed, the TelemetryService received a notification through the webhook that the build finished and started processing the results (calling Azure DevOps Timeline API). It depends on when the customer restarts the failing stage: immediately or after some amount of time; during the time the Telemetry Service is processing the build or after.
I encountered a situation where the Telemetry Service wrote 3 records in engineeringdata
.
Example of calling the Telemetry API over the inProgress
stage (after re-run failed job button is pressed)
"previousAttempts": [
{
**"attempt": 1,**
"timelineId": "previousAttempts_timelineid",
"recordId": "previousAttempts_recordid"
}
],
"id": "***********************************",
"parentId": null,
"type": "Stage",
"name": "Some stage name",
"startTime": null,
"finishTime": null,
"currentOperation": null,
"percentComplete": null,
**"state": "inProgress",**
"result": null,
"resultCode": null,
"changeId": 67,
"lastModified": "0001-01-01T00:00:00",
"workerName": null,
"order": 2,
"details": null,
"errorCount": 0,
"warningCount": 0,
"url": null,
"log": null,
"task": null,
**"attempt": 2,**
"identifier": "Some stage name"
}
There is no information available for previousAttempts_timelineid
and previousAttempts_recordid
IDs.
I double-checked that the data in cloudmine looks similar:
BuildTimelineRecord
| where OrganizationName in ('dnceng', 'dnceng-public') and BuildId in (2315074, 2315077)
| where Type == "Stage"
| where RecordName has "SignalR"
| summarize CountAttempt = count(), MinAttempt = min(Attempt), MaxAttempt = max(Attempt) by RecordName, BuildId
RecordName | BuildId | CountAttempt | MinAttempt | MaxAttempt |
---|---|---|---|---|
SignalR Publishing | 2315074 | 1 | 2 | 2 |
SignalR Publishing | 2315077 | 1 | 2 | 2 |
But only one record per build..
And I am able to find build with two (and more) records in the BuildTimelineRecord
table
RecordName | BuildId | CountAttempt | MinAttempt | MaxAttempt |
---|---|---|---|---|
SignalR Publishing | 2,286,846 | 2 | 1 | 2 |
So, the reason for that is that Timeline api returns the latest attempt only and you can request previous attempts by having additional API call with the timelineId
parameter specified (see doc for details). Cloudmine has only latest attempt and in most cases it's impossible to get data about previous attempts (see https://stackoverflow.microsoft.com/questions/372617 as example)
Example:
where for build
2315077
we have single records with theAttempt
column equal 2 and for2315074
- two records.Both builds 2315077 and 2315074 are identical: the same number of stages and the same failure.
To get all similar examples for the Release pipeline: