NuGet / Insights

Gather insights about public NuGet.org package data
Apache License 2.0

Improve recovery from Kusto query validation steps #80

Closed. joelverhagen closed this 1 year ago.

joelverhagen commented 1 year ago

From time to time the ingestion pipeline gets blocked because a Kusto validation query fails. Example:

A Kusto validation query failed.
Validation label: full outer set comparison of NiCatalogLeafItems_Temp.Identity and NiPackageSignatures_Temp.Identity
Error: The set of values in the Identity columns in the NiCatalogLeafItems_Temp and NiPackageSignatures_Temp tables do not match.
Identity values in NiCatalogLeafItems_Temp but not NiPackageSignatures_Temp:
- Count: 1
- Sample: ["drewsubmissiontest/1.0.0"]
Identity values in NiPackageSignatures_Temp but not NiCatalogLeafItems_Temp:
- Count: 0
- Sample: []

NiCatalogLeafItems_Temp
| distinct Identity
| join kind=fullouter (
    NiPackageSignatures_Temp
    | distinct Identity
) on Identity
| where isempty(Identity) or isempty(Identity1)
| summarize
    LeftOnlyCount = countif(isnotempty(Identity)),
    LeftOnlySample = make_set_if(Identity, isnotempty(Identity), 5),
    RightOnlyCount = countif(isnotempty(Identity1)),
    RightOnlySample = make_set_if(Identity1, isnotempty(Identity1), 5)
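To make the check above easier to follow: after the full outer join, a row whose Identity1 is empty came only from the left table, and a row whose Identity is empty came only from the right table, so the query effectively computes the two set differences plus a capped sample of each. The Python below is a minimal sketch of that logic, not the NuGet Insights implementation. The function name and the sample identities (other than "drewsubmissiontest/1.0.0" from the error above) are hypothetical, and the sorting exists only to produce deterministic output, since make_set_if does not guarantee an order.

```python
from typing import Iterable


def full_outer_set_comparison(
    left: Iterable[str], right: Iterable[str], sample_size: int = 5
) -> dict:
    """Hypothetical mirror of the KQL validation query: report distinct
    Identity values found in only one of the two tables."""
    # Equivalent of `distinct Identity` on each side; empty/None values are
    # skipped, similar to the isnotempty() filters in the query.
    left_set = {v for v in left if v}
    right_set = {v for v in right if v}

    # Join rows with an empty Identity1 (left only) or an empty Identity
    # (right only) are exactly these two set differences.
    left_only = left_set - right_set
    right_only = right_set - left_set

    return {
        "LeftOnlyCount": len(left_only),
        # Cap the sample like make_set_if(..., 5); sorted only for determinism.
        "LeftOnlySample": sorted(left_only)[:sample_size],
        "RightOnlyCount": len(right_only),
        "RightOnlySample": sorted(right_only)[:sample_size],
    }


# Hypothetical data reproducing the failure reported above.
result = full_outer_set_comparison(
    ["drewsubmissiontest/1.0.0", "newtonsoft.json/13.0.3"],  # NiCatalogLeafItems_Temp
    ["newtonsoft.json/13.0.3"],  # NiPackageSignatures_Temp
)
print(result)
# {'LeftOnlyCount': 1, 'LeftOnlySample': ['drewsubmissiontest/1.0.0'], 'RightOnlyCount': 0, 'RightOnlySample': []}
```

The validation fails whenever either count is non-zero, which is why a single identity missing from NiPackageSignatures_Temp is enough to block the pipeline.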

I think a race condition is causing this to happen intermittently.

We should have an easy way to abort the current Kusto ingestion and re-run the whole workflow from the beginning.

joelverhagen commented 1 year ago

Completed with https://github.com/NuGet/Insights/commit/b29cdfef1894e8644e0924720e245b722b5025a1.