NuGet / Insights

Gather insights about public NuGet.org package data
Apache License 2.0

SAS token may expire before the Azure Blob operation finishes #72

Closed by rmt2021 2 years ago

rmt2021 commented 2 years ago

Description

For the Blob service, GetUserDelegationKeyAsync is called in GetServiceClientsAsync to get the user delegation key, and that key is then used to sign the SAS token in GetBlobReadUrlAsync via BlobSasBuilder.

But I notice that the token expiration time is 1 hour and that the token is refreshed at the halfway point (i.e., after 30 minutes), so a token handed out just before a refresh may have only about 30 minutes of validity left. I wonder whether a single operation could take longer than 30 minutes. I think IKustoQueuedIngestClient.IngestFromStorageAsync uses the Blob SAS token to access the data remotely. If a large file takes more than 30 minutes to ingest, it could hit an unexpected HTTP 403 Forbidden error, right?
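The timing concern above can be sketched with a little arithmetic. This is only an illustrative model in Python, not the Insights code (which is C#): the function names are hypothetical, and the 1 hour lifetime and halfway refresh are taken from the description above.

```python
from datetime import datetime, timedelta, timezone

SAS_DURATION = timedelta(hours=1)   # token lifetime described in the issue
REFRESH_AFTER = SAS_DURATION / 2    # token is refreshed at the halfway point


def remaining_validity(issued_at, handed_out_at):
    """Time left on the token at the moment it is handed to a consumer."""
    return issued_at + SAS_DURATION - handed_out_at


def expires_mid_operation(issued_at, handed_out_at, operation_duration):
    """True if the operation would still be running when the token expires."""
    return operation_duration > remaining_validity(issued_at, handed_out_at)


issued = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
# Worst case under this model: the token is handed out just before a refresh.
worst_case_handout = issued + REFRESH_AFTER
```

Under these assumptions, a consumer that receives the URL at `worst_case_handout` has 30 minutes to finish, so a 45 minute ingestion would outlive the token, while the same ingestion started with a freshly issued token would not.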

joelverhagen commented 2 years ago

That's a great point. I think the Kusto ingestion is the key place where this problem would surface. I am not an expert in how the Kusto ingestion backend works, but I imagine it is entirely possible to hand Kusto ingestion a SAS token with, say, 30 minutes left and, perhaps due to a small Kusto scale or an overwhelming number of ingestions, have Kusto not download the blob (i.e. download the CSV via the SAS-enabled URL) within those 30 minutes.

Most other operations in Insights certainly complete in less than 30 minutes since the units of work are Azure Function queue messages which, in many runtimes, must not exceed 10 minutes in execution.

It's this other case where an external system (Kusto) may hold on to a SAS-enabled URL for too long.

Have you encountered this problem in practice? Just to be careful, I think it would be reasonable to set an expiration time that far exceeds the expected ingestion time for Kusto.

This is essentially a contract between Insights and Kusto. Kusto must complete its job before the SAS token expires.

joelverhagen commented 2 years ago

I increased the SAS duration to 12 hours. This should make this situation less likely. The Kusto ingestion flow has up to 5 attempts, so for this to be a problem, the ingestion of the tables would need to take more than 12 hours 5 times in a row. Retries only occur on the failed tables, not all of them, so parallel ingestion volume should decrease with each attempt.
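The failure condition described in this comment can be written down as a small check. This is a rough sketch in Python, not the actual Insights retry logic: `ingestion_exhausted` is a hypothetical name, and the model assumes each attempt gets a URL valid for the full 12 hours, which the comment implies but does not state outright.

```python
from datetime import timedelta

SAS_DURATION = timedelta(hours=12)  # duration stated in the comment above
MAX_ATTEMPTS = 5                    # attempt limit stated in the comment above


def ingestion_exhausted(attempt_durations):
    """True only if every allowed attempt outlasts the SAS window."""
    attempts = attempt_durations[:MAX_ATTEMPTS]
    return len(attempts) == MAX_ATTEMPTS and all(
        duration > SAS_DURATION for duration in attempts
    )
```

For example, five consecutive 13 hour attempts would exhaust the retries, but a single attempt that finishes within the window is enough to avoid the problem.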