Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License
736 stars 491 forks source link

COUNT is including expired TTL documents that are pending deletion #4494

Closed justinbhopper closed 4 months ago

justinbhopper commented 4 months ago

Describe the bug This is not specific to the .NET SDK, so let me know if I should report this elsewhere.

When a document has a TTL (Time to Live) defined, Azure will use a background task to delete the document. Until the document is actually deleted, Cosmos will not return the expired data to ensure it appears delete as soon as it was supposed to be.

However, it seems COUNT does not reflect the appearance of deletion.

To Reproduce

  1. Create document ID "123" with ttl=N
  2. Wait N seconds
  3. Run SELECT * FROM root c WHERE c.id = '123' - Query returns no data (as expected)
  4. Run SELECT VALUE COUNT(c) FROM root c WHERE c.id = '123' - Query returns "1" (unexpectedly)

Expected behavior COUNT queries should also filter out any documents whose TTL has already been surpassed.

Actual behavior COUNT queries are including TTL documents that have already been surpassed and are pending deletion by the background task.

Additional context This can be tested purely in Azure Portal.

bartelink commented 4 months ago

The SDK can't do anything about this, so this is not the place the issue needs to live The other issue is that fundamentally this can't be fixed - count does not inspect documents, and documents that are present but expired are simpyl present. There's no secret index that anything could lean on to magically do this for free, and having an implicit behavior that makes count look into documents would be hugely inefficient (and a complete non starter as a behavior change).

justinbhopper commented 4 months ago

@bartelink Thanks for the feedback. I suspected as much about the SDK not being related to this. Is there a hub where issues specific to Cosmos should be posted?

There's no secret index that anything could lean on to magically do this for free

I'm surprised to hear there isn't an internal "expirationTs" data somewhere, given the background task already has to efficiently find documents that are pending delete. It must be implemented differently than I had imagined.

bartelink commented 4 months ago

I'm surprised to hear there isn't an internal "expirationTs" data somewhere, given the background task already has to efficiently find documents that are pending delete. It must be implemented differently than I had imagined.

I dont know how it's implemented, but I know the principles.

There is an index on the id, so you can count(1) efficiently. But to count other stuff efficiently, there needs to be an index to support it.

The point is that you would not lumber counting with looking inside the documents, as index lookups vs doc inspecions are very different things.

I can't recall whether _ts is implicitly indexed or not, but the way you find out is by looking at the query metrics

https://learn.microsoft.com/en-us/azure/cosmos-db/index-overview https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy