Elfocrash / Cosmonaut

🌐 A supercharged Azure CosmosDB .NET SDK with ORM support
https://cosmonaut.readthedocs.io
MIT License
342 stars 44 forks source link

Pagination with MAX_ITEM_COUNT -1 #115

Open Mortana89 opened 4 years ago

Mortana89 commented 4 years ago

Currently Cosmonaut does not allow fetching data with pagination without defining an initial page size (as first request). This is against a recommendation we received from Cosmos DB: You are using the query page size of 100 for queries for your Azure Cosmos DB container. We recommend using a page size of -1 for faster scans

Upon following the following article: https://docs.microsoft.com/nl-nl/azure/cosmos-db/sql-api-query-metrics#max-item-count

It's clear that it's better to specify -1 as our max_item_count for the first fetch, so Cosmos DB can determine what's the max response set. Further down the road we can then use continuation token and follow normal path.

However, the SDK currently does not support specifying -1 due to the following lines: https://github.com/Elfocrash/Cosmonaut/blob/2619335d2a94271158ee52dcaef55bce002978b6/src/Cosmonaut/Extensions/PaginationExtensions.cs#L26

https://github.com/Elfocrash/Cosmonaut/blob/2619335d2a94271158ee52dcaef55bce002978b6/src/Cosmonaut/Extensions/PaginationExtensions.cs#L45

Can these be changed to also support -1 as page size, as such cosmos DB can determine the max size?

Elfocrash commented 4 years ago

This best practice must be new. It definately wasn't the recommendation before. I've personally seen scenarios where setting this to -1 can cause some serious performance issues.. I am willing to change that but if you change the pageSize to -1 but I will also need to benchmark the behavior. Can you give me a hand by branching Cosmonaut and testing it with the -1 pageSize?

Mortana89 commented 4 years ago

I've done some tests. It's difficult to say and surely depends upon scenario:

MaxItemCount 100   Items 1665
         
Call Time (ms) Items RU  
1 331 100 14,37  
2 260 100 14,78  
3 529 100 12,61  
4 270 100 13,85  
5 211 100 12,08  
6 182 100 12,13  
7 182 100 12,05  
8 250 100 11,93  
9 197 100 11,76  
10 459 100 12,29  
11 259 100 12,47  
12 192 100 12,46  
13 314 100 12,18  
14 207 100 12,52  
15 257 100 12,48  
16 257 100 12,26  
17 324 65 8,94  
  4681 1665 211,16  
         
    Spread out RU (Total RU / time in seconds) 45,11002  
         
         
MaxItemCount -1      
1 694 1665 167,48  

The advantage of using fixed bucket size is of course the spread in RU consumption. When using the maxitemcount of -1, you'll get bigger RU spikes which could also cause performance issues if the RU spike is higher than the available RU on the collection, which causes throttling etc etc...

An advantage of using the second approach, is the network call time adds up when a user would request many records in a listpage. But in our case it would take more than 300 items before it gets interesting to collect more data at once.

PS; those are local development timings ;)

Elfocrash commented 4 years ago

Thanks for this @Mortana89.

When using the maxitemcount of -1, you'll get bigger RU spikes which could also cause performance issues if the RU spike is lower than the available RU on the collection, which causes throttling etc etc

This is the reason why I disabled that in the first place. I could remove the -1 limitation to allow for this usecase but I am not sure if that's the right approach to this. I will re-wire the page extensions to use the added Skip and Take support in the SDK so I think that this whole issue will probably go away.

I will put this on hold until I settle on the approach.