Pagination with MAX_ITEM_COUNT -1

Mortana89 commented 4 years ago

Currently Cosmonaut does not allow fetching data with pagination without defining an initial page size (as first request). This is against a recommendation we received from Cosmos DB: You are using the query page size of 100 for queries for your Azure Cosmos DB container. We recommend using a page size of -1 for faster scans

Upon following the following article: https://docs.microsoft.com/nl-nl/azure/cosmos-db/sql-api-query-metrics#max-item-count

It's clear that it's better to specify -1 as our max_item_count for the first fetch, so Cosmos DB can determine what's the max response set. Further down the road we can then use continuation token and follow normal path.

However, the SDK currently does not support specifying -1 due to the following lines: https://github.com/Elfocrash/Cosmonaut/blob/2619335d2a94271158ee52dcaef55bce002978b6/src/Cosmonaut/Extensions/PaginationExtensions.cs#L26

https://github.com/Elfocrash/Cosmonaut/blob/2619335d2a94271158ee52dcaef55bce002978b6/src/Cosmonaut/Extensions/PaginationExtensions.cs#L45

Can these be changed to also support -1 as page size, as such cosmos DB can determine the max size?

Elfocrash commented 4 years ago

This best practice must be new. It definately wasn't the recommendation before. I've personally seen scenarios where setting this to -1 can cause some serious performance issues.. I am willing to change that but if you change the pageSize to -1 but I will also need to benchmark the behavior. Can you give me a hand by branching Cosmonaut and testing it with the -1 pageSize?

Mortana89 commented 4 years ago

I've done some tests. It's difficult to say and surely depends upon scenario:

MaxItemCount	100		Items

Call	Time (ms)	Items	RU
1	331	100	14,37
2	260	100	14,78
3	529	100	12,61
4	270	100	13,85
5	211	100	12,08
6	182	100	12,13
7	182	100	12,05
8	250	100	11,93
9	197	100	11,76
10	459	100	12,29
11	259	100	12,47
12	192	100	12,46
13	314	100	12,18
14	207	100	12,52
15	257	100	12,48
16	257	100	12,26
17	324	65	8,94
	4681	1665	211,16

		Spread out RU (Total RU / time in seconds)	45,11002


MaxItemCount	-1
1	694	1665	167,48

The advantage of using fixed bucket size is of course the spread in RU consumption. When using the maxitemcount of -1, you'll get bigger RU spikes which could also cause performance issues if the RU spike is higher than the available RU on the collection, which causes throttling etc etc...

An advantage of using the second approach, is the network call time adds up when a user would request many records in a listpage. But in our case it would take more than 300 items before it gets interesting to collect more data at once.

PS; those are local development timings ;)

Elfocrash commented 4 years ago

Thanks for this @Mortana89.

When using the maxitemcount of -1, you'll get bigger RU spikes which could also cause performance issues if the RU spike is lower than the available RU on the collection, which causes throttling etc etc

This is the reason why I disabled that in the first place. I could remove the -1 limitation to allow for this usecase but I am not sure if that's the right approach to this. I will re-wire the page extensions to use the added Skip and Take support in the SDK so I think that this whole issue will probably go away.

I will put this on hold until I settle on the approach.

Elfocrash / Cosmonaut

Pagination with MAX_ITEM_COUNT -1 #115