Closed Musikolo closed 6 years ago
Note that documentation on Confluence is no longer maintained. Please refer to restheart.org/learn
Cursor pools don't return documents, they serve a different purpose:
RESTHeart speeds up the execution of GET requests to collection resources via its db cursor pre-allocation engine. This applies when several documents need to be read from a big collection, and it moderates the effects of the MongoDB cursor.skip() method, which slows down linearly with the number of documents skipped. In common scenarios, RESTHeart’s db cursor pre-allocation engine delivers brilliant performance, even up to a 1000% increase over querying MongoDB directly with its Java driver.
What you want is pagination:
Embedded documents are always paginated, i.e. only a subset of the collection’s documents is returned on each request. The number of documents to return is controlled via the pagesize query parameter. Its default value is 100; the maximum allowable size is 1000. The page to return is specified with the page query parameter. The pagination links (first, last, next, previous) are only returned in HAL full mode (hal=f query parameter); see HAL mode for more information.
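As a rough illustration of the query parameters described above, here is a minimal sketch that builds the URL for one page of a collection. The helper name collection_url and the base URL are hypothetical; only the page, pagesize and hal=f parameters and the 100/1000 limits come from the quoted documentation.

```python
from urllib.parse import urlencode

DEFAULT_PAGESIZE = 100  # default stated in the quoted documentation
MAX_PAGESIZE = 1000     # maximum stated in the quoted documentation


def collection_url(base_url, page=1, pagesize=DEFAULT_PAGESIZE, hal_full=False):
    """Build a GET URL for one page of a collection (hypothetical helper)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if pagesize < 0 or pagesize > MAX_PAGESIZE:
        raise ValueError(f"pagesize must be between 0 and {MAX_PAGESIZE}")
    params = {"page": page, "pagesize": pagesize}
    if hal_full:
        # Full HAL mode is what returns the first/last/next/previous links.
        params["hal"] = "f"
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint: first page of 25 documents.
print(collection_url("http://localhost:8080/db/coll", page=1, pagesize=25))
```

For the use case in this thread, the "More documents" button would simply request page=2, page=3, and so on with the same pagesize.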
You should have a look at:
RESTHeart actually uses a mongodb cursor batch size of 1000, which is the maximum pagesize value.
See BATCH_SIZE in https://github.com/SoftInstigate/restheart/blob/master/src/main/java/org/restheart/db/CollectionDAO.java
This is needed to avoid poor performance with requests with high pagesizes, see https://github.com/SoftInstigate/restheart/issues/218
I guess that this value might be problematic with your huge collection.
What if we allow defining a maximum pagesize via configuration file, and also use this value as cursor batch size?
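To see why a fixed batch size of 1000 can hurt small pages while still helping large ones, here is a deliberately simplified model, not RESTHeart's actual code. It assumes the cursor always fetches whole batches and ignores MongoDB's per-batch byte limit and the case where the collection runs out of documents; batches_needed is a hypothetical name.

```python
import math


def batches_needed(pagesize, batch_size):
    """Toy model: (round trips, documents fetched) to fill one page.

    Assumes whole batches are always fetched; real cursors behave
    differently near the end of a result set.
    """
    if pagesize < 1 or batch_size < 1:
        raise ValueError("pagesize and batch_size must be >= 1")
    round_trips = math.ceil(pagesize / batch_size)
    return round_trips, round_trips * batch_size


for pagesize, batch_size in [(25, 1000), (25, 100), (1000, 100), (1000, 1000)]:
    trips, fetched = batches_needed(pagesize, batch_size)
    print(f"pagesize={pagesize} batch={batch_size}: "
          f"{trips} round trip(s), {fetched} documents fetched")
```

Under this model, a pagesize of 25 with a batch size of 1000 fetches 40 times more documents than the page needs, while a pagesize of 1000 with a batch size of 100 needs ten round trips instead of one. That is the trade-off a configurable batch size would let each deployment tune.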
Hi guys,
First of all, thanks for your ultra-quick response. I really appreciate it.
Yes, I think that what I saw in the logs was a batchSize of 1000. So, it would definitely help to have this property configurable. At the very least, it will be useful to fine tune the performance of each application to their specific needs. It could keep the same value by default.
Assuming this change is simple to do, what is your best guess for a release with it? My current MongoDB version is 3.4.14. I hope there are no incompatibilities.
Thank you!
I was thinking about this matter, and I want to point out something that might be worth exploring.
Would it be possible to add support for the limit(x) operation? I'm aware of the performance issues when paging with skip() and limit() operations, but for the very first pages of a cursor, it's an option that could be beneficial overall. Thoughts?
The pagesize query parameter does exactly this, i.e. ?pagesize=100 results in .limit(100).
?page controls the skips.
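The mapping described in this reply can be sketched as follows. This is an illustration only, assuming pages are 1-based (the original question refers to the first page as page=1); to_skip_limit is a hypothetical name, not a RESTHeart function.

```python
def to_skip_limit(page, pagesize):
    """Translate page/pagesize into the equivalent (skip, limit) pair.

    Assumes 1-based page numbers.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    if pagesize < 0:
        raise ValueError("pagesize must be >= 0")
    return (page - 1) * pagesize, pagesize


print(to_skip_limit(1, 25))   # first page: nothing skipped
print(to_skip_limit(3, 100))  # third page: two full pages skipped
```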
Well, it's not exactly the same, because RESTHeart is using some built-in logic in combination with the default batch size (=1000) to do the paging. It's not using MongoDB native .skip(x).limit(y) operations. Anyway, it was just a thought I wanted to share.
I've cloned the code, changed the BATCH_SIZE constant to 100, and run a test with JMeter in my test environment. This is what I get after 30 minutes running:
With original BATCH_SIZE=1000:
summary = 9804 in 00:30:52 = 5.3/s Avg
With custom BATCH_SIZE=100:
summary = 30561 in 00:30:25 = 16.7/s Avg
This is more than 3 times faster for the test case I ran. So, I definitely think that having this property configurable is really valuable.
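The figures in the two JMeter summary lines can be cross-checked with a few lines of arithmetic. The request counts and durations below are copied from the summaries; the throughput helper is only for illustration.

```python
def throughput(requests, duration):
    """Requests per second for a JMeter-style HH:MM:SS duration."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return requests / (hours * 3600 + minutes * 60 + seconds)


before = throughput(9804, "00:30:52")   # BATCH_SIZE=1000
after = throughput(30561, "00:30:25")   # BATCH_SIZE=100

print(f"{before:.1f} req/s -> {after:.1f} req/s, "
      f"speedup x{after / before:.2f}")
```

The computed rates match the reported 5.3/s and 16.7/s, and the ratio comes out at a little under 3.2.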
Thank you so much for the great support!
@Musikolo we agree, we'll put this into the next minor release, in the following days.
I just added 3 new options to the configuration file (default-pagesize, max-pagesize and cursor-batch-size) that tune the overall read performance according to the expected pagesize.
The default values are 100, 1000 and 100 respectively. In your case, where you expect 98% of requests with pagesize=25, you could set:
default-pagesize: 25
max-pagesize: 100
cursor-batch-size: 25
see commit 20016ae83ca9749718ca7f63ecf24f299b46d56c
The default configuration "Read Performance" section follows:
## Read Performance
default-pagesize: 100
# default-pagesize is the number of documents returned when the pagesize query
# parameter is not specified
# see https://restheart.org/learn/query-documents/#paging
max-pagesize: 1000
# max-pagesize sets the maximum allowed value of the pagesize query parameter
# generally, the greater the pagesize, the more json serialization overhead occurs
# the rule of thumb is not exceeding 1000
cursor-batch-size: 1000
# cursor-batch-size sets the mongodb cursor batchSize
# see https://docs.mongodb.com/manual/reference/method/cursor.batchSize/
# cursor-batch-size should be smaller or equal to the max-pagesize
# the rule of thumb is setting cursor-batch-size equal to max-pagesize
# a small cursor-batch-size (e.g. 101, the default mongodb batchSize)
# speeds up requests with small pagesize
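The constraints stated in these comments can be checked mechanically. The sketch below is a hypothetical helper, not part of RESTHeart: it takes the three options as a plain dict (as they would look after parsing the configuration file), falls back to the defaults shown in the section above, and reports any combination that contradicts the comments. Whether RESTHeart itself enforces these rules is not stated in this thread.

```python
def check_read_performance(config):
    """Return warnings for the three read-performance options.

    Hypothetical helper; fallback values mirror the quoted default section.
    """
    default_pagesize = config.get("default-pagesize", 100)
    max_pagesize = config.get("max-pagesize", 1000)
    cursor_batch_size = config.get("cursor-batch-size", 1000)

    warnings = []
    if default_pagesize > max_pagesize:
        warnings.append("default-pagesize is greater than max-pagesize")
    if cursor_batch_size > max_pagesize:
        warnings.append("cursor-batch-size should be smaller than or equal to max-pagesize")
    if max_pagesize > 1000:
        warnings.append("max-pagesize exceeds the rule-of-thumb limit of 1000")
    return warnings


tuned = {"default-pagesize": 25, "max-pagesize": 100, "cursor-batch-size": 100}
print(check_read_performance(tuned))

# Lowering max-pagesize without touching cursor-batch-size leaves the
# batch size at its fallback of 1000, which the check flags.
print(check_read_performance({"default-pagesize": 25, "max-pagesize": 100}))
```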
Thank you so much for the prompt commit!
I understood what you meant in your previous post, but I guess you meant I should use:
default-pagesize: 25
max-pagesize: 100
cursor-batch-size: 100
I'm looking forward to getting the next release to test it out. Obviously, I prefer a clean and configurable solution like the one you just implemented, instead of something hard-coded as I did.
As a follow-up on the performance difference with a lower cursor batch size: this morning I tested the new JAR I built in our production environment, and the performance boost is even larger than in our test environment. The average throughput has gone from 30 req/s to ~230 req/s. This is more than 7.5 times faster! To be honest about the performance outcome, we also upgraded the hardware to have a cluster with more RAM and CPU.
Thank you so much again for your great support!
@Musikolo In case you'll have time to write something, we are always looking for blog posts about RESTHeart, how people use it and why.
Hi,
I have a question I would like to get some guidance for. I'm not sure if this is the best place to ask questions, but I haven't found any better place. I'll be happy to ask again wherever I'm instructed, if this is not the place for it. I've read the documentation carefully, but I didn't find the information I'm looking for.
Question: I've got a really big collection with 1.3 billion documents. There is a field used to retrieve the documents that belong to each user. The number of documents per user could range from a few thousand up to 80K. More than 98% of calls just want to show the first 25 documents. There is a "More documents" button to get the next page.
I've noticed that RESTHeart preloads 1000 documents all the time, and this happens to be very expensive. I thought that was related to the default use of linear cursors. I've tried changing the value from 1000 to 100, but it didn't work for me.
So, how can I make RESTHeart preload a smaller number of documents, say 100, since in my application most users will be happy with the first 25 documents (page=1)?
Thank you!