SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0
805 stars 171 forks

Get _size of the collection #518

Closed smpadhy closed 2 months ago

smpadhy commented 2 months ago

Describe the bug

We have a large collection with approximately 2.5 billion documents. I am using _size to get the number of documents in the collection, but the request fails with:

22:08:22.702 [XNIO-1 task-9] ERROR org.restheart.handlers.ErrorHandler - Error handling the request
 java.lang.ArithmeticException: integer overflow
    at java.base/java.lang.Math.toIntExact(Math.java:1071)

We found that Java's maximum int value is 2147483647, which is less than 2.5 billion. That is probably the cause of the integer overflow when we request _size.

For smaller collections, _size returns the count without problems.
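The overflow can be reproduced outside RESTHeart. The following is a minimal sketch, not RESTHeart's actual code: the class and method names are made up for illustration, and only `Math.toIntExact` is taken from the stack trace above. It narrows a count of 2.5 billion from long to int.

```java
public class SizeOverflowDemo {
    /** Narrows a long count to int; throws ArithmeticException if it does not fit. */
    static int narrow(long count) {
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        // Approximate document count from the report; not an exact figure.
        long documents = 2_500_000_000L;

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        try {
            narrow(documents);
        } catch (ArithmeticException e) {
            // Same exception type and message as in the RESTHeart log.
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

Any count up to 2147483647 passes through `narrow` unchanged, which is consistent with _size working on smaller collections.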

RestHeart version 4.1.6.

Is there a workaround to get the size of the collection?

mkjsix commented 2 months ago

RESTHeart version 4 is 5 years old and is no longer supported. At present we support only the last two major versions, which today are v7 and v8.

So, my suggestion is to upgrade to the most recent version of RESTHeart (currently 8.0.6) and test your solution with it. If the bug persists, we'll be happy to fix it.

We know that RESTHeart v8 is quite different from v4, so if you must temporarily stick with v4, you could try to define a query filter that partitions your data, get the size of each partition by applying that filter, and then sum the partial results. How to create such a filter depends on your data model.

In any case, I suggest planning an upgrade to a more recent version of RESTHeart as soon as possible: besides hundreds of bug fixes and new features, many performance improvements and security fixes have been applied in the last five years, which matters especially if you are dealing with very large MongoDB collections.
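The summing step of that workaround could look like the sketch below. It assumes the partial counts have already been fetched, one _size request per partition, for example with a hypothetical filter such as `filter={"year":2022}` on a hypothetical `year` field, and that every partition holds fewer than 2147483647 documents. The class name and the sample numbers are illustrative only.

```java
import java.util.List;

public class PartitionedSize {
    /** Sums per-partition counts as long; Math.addExact fails loudly if even long overflows. */
    static long total(List<Long> partialSizes) {
        long sum = 0L;
        for (long size : partialSizes) {
            sum = Math.addExact(sum, size);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical partial results, each small enough to fit in an int.
        List<Long> partials = List.of(900_000_000L, 800_000_000L, 800_000_000L);
        System.out.println(total(partials)); // prints 2500000000
    }
}
```

The filters must cover every document exactly once, otherwise the total will be under- or over-counted.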

ujibang commented 2 months ago

I confirm that recent versions of RESTHeart handle the collection size using a long type, which is more than enough for 2.5 billion documents.

smpadhy commented 2 months ago

@ujibang Great! Thanks for confirming that. Could you please let me know from which version the collection size has been handled as a long? @mkjsix Thanks for suggesting the temporary fix. We do plan to upgrade, since we are also experiencing performance issues in different deployment environments. However, it will be a big effort given that our RESTHeart version is five years old. What is the closest version with minimal API changes compared to version 4?

mkjsix commented 2 months ago

@smpadhy RESTHeart has evolved a lot over the last major releases. Starting from v5, RESTHeart has a very different architecture and internal code organization from v4.

For this reason, I don't recommend upgrading to an old version that is no longer supported. Instead, jump to the latest version 8, which benefits from active bug, performance, and security fixes. Say you move to v5 or v6 and then run into bugs: we won't fix them, as those releases are end of life, and your effort would be more or less the same as upgrading to v8 anyway.

You should be able to upgrade by yourself, and you can always ask questions here. If you need to speed up your work, professional services are available for purchase to help you plan and execute the upgrade and fine-tune performance.

smpadhy commented 2 months ago

Thank you for the recommendation.