MongoDB large collection pagination is slow

nadavhalfon commented 4 months ago

Expected behavior

MongoDB collection pagination to be performant when index exists.

Actual behavior

Collection with a 700k docs, Takes around 10 seconds to fetch each page

Failure Logs

No logs but I have the reason, the reason is the aggregation sort stage of the query, the way it is used makes mongo to choose "COLLSCAN" strategy so although there is a limit of 15 it's still scans the whole collection.

This is the structure of the regular aggregation created by Forest agent:

    {
        "$addFields": {
            "customerDetails@@@customer": "$customerDetails.customer",
            .............
        }
    },
    {
        "$project": {
             ......
        }
    },
    {
        "$sort": {
            "_id": 1 // _id as an example for a default index
        }
    },
    {
        "$skip": 0
    },
    {
        "$limit": 15
    },
........more

As I understand after a little investigation the problem is that the $addFields makes mongo to do a COLLSCAN (it doesnt event matter that there is a limit of 15), If I try and move the $sort to be the first stage, it takes advantage of the index and scans only 15 docs

    {
        "$sort": {
            "_id": 1 // _id as an example for a default index
        }
    },
    {
        "$addFields": {
            "customerDetails@@@customer": "$customerDetails.customer",
            .............
        }
    },
    {
        "$project": {
             ......
        }
    },
    {
        "$skip": 0
    },
    {
        "$limit": 15
    },
........more

Context

Entering a collection and a default first query loads the data :)

Package Version:

"@forestadmin/agent": "^1.41.7",
"@forestadmin/datasource-mongo": "^1.3.12",
"@forestadmin/datasource-mongoose": "^1.7.12",
"dotenv": "^16.0.1",
"mongoose": "^6.10.3"

I do not have permissions for this repo, but I'm a new client 😁 I suggest this change to file "datasource-mongoose/src/collection.ts" line 299, function "buildBasePipeline". Making the sort first has a huge impact on the performance, at least when there is no relation filter

realSpok commented 3 months ago

Hello @nadavhalfon, thanks for your report.

Please, hang on while I'm looking into it, and see if I can reproduce the issue mentioned.

nadavhalfon commented 3 months ago

Sure thanks, I can help you reproduce if you need. I think it will be a big benefit for MongoDB users as filter on a relation fields (whether it's a sort or filter) is not performant in mongo anyway on big collection (because it will need to lookup every doc) Thanks!

realSpok commented 3 months ago

I think I'll need some more context indeed. Can you please let me know what your schema/object relations looks like ? I can see customerDetails that are linked to customers, but I'm not sure it is enough for me to reproduce. Can you please also let me know what is your datasource config ? And especially the flattening options ? Thanks !

nadavhalfon commented 3 months ago

I can share everything you need and event provide full queries/live demo, but I'd rather do it in front of the tech support and not to post here in github if that's ok, do you want me to reach out to a support mail?

realSpok commented 3 months ago

indeed, you can reach out to support@forestadmin.com You can also use the private messaging system to reach me on our community forum.

realSpok commented 3 months ago

Following-up our live discussion, I have finally managed to reproduce the issue and test your proposed resolution. I will open up a resolution ticket on my side and work on a fix. Thanks for the time taken to provide details 🙏 I'll keep this thread updated with the progress on the fix.

nadavhalfon commented 2 months ago

Hey, I saw the PR, is there any ETA? 😁 thanks!

realSpok commented 2 months ago

A fix has just been released. Please let us know how it works out for you 🙏

ForestAdmin / agent-nodejs