Srylax / mongodb-cursor-pagination

Provides cursor based pagination for the native MongoDB driver in Rust.
https://docs.rs/mongodb-cursor-pagination/
MIT License

Aggregation Pipelines #17

Open Srylax opened 1 month ago

Srylax commented 1 month ago

@douggynix wrote:

I handle aggregation query pagination with a custom in-house solution. Paginating an aggregation pipeline is very tricky because it can have multiple stages that produce any kind of data, from counts to projections, or that rely on a geospatial index.

After #16 is completed with the new API, aggregation pipelines should be included.

Srylax commented 1 month ago

@douggynix would you be able to share your implementation of the aggregation pipelines pagination?

douggynix commented 1 month ago


My in-house solution is not loosely coupled, but I am going to share the whole idea. It requires modifying the user's query to add some extra stages that help with pagination. The challenge with aggregation is that the user is free to put a $limit stage anywhere in the pipeline for their own purposes, while pagination relies heavily on knowing the result-set boundaries upfront. If the user deliberately adds a $limit in the middle of the pipeline and you reuse that query for counting, your pagination assumptions will be wrong from the start. And you must not remove any such stages from the user's pipeline.
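To make the counting problem concrete, here is a small JavaScript sketch that works on the pipeline as a plain array of stage objects (the form the mongo shell's `aggregate()` takes) and flags `$limit` stages that sit before the last stage. The function name and the sample pipeline are hypothetical. It only inspects top-level stages (not sub-pipelines inside `$facet` or `$lookup`), and it reports those stages rather than removing them, since they belong to the user.

```javascript
// Hypothetical helper (not part of mongodb-cursor-pagination): report every
// top-level $limit stage except a trailing one. A count computed from such a
// pipeline is capped by that inner $limit, so page math based on it is off.
function innerLimitIndexes(pipeline) {
  const indexes = [];
  pipeline.forEach((stage, i) => {
    const isLimit = Object.prototype.hasOwnProperty.call(stage, "$limit");
    const isLast = i === pipeline.length - 1;
    if (isLimit && !isLast) indexes.push(i);
  });
  return indexes;
}

// Hypothetical user pipeline with a business-rule $limit in the middle.
const samplePipeline = [
  { $match: { visible: true } },
  { $limit: 50 },               // the user's own cap: must be kept
  { $sort: { createdAt: -1 } },
  { $limit: 100 }               // trailing limit: may be dropped when counting
];

console.log(innerLimitIndexes(samplePipeline)); // [ 1 ]
```

A non-empty result means any total computed from this pipeline is already bounded by the user's own `$limit`.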

Here is how I address pagination in my scenario. Consider the query below. I have a user_profiles collection that stores geographic information, with a location field that has a geospatial index on it. I want to list the users within a maximum distance of given coordinates (longitude, latitude). Because many results come back, I add a "$setWindowFields" stage at the end of the pipeline (here, just before $limit) to introduce a "rowNumber" field that holds each document's position in the result. I sort by the "distance" field, which is the sort axis of my geospatial query; if I sorted by "_id" instead, it would break the distance ordering returned by the first stage.

In my in-house solution I always insert "$setWindowFields" before the last stage of the pipeline, as shown below. If you need to count the results, you have to remove any $limit placed at the end of the pipeline. Even then, there can still be issues if the user puts a $limit somewhere in the middle for their own business case.

To see the result, try this aggregation query on your own collection, replacing Stage 1 with a $match stage. The sortBy field of "$setWindowFields" must have unique values, the same principle as a cursor; in my case, sorting by _id would invalidate the geospatial ordering. I may be wrong, but that's how I got in-house pagination working.

db.user_profiles.aggregate(
    // Pipeline
    [
        // Stage 1: geospatial search (must be the first stage);
        // adds a "distance" field in meters
        {
            $geoNear: {
                distanceField: "distance",
                near: {
                    type: "Point",
                    coordinates: [
                        -73.62469511289265, // longitude
                        45.55061990811232   // latitude
                    ]
                },
                maxDistance: 10000 // optional, in meters
            }
        },
        // Stage 2
        {
            $match: { visible: { $eq: true } }
        },
        // Stage 3: number each document in distance order
        {
            $setWindowFields: {
                sortBy: { distance: 1 },
                output: {
                    rowNumber: { $documentNumber: {} }
                }
            }
        },
        // Stage 4
        {
            $limit: 100
        }
    ],

    // Options
    {
    }
)
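The query above can be generalized into a small pipeline builder. The sketch below is JavaScript operating on plain stage objects (the same shape `db.collection.aggregate()` accepts); it is not part of mongodb-cursor-pagination, and the helper names, the `afterRow` parameter and the convention that the trailing `$limit` acts as the page size are assumptions. It inserts the `$setWindowFields` numbering stage plus a `$match` on `rowNumber` before the last stage, and builds a count query by dropping only a trailing `$limit`. The `{ distance: 1, _id: 1 }` sort adds `_id` as a tie-breaker so equal distances still get a deterministic order without giving up the geospatial ordering; treat that compound `sortBy` as an assumption to verify against your server version.

```javascript
// Sketch generalizing the query above. NOT the library's API: the helper
// names, the `afterRow` parameter and the "trailing $limit = page size"
// convention are assumptions made for illustration.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Numbers documents in `sortBy` order; $documentNumber starts at 1.
function rowNumberStage(sortBy) {
  return {
    $setWindowFields: {
      sortBy, // should be unique per document, like a cursor key
      output: { rowNumber: { $documentNumber: {} } }
    }
  };
}

// Returns a new array with `stages` inserted before the last stage.
function insertBeforeLast(pipeline, ...stages) {
  const at = Math.max(pipeline.length - 1, 0);
  return [...pipeline.slice(0, at), ...stages, ...pipeline.slice(at)];
}

// Page query: keep every user stage, number the rows before the trailing
// stage, and skip rows already returned (rowNumber <= afterRow).
function pagePipeline(pipeline, sortBy, afterRow = 0) {
  return insertBeforeLast(
    pipeline,
    rowNumberStage(sortBy),
    { $match: { rowNumber: { $gt: afterRow } } }
  );
}

// Count query: drop only a trailing $limit; inner $limit stages stay.
function countPipeline(pipeline) {
  const last = pipeline[pipeline.length - 1];
  const body = last && hasOwn(last, "$limit") ? pipeline.slice(0, -1) : pipeline;
  return [...body, { $count: "total" }];
}

// The user's pipeline from the query above, without the numbering stage.
const userPipeline = [
  {
    $geoNear: {
      distanceField: "distance",
      near: { type: "Point", coordinates: [-73.62469511289265, 45.55061990811232] },
      maxDistance: 10000
    }
  },
  { $match: { visible: { $eq: true } } },
  { $limit: 100 }
];

const stageNames = (p) => p.map((stage) => Object.keys(stage)[0]).join(" -> ");

// Second page, assuming the first page ended at rowNumber 100.
// { distance: 1, _id: 1 } adds _id as an (assumed) tie-breaker.
const page2 = pagePipeline(userPipeline, { distance: 1, _id: 1 }, 100);
console.log(stageNames(page2));
// $geoNear -> $match -> $setWindowFields -> $match -> $limit
console.log(stageNames(countPipeline(userPipeline)));
// $geoNear -> $match -> $count
```

With a real connection you would pass `page2` to `db.user_profiles.aggregate(page2)` and keep the last `rowNumber` you received as the next `afterRow`. Every request sorts and numbers the whole result set again, and documents inserted or deleted between requests can shift row numbers. If the user's pipeline does not end with `$limit`, `pagePipeline` does not cap the page size, and any inner `$limit` still caps the whole result. `$setWindowFields` requires MongoDB 5.0 or later.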