SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0
805 stars 171 forks source link

Aggregation $in operation (large number of documents) #103

Closed vinodprathipati closed 8 years ago

vinodprathipati commented 8 years ago

Hi,

Could you please explain the better way to pass number of document Ids to aggregate function? Assume my document uses $in operation where it can take 100000 document ids.

How to handle this scenario?

Thanks, Vinod

ujibang commented 8 years ago

Hi @vinodprathipati

the value of avars query parameter is json!

so you can definitely do ?avars={"var=["a","b","c"]}

example:

http PUT 127.0.0.1:8080/test/coll aggrs:='[ {"type":"pipeline", "uri":"total-by-item", "stages": [  {"_$match": { "a::Item": { "_$in": {"_$var": "Item"} }}}, {"_$group": { "_id": "$a.Item", "total": {"_$sum": {"_$min": [ "$a.Amount", "$a.Amount2"]}}}}]}]'

note the $match stage: { "a::Item": { "_$in": {"_$var": "Item"} }}

the whole {"_$var": "Item"} section is going to be replaced by the json value of Item property of avars

http GET 127.0.0.1:8080/test/coll/_aggrs/total-by-item?avars='{"Item":["a","c"]}'
HTTP/1.1 200 OK
...
{
    "_embedded": {
        "rh:result": [
            {
                "_id": "a", 
                "total": 20
            }
        ]
    }, 
    "_returned": 1, 
    "_size": 1, 
    "_total_pages": 1
}

however you are asking about passing 100.000 ids to $in operator. Despite you can do it via a query parameter, the resulting URL will be giant and it may even exceed the limit.

I think that you should review your design in this case.

By the way, $in operator with 100.000 elements is bad anyway!

What is your opinion?

vinodprathipati commented 8 years ago

Hi @ujibang

I agree with your approach (in fact I am aware of this), But my question was more about URL size limit.

the resulting URL will be giant and it may even exceed the limit.

If I want to apply aggregate function on subset of records (big subset), there is no way I can apply this.

Why not aggregate requests can be "POST" requests ?

I am curious to know.

Thanks, Vinod

vinodprathipati commented 8 years ago

@ujibang

GET gives you a quick, and easy way to run queries, but the actual search criteria is limited. POST lets you send much more complex queries... I believe mongodb aggregate functionality is not just true aggregation(ex:sum, count..etc.), it lets you construct complex search queries...

mkjsix commented 8 years ago

I think moving to POST could be an interesting option, but it will require some work. We could put this idea as an enhancement in the backlog.

vinodprathipati commented 8 years ago

@mkjsix

Can we implement this in 2.0 . I noticed you released beta version.

Thanks

ujibang commented 8 years ago

we decided not to go for this since RESTHeart is a RESTful service and we want to stick on HTTP verb semantic where POST is e verb for writing data.

we understand the need but it is better to handle this special case with a custom handler