Bulk update limit - Githubissues

wojcikstefan commented 7 years ago

Primarily to minimize the effects of a poorly constructed request.

After this change, flask-mongorest will by default limit bulk updates to 1k documents. If more than that would be affected, a 400 response is returned.

This PR also introduces a method which you can use to validate the request before processing of a bulk update starts.

wojcikstefan commented 7 years ago

@thomasst pls review

thomasst commented 7 years ago

I was initially thinking we would make the validate_bulk_update default implementation to do the check, but either way is fine with me.

wojcikstefan commented 7 years ago

Yea, I hear ya, though I'd prefer to expose the validation method before we git the database. basically, it's a similar concept to what validate_request does for single PUTs and POSTs. I think of the extra bulk_update_limit check as something separate.

thomasst commented 7 years ago

Some more thoughts on the validation logic:

What if validate_request was called for the PUT request (with obj == None initially), and you could implement your validation based on the view_method?
Then, validate_bulk_update could take a list of objs as a parameter.

wojcikstefan commented 7 years ago

@thomasst I looked into your suggestion and I don't like the outcome. I find it more confusing than the original implementation (i.e. the one from this PR). Few reasons why:

It's confusing that first you call validate_request w/o an obj and then you call it again once per obj - this seems like a poor design, where a single method has many different roles.
Behavior of validate_request would be inconsistent, because we'd call it before get_objects for bulk update, and after get_object for a single PUT.
It might be a breaking change that requires us to inspect all existing resources and ensure validate_request's code differentiates properly between single PUT and bulk PUT requests.

Given all of the above, I propose we move forward with this implementation.

thomasst commented 7 years ago

Okay, I agree that the purpose of the current version of validate_request should be to validate the input data for the given obj that is being modified (or None if we're creating an object). However, the naming of validate_request is confusing since it implies that it's called once per request, and not once per object, so maybe the naming should be improved. Note that we're already doing a "hack" in the task view so we can read out the update dict and figure out if certain fields were supplied with the request and are being modified.

Secondly, after thinking about it, I'm a bit hesitant of adding another method-specific method (validate_bulk_update) because it seems very specific to the bulk update (e.g. why not also have validate_update / validate_create?) and can probably be implemented more generically. Also, in this PR you're implementing validation logic in get_objects and not even using the method to validate the bulk update.

My proposals are:

Either introduce a generic request validation method that is called once per request, has access to the parsed request data, and lets you implement per-request validation.
Or, don't introduce any new validation methods, and let subclasses implement validation by either overriding the view methods (def put), or by e.g. overriding get_objects() and checking for view_method == methods.BulkUpdate.

wojcikstefan commented 7 years ago

Yea, I agree adding that extra validation method only makes things more inconsistent. Given time constrains, I'll go with not introducing any new methods.

philfreo commented 7 years ago

thomasst commented 7 years ago

After giving this another look I'm seeing another important issue: get_objects now no longer returns a queryset, and the bulk limit validation doesn't subclasses into account.

One of the nice things about mongorest since its beginning has been that you could override get_objects on the resource and override the parent queryset, like this:

class MyResource(Resource):
    def get_objects(self):
        return super(MyResource, self).get_objects().filter(status='X')

This now breaks on multiple levels: 1) we can't call .filter() anymore since the queryset is pre-evaluated, 2) even if we rewrite the method to not rely on a queryset, the bulk validation doesn't take the new filter into account.

thomasst commented 7 years ago

I haven't fully thought it through yet, but would keeping get_objects as a queryset, and implementing the length check in process_objects work?

wojcikstefan commented 7 years ago

I don't think that's true. Only get_queryset has that guarantee. get_objects afaik returned a list of objects in most cases.

closeio / flask-mongorest

Bulk update limit #92