closeio / flask-mongorest

Restful API framework wrapped around MongoEngine
Other
523 stars 88 forks source link

Bulk update limit #92

Closed wojcikstefan closed 7 years ago

wojcikstefan commented 7 years ago

Primarily to minimize the effects of a poorly constructed request.

After this change, flask-mongorest will by default limit bulk updates to 1k documents. If more than that would be affected, a 400 response is returned.

This PR also introduces a method which you can use to validate the request before processing of a bulk update starts.

wojcikstefan commented 7 years ago

@thomasst pls review

thomasst commented 7 years ago

I was initially thinking we would make the validate_bulk_update default implementation to do the check, but either way is fine with me.

wojcikstefan commented 7 years ago

Yea, I hear ya, though I'd prefer to expose the validation method before we git the database. basically, it's a similar concept to what validate_request does for single PUTs and POSTs. I think of the extra bulk_update_limit check as something separate.

thomasst commented 7 years ago

Some more thoughts on the validation logic:

wojcikstefan commented 7 years ago

@thomasst I looked into your suggestion and I don't like the outcome. I find it more confusing than the original implementation (i.e. the one from this PR). Few reasons why:

  1. It's confusing that first you call validate_request w/o an obj and then you call it again once per obj - this seems like a poor design, where a single method has many different roles.
  2. Behavior of validate_request would be inconsistent, because we'd call it before get_objects for bulk update, and after get_object for a single PUT.
  3. It might be a breaking change that requires us to inspect all existing resources and ensure validate_request's code differentiates properly between single PUT and bulk PUT requests.

Given all of the above, I propose we move forward with this implementation.

thomasst commented 7 years ago

Okay, I agree that the purpose of the current version of validate_request should be to validate the input data for the given obj that is being modified (or None if we're creating an object). However, the naming of validate_request is confusing since it implies that it's called once per request, and not once per object, so maybe the naming should be improved. Note that we're already doing a "hack" in the task view so we can read out the update dict and figure out if certain fields were supplied with the request and are being modified.

Secondly, after thinking about it, I'm a bit hesitant of adding another method-specific method (validate_bulk_update) because it seems very specific to the bulk update (e.g. why not also have validate_update / validate_create?) and can probably be implemented more generically. Also, in this PR you're implementing validation logic in get_objects and not even using the method to validate the bulk update.

My proposals are:

wojcikstefan commented 7 years ago

Yea, I agree adding that extra validation method only makes things more inconsistent. Given time constrains, I'll go with not introducing any new methods.

philfreo commented 7 years ago

See also https://github.com/closeio/closeio/issues/839

thomasst commented 7 years ago

After giving this another look I'm seeing another important issue: get_objects now no longer returns a queryset, and the bulk limit validation doesn't subclasses into account.

One of the nice things about mongorest since its beginning has been that you could override get_objects on the resource and override the parent queryset, like this:

class MyResource(Resource):
    def get_objects(self):
        return super(MyResource, self).get_objects().filter(status='X')

This now breaks on multiple levels: 1) we can't call .filter() anymore since the queryset is pre-evaluated, 2) even if we rewrite the method to not rely on a queryset, the bulk validation doesn't take the new filter into account.

thomasst commented 7 years ago

I haven't fully thought it through yet, but would keeping get_objects as a queryset, and implementing the length check in process_objects work?

wojcikstefan commented 7 years ago

I don't think that's true. Only get_queryset has that guarantee. get_objects afaik returned a list of objects in most cases.