Cadasta / django-jsonattrs

Django app to provide Postgres JSON-based attribute management. As this was used in the Cadasta's legacy platform, the repository is now unmaintained.
GNU Affero General Public License v3.0
3 stars 0 forks source link

Raise ValidationError when no Attribute exists for key #18

Closed oliverroick closed 6 years ago

oliverroick commented 7 years ago

Steps to reproduce

  1. Create a Schema with a set of Attribute instances; for example name, title.
  2. Create a model instance with a JSONAttributes field using the Schema created before. In addition to valid key-value pairs for name and title, add an extra pair using a key for which there is no corresponding Attribute defined in the Schema.
  3. Save the model.

Actual behavior

A KeyError is thrown.

Expected behavior

A ValidationError should be thrown so it can be handled accordingly by forms, serializers, and views.

alukach commented 7 years ago

Are we certain that the above behaviour is deemed to be "correct behaviour"? For example, a situation I'm running into at the moment:

I'm interesting in uploading a GeoJSON file to /api/v1/organization/an-org/projects/a-project/spatial/. It the GeoJSON file is as follows:

{
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [97.253725, 17.563282],
        [97.253731, 17.563215],
        [97.253732, 17.563216],
        [97.253725, 17.563282]
      ]
    ]
  },
  "type": "Feature",
  "id": "0",
  "bbox": [97.253496, 17.562455, 97.253963, 17.563284
  ],
  "properties": {
    "Id": 0,
    "Name_Esp": "asdf",
    "Name_Eng": "asdf",
    "Edited_By": "asdf",
    "Area_ac": 1,
    "Area__ac_D": 0.537451098718
  }
}

Currently, if the properties is left as is, it will be overwritten with something like:

{
  "attributes": {},
  "id": "s27u9xgcdctmrptt9k4amqr5",
  "type": "PA"
}

What I'd like to do is to place that data into the properties.attribute object so as to preserve the originally uploaded data, however when attempting to do so I get the KeyError as described before.

My (limited) understanding is that django-jsonattrs is intended to provide validation around key:value attributes fields. My expectation would be that it would only validate on attributes that have known restrictions and it would pass on any other data (thereby allowing it to be saved/loaded). Is this not in the spirit of django-jsonattrs? Should we add some sort of strict flag to the field that would control whether a ValidationError is raised or if KeyErrors silently pass?

oliverroick commented 7 years ago

I'm ok to do it either way.

My thinking was that if you use additional keys there's almost no way to make use of this information in the platform itself. You can't view or edit them on the website and they won't be returned from the API (if I'm not mistaken about this). So I thought it's better to tell users "Hey, you're trying to use an attribute that we don't know about" instead of silently swallowing those attributes.

If we want a hybrid solution for this, we can use a flag. That's probably something we would add to the field definition on the model?

wonderchook commented 7 years ago

I like the flag idea, though wouldn't we then need to use the flag everywhere anyway?

I think there should be at least a warning when people are trying to save attributes not in the schema, since I think in most cases that would be unintended.

alukach commented 7 years ago

My thinking was that if you use additional keys there's almost no way to make use of this information in the platform itself. You can't view or edit them on the website and they won't be returned from the API (if I'm not mistaken about this).

Ah, I wasn't aware of this. That does kind of change things.

I'm a bit conflicted on the "right" way to go.

On silently ignoring unexpected data: Something feels a bit wrong about just ignoring unexpected attributes, however this is how the rest of our API works (send a POST with {"name": "An Actual Party Field", "a-non-existent-field": "This will just be ignored"} to /api/v1/organizations/{org}/projects and you'll receive a 201, not a 400). One could argue that the data isn't really silently ignored, being that you do receive the created resource in the response body (where you could see that certain fields are not present).

On explicitly rejecting unexpected data: At first blush, this feels like it's right although the one thing that stands out to me is that it's not immediately obvious to a user what the permitted attributes are. Maybe the user could do a GET on the questionnaire endpoint, however I'm unsure if this is in a format that is easily parsed to understand which resources can take which attributes. If we were to take this route, a nice accompaniment would be to have the details of permitted attributes be returned in an OPTIONS request.

In keeping with accordance of the rest of the API, I lean very slightly towards ignoring unexpected attributes... I guess even if we were to do that the update to OPTIONS request would be helpful.

alukach commented 7 years ago

@wonderchook My thinking on the flag was something that would be set on the model itself (e.g. attributes = JSONAttributeField(default={}, strict=True). So, this would be set once per model.

wonderchook commented 7 years ago

@alukach yes, we are on the same page. I was just meaning that I wasn't seeing situations where we would set the flag to False right now. Though it would give flexibility for future development.

alukach commented 7 years ago

Ah, yeah. I was thinking more about any potential third-party users of this library, however given that there's no README.md or documentation, I doubt anyone's relying on it (although still probably best practice to consider that someone may be using it).

Edit, disregard, apparently there is documentation.

oliverroick commented 6 years ago

https://opbeat.com/cadasta/platform-staging/errors/331/

oliverroick commented 6 years ago

Raising a ValidationError from doesn't solve the issue in the straightforward way I expected:

The ValidationError is not raised during the normal validation process in serializers but when the model is saved, which means we have to add additional, slightly ugly changes to the platform code to return the appropriate error messages via the API.

An alternative solution would be not to change the jsonattrs code and change the platform's validation of attributes. Currently, we loop through all attributes as they are defined in a questionnaire and validate each one of them.

We can turn the loop around and loop through all the key-value-pairs as they are provided in the request payload and raise an validation error if we encounter a key that has no corresponding attribute defined (see this commit). I prefer this solution because it doesn't require weird code and it feels like the correct thing to do, i.e., raise the error during data validation and not when the model is saved.

alukach commented 6 years ago

which means we have to add additional, slightly ugly changes to the platform code to return the appropriate error messages via the API.

It surprises me that the DRF doesn't have some built-in logic that would translate a ValidationError from a model to something usable by DRF.

Reading the DRF docs, I see that it states:

Validation in Django REST framework serializers is handled a little differently to how validation works in Django's ModelForm class.

With ModelForm the validation is performed partially on the form, and partially on the model instance. With REST framework the validation is performed entirely on the serializer class. This is advantageous for the following reasons:

  • It introduces a proper separation of concerns, making your code behavior more obvious.
  • It is easy to switch between using shortcut ModelSerializer classes and using explicit Serializer classes. Any validation behavior being used for ModelSerializer is simple to replicate.
  • Printing the repr of a serializer instance will show you exactly what validation rules it applies. There's no extra hidden validation behavior being called on the model instance.

It seems that they pride themselves on not tying-into the general model validation. While that may be a win for decoupling, it does seem to promote duplicating logic between the model and serializer.

Okay, looking at your suggestions:

  1. Write a custom re-casting of the ValidationError to something that DRF can use. I also found this gist suggesting a global way to do this. This has a win that it would apply to any model/serializer down the road and would allow use to easily use the same validation model for template-based HtML views as we use for the DRF serializers (single point of validation truth).
  2. Update the JSONAttrsSerializer's validation tooling to catch unknown fields.

I dunno, I actually think I like solution 1 a bit more as it helps us avoid these discrepancies between non-API CRUD actions and API CRUD actions... It does kind of go against the intended DRF paradigm which is a bit of a bad sign. So yeah, I would accept a PR of either solution. I think what would be ideal would be to use solution 2 but have it rely on some validation that exists on the field class (which the model would also use, however I don't really know what is the best way to do that).