farmOS / field-kit

A modular, offline-first companion app to farmOS.
https://farmOS.org
GNU General Public License v3.0
62 stars 39 forks source link

farmModel library in farmOS.js 2.x #431

Closed jgaehring closed 3 years ago

jgaehring commented 3 years ago

The second step in providing support for farmOS 2.x will be in extending the existing farmLog library for farmOS 1.x, currently a part of field kit itself, so that it's capable of creating, processing and formatting all entities, not just logs, in farmOS 2.x. The other big difference here is that we will finally be able to use JSON Schema, in place of the ad hoc data_schema we were using before, which should provide a lot more powerful and safer forms of validation, but may be more complex to implement. I should probably look into using some kind of JSON Schema validation library, if I can, rather than writing my own logic for generating defaults and validating data.

jgaehring commented 3 years ago

Just want to save this here for future reference: https://github.com/farmOS/farmOS/blob/2.x/modules/core/entity/farm_entity.base_fields.inc

jgaehring commented 3 years ago

Did a little looking around for JSON Schema data generators, but there are surprisingly few options.

The official list of implementations only lists a Python library: https://json-schema.org/implementations.html#data-from-schemas

AJV seems like a tried and tested validator, but only seems to do that, not generation: https://ajv.js.org/

There's a library called dreamjs that does generation but it's not been updated in 4 years and has a big bundle size: https://github.com/adleroliveira/dreamjs

That all taken into consideration with the particulars of our use case means we'll probably be better off with a custom solution.

jgaehring commented 3 years ago

I'm noticing some peculiarities about the JSON Schema documents, particularly for logs. (For reference, I'm looking at http://localhost/api/log/activity/resource/schema on my local server).

First and foremost, it doesn't include the log's id. Here's the schema, not including the definitions for the $refs:

{
  "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
  "$id": "http://localhost/api/log/activity/resource/schema",
  "title": "Activity log",
  "allOf": [
    {
      "type": "object",
      "properties": {
        "type": {
          "$ref": "#definitions/type"
        },
        "attributes": {
          "$ref": "#/definitions/attributes"
        },
        "relationships": {
          "$ref": "#/definitions/relationships"
        }
      }
    },
    {
      "$ref": "https://jsonapi.org/schema#/definitions/resource"
    }
  ]
}

You'll notice there are only 3 top level properties: type, attribute and relationships. Shouldn't id be there as well? I also don't see links, but I'm perfectly fine with that being excluded, because I can't think why I'd need that anyway.

Also, the required properties for attributes don't fully make sense to me:

{
  "required": [
    "drupal_internal__id",
    "drupal_internal__revision_id",
    "status",
    "revision_translation_affected"
  ]
}

What are they required for, exactly? They're not required in POST requests, I know that. I interpret required to mean required by JSON:API, otherwise, what relevance is it to the JSON Schema? For relationships, meanwhile, there are no required properties, even for a seeding log, which if I recall, requires a planting asset, no?

Anyways, I'll report more issues as I encounter them. @mstenta, I'd love your input on this, but it's by no means urgent.

mstenta commented 3 years ago

First and foremost, it doesn't include the log's id

Huh - yea I would expect ID to be there too, for the sake of completeness. Can't think why it wouldn't be included.

Might be worth opening an issue upstream to inquire about it: https://www.drupal.org/project/issues/jsonapi_schema

Practically speaking WE know the id will be a string (UUID) - so we can hard-code that knowledge if we have to.

Also, the required properties for attributes don't fully make sense to me: What are they required for, exactly? meanwhile, there are no required properties, even for a seeding log, which if I recall, requires a planting asset, no?

So actually: seeding logs do NOT require a planting asset anymore! This is something we had been considering changing in 1.x. So that's correct.

The status attribute IS required for assets and logs, and should be required to POST. This replaces 1.x's done boolean for Logs (now values should be pending or done strings), and it replaces archived for Assets (now values should be active or archived, and we have an additional archived field for storing the timestamp that it was archived, much like created and changed timestamps).

As for drupal_internal__id, drupal_internal__revision_id, revision_translation_affected - they are required to be set in the farmOS database - but not required for POST operations, because they will be automatically set by Drupal (I assume - have you confirmed this?).

This does raise an interesting question about "required" status as it relates to the API and POST requests. I wonder if there is some way to distinguish POST required fields.

Maybe this too would be worth opening an issue for upstream - if only to ask whether or not there's a reason for the Drupal internal ones to be included. My suspicion is that it is simply taking that status directly from the lower-level Drupal field API, where those fields ARE technically required in the database.

mstenta commented 3 years ago

I filed an issue upstream to inquire about these two details specifically: https://www.drupal.org/project/jsonapi_schema/issues/3190497

How urgent is this? Will it block us from moving forward? (edit: nevermind just saw your "it's by no means urgent.") :-)

jgaehring commented 3 years ago

Thanks for opening the issue. It does seem odd, right?

Practically speaking WE know the id will be a string (UUID) - so we can hard-code that knowledge if we have to.

Totally. Hardcoding is not ideal, but at least this is a very solid assumption to make across the board. Shouldn't be too tricky.

So actually: seeding logs do NOT require a planting asset anymore!

Oh! Great to hear! I wonder, would it be worth reviewing some of the more common log types that do still have required fields and just checking to see if they're being picked up by JSON Schema?

The status attribute IS required for assets and logs, and should be required to POST.

Interesting. I'll have to try a test w/o the status to see what happens. Will let you know.

As for drupal_internal__id, drupal_internal__revision_id, revision_translation_affected - they are required to be set in the farmOS database - but not required for POST operations, because they will be automatically set by Drupal (I assume - have you confirmed this?).

I have not set them in my POST requests and have not run into any failed requests. So yea, seems weird that the JSON Schema would be communicating requirements which do not pertain to the REST API.

mstenta commented 3 years ago

Interesting. I'll have to try a test w/o the status to see what happens. Will let you know.

Oh you know what: I bet this is not "hard" required because it has a "default value" of "pending" for new logs (so it gets auto-filled). I wonder if JSON Schema shows the default values anywhere?

mstenta commented 3 years ago

Huh - just looked at /api/asset/animal/resource/schema and I see that in addition to the fields you mentioned above, name and geometry are also listed as required.

name makes sense - that is required for assets, and unlike Logs it will not be automatically generated (which is why it's not technically required for Logs).

I'm not sure why geometry is showing as required though. That is one of the two computed fields (geometry and location). I'll see if I can figure out why that's happening...

jgaehring commented 3 years ago

I'll have to try a test w/o the status to see what happens. Will let you know.

I actually checked and realized my existing test already sends a log w/o the status field:

https://github.com/jgaehring/farmOS.js/blob/a0d75dfb05e53b794cc0854de998404d1749f174/test/connect/log.js#L12-L19

So I guess that requirement is not being enforced in POST requests. :thinking:

I wonder if JSON Schema shows the default values anywhere?

It does, but not for the status field. Here's an interesting snippet from the log--activity schema:

{
  "name": {
    "type": "string",
    "title": "Name",
    "maxLength": 255,
    "description": "The name of the log. Leave this blank to automatically generate a name.",
    "default": ""
  },
  "timestamp": {
    "type": "number",
    "title": "Timestamp",
    "format": "utc-millisec",
    "description": "Timestamp of the event being logged."
  },
  "status": {
    "type": "string",
    "title": "Status",
    "maxLength": 255,
    "description": "Indicates the status of the log."
  }
}

Definitely some weird stuff going on. I'd be really curious to hear from the module authors exactly how they interpret the required keyword. Checking the spec, it just says it's used to determine if a JSON document is "valid" or not. There doesn't seem to be any standard interpretation for what kind of behavior a REST API should take with valid or invalid documents. So again, yea, I'd love to hear the authors take on how the JSON Schema is meant to represent the API's internal validation procedures, generally.

paul121 commented 3 years ago

I opened a similar issue in jsonapi_schema a while back pertaining to the schema enum for entity types: https://www.drupal.org/project/jsonapi_schema/issues/3174991

That issue points out that an enum is provided for the log.equipment asset reference field, to specify that only the asset--equipment bundle is allowed. But when all bundles are allowed on an entity reference field (such as log.asset), the enum is not populated.

For the status field, there could be an enum of pending and done. If additional states were added by contrib modules, then these would become visible in the schema & known to 3rd parties (very useful!)

That's just to say, this all seems connected.. for our status field (which is a state machine field type), it seems that JSONAPI schema doesn't use the allowed_values or default_value properties defined by that field type. Seems like this should be possible?

paul121 commented 3 years ago

Hmm also one consideration re: the drupal_internal__id field. For config entities this field's value is a string, but for content entities it is an integer. In the context of 3rd party API integrations it seems pretty important to include this field to allow the ability to query config entities by their non-UUID ID. Not sure if this is an issue @jgaehring (maybe in a different issue, but I think you mentioned throwing out the drupal specific fields) ?

If a FM wanted to create a log with the monitor flag, it has to query those resources by drupal_internal__id, not label, in order to find the correct UUID.

Example config entities in farmOS 2.x are log types, asset types, and flags.

An example flag:

{
   "type":"flag--flag",
   "id":"8cb92703-d054-4c62-bfcc-ee3bad060bdc",
   "links":{
      "self":{
         "href":"http://localhost/api/flag/flag/8cb92703-d054-4c62-bfcc-ee3bad060bdc"
      }
   },
   "attributes":{
      "langcode":"en",
      "status":true,
      "dependencies":{
         "enforced":{
            "module":[
               "farm_flag"
            ]
         }
      },
      "drupal_internal__id":"monitor",
      "label":"Monitor"
   }
},
jgaehring commented 3 years ago

For the status field, there could be an enum of pending and done. If additional states were added by contrib modules, then these would become visible in the schema & known to 3rd parties (very useful!)

Oh yea! I hadn't noticed that, but you're totally right!

That's just to say, this all seems connected.. for our status field (which is a state machine field type), it seems that JSONAPI schema doesn't use the allowed_values or default_value properties defined by that field type. Seems like this should be possible?

Agreed, definitely seems connected. I'll leave it to you and Mike to determine how possible any of this is. Do you think the module authors are accepting contributions?

If a FM wanted to create a log with the monitor flag, it has to query those resources by drupal_internal__id, not label, in order to find the correct UUID.

Ohhhhhh, good to know!

(maybe in a different issue, but I think you mentioned throwing out the drupal specific fields) ?

I had. Although on further consideration, I'm more leaning towards moving those fields into meta-data, so the information isn't lost, but hidden from routine API users. This definitely strengthens the argument for keeping those fields.

jgaehring commented 3 years ago

Totally bikeshedding here, but I think I'm going to rename this farmModel, or farm.model. Only slightly less vague, compared to farmData, but more to the point, I think.

jgaehring commented 3 years ago

In an effort to break out of some analysis paralysis I've been having, I want to list out the various decisions I've been struggling with, and try to answer them one-by-one in subsequent comments:

jgaehring commented 3 years ago

Serialized format

Link references to other entities via Proxies (as mentioned in #401)?

no

Use attributes and relationships objects?

yes

Colocate metadata with primary data?

no

Move metadata to meta field?

yes

In-memory format

Use custom getters and setters?

yes

Use JS Symbols?

no

Multiple helpers for accessing metadata?

TBD

Use attributes and relationships objects?

no

Co-locate metadata with primary data?

no

Move metadata to meta field?

yes

Generating default objects from JSON Schema

Use Cambria?

no

Use json-schema-tools?

no

Use json-schema-defaults?

no

Roll my own?

roll my own defaults stub

jgaehring commented 3 years ago

Reminder to myself to address #418 before closing this.

jgaehring commented 3 years ago

Closing this now, because I think all the main features are implemented for all entities and just need to be integrated with farm.connect and Field Kit.