farmOS 2.x - Githubissues

paul121 commented 4 years ago

Opening this issue to track implementation of the farmOS 2.x API in this library. See https://www.drupal.org/project/farm/issues/3151243 for more info.

In this process I'd like to address:

Design goals voiced by @symbioquine in #31
- Support pagination/iteration elegantly for all record types #2
- Map elegantly between Python API and Drupal JSONAPI
Re-evaluate farmOS/farmOS.js#8 (#15)
Implement #14 in this library
Revamp tests (including #28)

For this first pass lets not address, but keep in mind:

farmOS data models #25 + #31
Async #31
Pluggable transport #31
Sensor API #13

Perhaps we can use https://github.com/qvantel/jsonapi-client for inspiration when implementing these ideas in the future. It looks like a good example for supporting both synchronous (via requests) and async (via asyncio) requests, as well as providing more of an object-oriented API more reflective of JSON:API.

Server API Changes

We will likely maintain support for farmOS 1.x. The server version can be determined by checking the API version included in /api (an alias to /farm.json) before including the necessary client methods. If applications need to support multiple server versions (such as farmOS Aggregator), this will be possible, but application code should be defensive in supporting different response formats returned via this library.

Response formats

Fetching an endpoint such as /api/log/{type} will return:

{ 'jsonapi': { jsonapi info }, 'data': [ log data ], 'meta': { response info }, 'included': { other records }, 'links': { pagination } }

Format of an individual record within the data list:

{ 'type': 'log--observation', 'id': '{UUID}', 'links': {}, 'attributes': { 'name': 'log name', ..}, 'relationships': { 'category': { 'data': [ ], 'links': {} } } }

Breaking changes

These will likely affect the API of the client library:

There is no longer a path that returns ALL records of a specific entity type (previously /log.json, /farm_asset.json). Consequently, the client API will require the entity bundle to be specified eg: client.log.get(type='observation')
Changes to pagination (lets implement #2)
Add helper function(s) to facilitate improved Filter + Sorting query params

Proposal

"Raw" requests

Return entire JSON response for these requests (all keys):

# client.log.get(type, filters)

# One page of observation logs (will not make multiple requests to include all results)
logs = client.log.get('observation')

# One observation log
log = client.log.get('observation', '{UUID}')

# Only observation logs marked as "done"
done_logs = client.log.get('observation', {'filter[status]': 'done'})

Pagination

Drupal docs: https://www.drupal.org/docs/core-modules-and-themes/core-modules/jsonapi-module/pagination

By default no pagination will take place. This gives the user full control over requests.

Iterators

Quoting @symbioquine fom #31

Why are there no "get all" or "query all" methods which return collections, only pagination and iteration? This would unnecessarily increase the "surface area" of the API while encouraging unperformant usage patterns that hold all entities/records in memory. Consumers which require the entities/pages as collections can easily do so themselves;

Iterators make it easy!


# Use in a for loop. With a default `page[limit]=50`, additional requests to the server are made every 50 iterations.
for log in client.log.iterate('observation'):
    print(log)

# Request all logs
logs = list(client.log.iterate('observation'))

Manual pagination

Users can implement pagination by passing page[limit] and page[offset] filters to client.log.get(type, filters). The user can keep track of pages internally OR;

Since the entire response is returned, users can request more data if there is a next link available. The client.fetch() function helps facilitate this. Without the helper function, the page[offset] and page[limit] params would need to be parsed from the next link before constructing another request - why not use the URL that is already built??

# Initial request, limit to 25 records per page.
response = client.log.get('observation', {'page[limit]': '25'})

# Loop until there is no more data
more = True
while more:
    try:
        next_url = response['links']['next']['href']
        response = client.fetch(next_url)
        response = response.json()
        # do something with data here
    except KeyError:
        more = False

Filters

Drupal docs (see Filtering, Sorting and Includes): https://www.drupal.org/docs/core-modules-and-themes/core-modules/jsonapi-module/filtering

(Please read the Drupal docs to understand how JSONAPI Filters work!!)

Filters can still be implemented as a Dictionary of query params to include, so no changes there. But a farmOS.Filter helper function would be useful in generating more complicated filters:

# Manual filtering
# Request two logs by ID
id1 = '{UUID}'
id2 = '{UUID}'
multiple = client.log.get('observation', {'filter[id][condition][path]': 'id', 'filter[id][condition][operator]': 'IN', 'filter[id][condition][value][]': [id1, id2]})

# With helper function
# Filter(path, value, operator = '=')
filters = Filter('id', [id1, id2], 'IN')
multiple = client.log.get('observation', filters)

# More complicated filters
# "Done" logs created today
now = round(time.time())
yesterday = now - 60 * 60 * 24

filters = {**Filter('status', 'done'), **Filter('created', [yesterday, now], 'BETWEEN')}
today = client.log.get('observation', filters)

# Is easier than:
today = client.log.get('observation', {'filter[status]': 'done', 'filter[today][condition][path]': 'created', 'filter[today][condition][operator]': 'BETWEEN', 'filter[today][condition][value][]': [yesterday, now]})

paul121 commented 4 years ago

Example

I sketched up some of these ideas in my local farmOS.py... here's some example usage: py_demo

Overall this is working great. The iterator is really quite simple to implement, as is the farmOS.filter method. The need to specify entity bundles via client.log.get(type, filters) seems logical enough.

But there are a few rough spots: Resource Types & Iterators vs Pages

Resource types

@mstenta @jgaehring I'm curious what you guys think of this issue in particular. It will have the most overlap with farmOS.js.

The type that is returned and associated with each record from the Drupal JSONAPI is of the format entity_type--bundle (log--input,asset--planting). This is notable because the returned type is different than the type expected in the client API methods: client.log.get(type, filters). More specifically the the entity_type is included in the JSONAPI type, while our client API abstracts this out via the namespace : client.log.get() vs client.asset.get().

I believe this creates a problem once we start considering logs (or any record) that references other JSONAPI resources (assets, terms, etc.) For context, here are the current log_type and category relationships that are returned for a log:

(I think this issue touches on the same problem: https://www.drupal.org/project/drupal/issues/3105318)

"relationships":{
   "log_type":{
      "data":{
         "type":"log_type--log_type",
         "id":"5cf12ac8-d6a6-4164-8eff-f60a08589a15"
      },
      "links":{
         "related":{
            "href":"http://localhost/api/log/input/2d2c2b71-171e-4ef1-8b1e-0c2d68af12bd/log_type"
         },
         "self":{
            "href":"http://localhost/api/log/input/2d2c2b71-171e-4ef1-8b1e-0c2d68af12bd/relationships/log_type"
         }
      }
   },
   "category":{
      "data":[
         {
            "type":"taxonomy_term--log_category",
            "id":"52c960eb-fd8b-4765-b843-e5c875d4088e"
         },
         {
            "type":"taxonomy_term--log_category",
            "id":"7ecf8f81-1c5c-44e6-86bd-6b74f41fabd7"
         }
      ],
      "links":{
         "related":{
            "href":"http://localhost/api/log/input/2d2c2b71-171e-4ef1-8b1e-0c2d68af12bd/category"
         },
         "self":{
            "href":"http://localhost/api/log/input/2d2c2b71-171e-4ef1-8b1e-0c2d68af12bd/relationships/category"
         }
      }
   },
  ... other relationships: owner, uid, log_type
}

To load the name of the log's categories via the client the user needs to know that category means the taxonomy vocabulary log_category. The returned "type": "taxonomy_term--log_category" can't be used:

# Get a log
log = client.log.get('input', 'UUID')
log_data = log['data']

# alias the terms
terms = log_data['relationships']['category']['data']

# Not possible; cannot simply reference the category "type"
cat = client.term.get(terms[0]['type'], terms[0]['id'])

# Load a single category; 'log_category' is assumed.
cat = client.term.get('log_category'), terms[0]['id'])

# OR to load both:
term_ids = [term[0]['id'], term[1]['id']]
cats = client.term.get('log_category', filter('category.id', [term_ids], 'IN'))

For log categories it might be an OK assumption that users know categories are in the log_category vocabulary. The is because the log.relationships.category field will only reference entities (taxonomy_terms) in one bundle (the log_category bundle).

But for assets, a log can reference any type of asset. This will make it much harder to load a specific asset referenced by a log:

# Get a log
log = client.log.get('input', 'UUID')
log_data = log['data']

# Not possible; cannot simply reference the asset "type"
asset = log_data['relationships']['asset']['data'][0]
asset_info = client.asset.get(asset['type'], asset['id'])

# Parse the bundle from "type"
asset = log_data['relationships']['asset']['data'][0]
asset_type = parse_bundle_from_type(asset['type'])
asset_info = client.asset(asset_type, asset['id'])

Some potential solutions:

We can override the resource_type strings with the jsonapi_extras module. This way we could return "type": "planting" instead of "type": "asset--planting". This seems to make logical sense for our use case (farmOS + client libraries) but I'm not sure consequences this would have for other Drupal JSONAPI specific integrations. While it might be safe to do this for asset resources (our custom entity type), it might be problematic to do this for taxonomy_term resources, as they would be common to other Drupal JSONAPI integrations.
Create a helper function that parses out the entity type + bundle from strings such as log--harvest. Is that a "simple enough" solution?

Iterators vs Pages

The client.log.iterate(type, filter) method I sketched up yields individual "records". This seems like it would fulfill the most common use case of simply getting individual records (especially considering the quick list method to get all records: list(client.log.iterate('input', filters).) This works by making a request to the server and iterating over each record in the returned data object before making another request to the server if there are more records.

The limitation with this is that any additional meta, included or links objects returned from an individual request to the server are not returned with each object yielded from the iterator. This is an important consideration because these objects may provide additional info about the returned objects. Not including them wouldn't expose all of the JSONAPI features to users of farmOS.py.

A solution to this could be providing an additional client.log.iterate_page(type, filters) method. This would allow users to iterate over entire pages:

all_logs = []
for page in client.log.iterate_page('input'):
    logs += page['data']
    includes = page['included']
    if 'omitted' in page['meta']:
        # alert user that some records were omitted.

And a separate client.log.iterate_record(type, filters) method could use client.log.iterate_page() internally. This would maintain the ability to quickly build a list() of all records without additional code.

jgaehring commented 4 years ago

Not sure I totally grok the issue with resource types. I guess I'm not sure where that relationships property fits in with the total response. Is it part of the log data itself? Or is it metadata that just accompanies the response?

Maybe it would be more helpful to discuss in live-time tomorrow?

paul121 commented 4 years ago

Yea, relationships is a part of the log data. Sorry if that wasn't clear, trying to limit how much I copy/paste! It will be easier to explain in person

mstenta commented 4 years ago

@jgaehring To summarize (after chatting with @paul121 and getting up to speed myself)...

JSON:API has some rules about how records are structured that differ from the current farmOS 1.x API implementation provided via the restws module. In general, they are good rules I think, because they make things more explicit, but it will require some thoughts for how they affect our particular use-cases, and how we need to adapt to them.

Specifically:

JSON:API has "resources" (one level), but Drupal has "entities" and "bundles" (two levels). In order to represent Drupal's added bundle level, the Drupal JSON:API module basically creates a separate JSON:API resource for each bundle, with the entity type in the name, like [entity-type]--[bundle]. For example: log--observation.
- Paul suggests adding a helper function (parse_bundle_from_type() in the examples above) to make it easier to pick that apart.
In order to get a record, you need to know 3 things about it (entity type, bundle, and ID), whereas in 1.x you only needed to know 2 things (entity type and ID).
- eg: in order to get an animal asset, and you have it's ID, you need to make a query to /api/asset/animal/[id]. In 1.x, the endpoint was simply /farm_asset/[id].json (notice that the bundle animal is not necessary).
- I don't think this will necessarily be an issue for Field Kit, but worth mentioning.
The log/asset/term/user objects are no longer flat, with all of their fields/properties on the same level. With JSON:API, it will have general info at the top level (just the type (eg: log--activity) and ID), as well as two sub-objects: attributes (which contains fields like notes, geometry, etc etc, and relationships (which will contain all reference fields like asset, area, etc)
- This means Field Kit will have to have some knowledge about whats an attribute and what's a relationship. I assume this distinction is in the JSON Schema data, so I don't expect it to be difficult to work with, but it's worth noting and planning ahead for.

mstenta commented 4 years ago

Create a helper function that parses out the entity type + bundle from strings such as log--harvest. Is that a "simple enough" solution?

In the interest of keeping things simple, I think this makes the most sense. I'm hesitant to start messing with resource aliases via JSON:API Extras module... after having seen just how complicated that can be in 1.x / restws. Just all around preferable to avoid if we can, IMO.

paul121 commented 3 years ago

I released an alpha version that implements much of what I had outlined above: https://github.com/farmOS/farmOS.py/releases/tag/v1.0.0-alpha.1

Some remaining questions/issues:

42 (would like feedback @mstenta & @jgaehring )
43 (going to move forward with this)
44 (curious to mirror how you are doing this @jgaehring )
45 (would like feedback on this as well)

paul121 commented 9 months ago

Moving JSON schema support to a later version. Closing this issue.

farmOS / farmOS.py

farmOS 2.x #39

Server API Changes

Response formats

Breaking changes

Proposal

"Raw" requests

Pagination

Iterators

Manual pagination

Filters

Example

Resource types

Iterators vs Pages

42 (would like feedback @mstenta & @jgaehring )

43 (going to move forward with this)

44 (curious to mirror how you are doing this @jgaehring )

45 (would like feedback on this as well)