symbioquine opened 4 years ago
@symbioquine great!! Look forward to hearing your thoughts. As I mentioned in that issue, all I've briefly looked into is the httpx library as a potential path to async support.
I've just released a new beta version of this library with additional support for OAuth (most everything up to this point has been a beta release for testing in the Aggregator). Once we get OAuth finalized in 7.x-1.4, I think we'll release a v0.1.6 of the library. Perhaps we could start drawing up async support for v0.2.0 and allow for some more breaking changes/improvements.
@symbioquine Just seeing the updated description! This is great.
Support all major farmOS data types which can be accessed via a vanilla farmOS installation's APIs
I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and generate the data models then. This will have to wait until farmOS 2.x, however
Support both synchronous and asynchronous usage with same API and code base
:+1:
"Batteries included" extension strategy
:+1:
Pluggable authentication support without otherwise affecting API interface; session cookie, basic auth, OAuth2, etc
This sounds good! I think we will really only need to support OAuth and Session cookie auth, but adding support for Basic would be fairly easy too. However, because the modules for Basic Auth aren't included with farmOS core, let's not prioritize that now (although this might be different with D8)
Pluggable transport backend without otherwise affecting API interface; requests, AIOHTTP, Twisted, etc
Yea, interested how we can do this! Also curious what the advantages/disadvantages of the different backends are. I want to lean towards whichever makes it easiest to address the question you bring up: "How to support sync/async in the same code base"!
Also, @symbioquine are you familiar with the included python ConfigParser library? I've been using this in the latest releases as it makes it a bit easier to provide default configurations for the various OAuth settings. In certain instances, it might also be nice to save different "Profiles" in one config file - I'm curious what your thoughts are with that. This would be related to the `farmOS.create_session_client` API sketch - perhaps the client type could be defined in a Config object?
Thanks for all the feedback @paul121!
I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and generate the data models then. This will have to wait until farmOS 2.x, however
Hmmm... there's a tradeoff there. If those types are directly exposed to clients of farmOS.py it won't be abstracting the underlying transport (http/JSON/etc) API anymore.
I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...
This sounds good! I think we will really only need to support OAuth and Session cookie auth, but adding support for Basic would be fairly easy too. However, because the modules for Basic Auth aren't included with farmOS core, let's not prioritize that now (although this might be different with D8)
Agreed on ignoring Basic Auth for the time being.
Pluggable transport backend without otherwise affecting API interface; requests, AIOHTTP, Twisted, etc
Yea, interested how we can do this! Also curious what the advantages/disadvantages of the different backends are.
Supporting different backends would allow the client to be used directly in existing frameworks/applications. This is especially important for the async cases where it needs to play nicely with whichever event loop is being used to manage the async co-routines.
There are probably other differences between various backend options - performance, proxy configurability, TLS ciphers, etc. - but I don't know how important they are.
I want to lean towards whichever makes it easiest to address the question you bring up: "How to support sync/async in the same code base"!
I have a prototype which already achieves this to some degree although it may require a bit of refining...
Planning to add more detail to this proposal in the next couple days which should capture a structural diagram and the high level interfaces involved.
Also, @symbioquine are you familiar with the included python ConfigParser library? I've been using this in the latest releases as it makes it a bit easier to provide default configurations for the various OAuth settings. In certain instances, it might also be nice to save different "Profiles" in one config file - I'm curious what your thoughts are with that. This would be related to the `farmOS.create_session_client` API sketch - perhaps the client type could be defined in a Config object?
I've worked with a number of different configuration frameworks in the past - including, a very long time ago, with ConfigParser... These days I'm pretty wary of baking "configuration" into APIs - especially early on or at too low of a level.
Don't get me wrong, there are scenarios where externalizing configuration makes sense, I'd just recommend waiting until we have a very distinct group of use-cases which are hard to serve any other way and their criteria upon which the API can be crafted. Even then it is probably wise to make the configuration functionality an optional "convenience" part of the API rather than a core part.
I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...
Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??
I've been working a lot with the Pydantic library and would be excited to use that here for data validation! I actually just recently learned about its Settings Management features:
This makes it easy to:
- Create a clearly-defined, type-hinted application configuration class
- Automatically read modifications to the configuration from environment variables
- Manually override specific settings in the initialiser where desired (e.g. in unit tests)
... Which might be an alternative to ConfigParser.. re: an API "configuration"...
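To illustrate the pattern being described, here is a rough standard-library stand-in (the real thing would subclass `pydantic.BaseSettings`; the `FARMOS_` variable names and field names here are invented for the sketch):

```python
import os
from dataclasses import dataclass, field


@dataclass
class FarmOSSessionClientSettings:
    """Hypothetical settings object: environment variables override the
    defaults, and explicit keyword arguments override both (this mirrors
    how pydantic's settings management resolves values)."""
    url: str = field(default_factory=lambda: os.environ.get("FARMOS_URL", "http://localhost"))
    username: str = field(default_factory=lambda: os.environ.get("FARMOS_USERNAME", ""))
    password: str = field(default_factory=lambda: os.environ.get("FARMOS_PASSWORD", ""))


os.environ["FARMOS_URL"] = "https://farmos.example.com"
settings = FarmOSSessionClientSettings(username="tester")
print(settings.url)       # value read from the environment
print(settings.username)  # explicit override wins
```

The appeal is that the configuration surface is a single, type-hinted class rather than a parser scattered through the client code.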
Don't get me wrong, there are scenarios where externalizing configuration makes sense, I'd just recommend waiting until we have a very distinct group of use-cases which are hard to serve any other way and their criteria upon which the API can be crafted. Even then it is probably wise to make the configuration functionality an optional "convenience" part of the API rather than a core part.
Yeah, I agree that the configuration should be an optional "convenience" feature. Supporting Drupal Session Auth & OAuth Auth got to be fairly complicated from a single API level because parameters like `username` and `password` could be used by either session type. OAuth in particular can require quite a few parameters, so the Config helps clean that up. But I really like your sketch of doing something like `client = farmOS.create_oauth_session(...)`!
A good use case for an API Configuration would be using the library to `POST` sensor data. In the case of the simple Sensor Listener asset, the `hostname`, `publickey` and `privatekey` could be saved in config, and read only when a python script is run.
I have a prototype which already achieves this to some degree although it may require a bit of refining... Planning to add more detail to this proposal in the next couple days which should capture a structural diagram and the high level interfaces involved.
Great!! Excited to continue thinking this through!
Skimming through this a bit since I saw the topic of JSON Schema was mentioned...
I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and generate the data models then. This will have to wait until farmOS 2.x, however
Hmmm... there's a tradeoff there. If those types are directly exposed to clients of farmOS.py it won't be abstracting the underlying transport (http/JSON/etc) API anymore. I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...
Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??
I'm not sure if this speaks to your concern, depending on exactly what you meant by the "transport API", but the approach I've been taking with Field Kit is to take the schema data as a partially applied dependency, which could come from any sort of source, whether it's over the wire via http, or whether that schema is stored on disk, or what have you. Then that spits out some functions for creating and updating logs based on that schema. I've got a gist that demonstrates the basic usage I'm imagining (the actual implementation is still in the works): https://gist.github.com/jgaehring/8799af782404fdf8abbaa116b3e97614
I suppose this is not the purest form of type definition, b/c it still assumes JSON as the data interchange format, but I feel like if we're able to abstract the data model to JSON Schema, that brings us at least a step closer to something that would be truly format agnostic. I'm curious, do you know of any other methods of providing a spec for such data models that would be less coupled to a particular format?
In the meantime, I've opened a new issue, https://github.com/farmOS/farmOS/issues/243, specifically for adding JSON Schema to farmOS.
I have among my long-term aspirations the dream of creating a local-first implementation of farmOS that could support the farmOS data model while not ever requiring connection to a farmOS server. It's my hope that the `farmLog` library I'm sketching out in the above gist could become the germ for such an application, where log types (as well as asset types etc) could be defined by providing a similar type of schema, while also preserving the modularity of farmOS. So far JSON Schema seems like a powerful way to attain that, but I'd love to know alternatives.
@paul121
I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...
Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??
I want to explore this further in my reply to Jamie's comments, but I'm initially skeptical of the idea of dynamically basing the model types on a schema vended by farmOS at runtime. I think it may be difficult to do that and still provide a useful abstraction to consumers of farmOS.py.
Don't get me wrong...
Yeah, I agree that the configuration should be an optional "convenience" feature. Supporting Drupal Session Auth & OAuth Auth got to be fairly complicated from a single API level because parameters like `username` and `password` could be used by either session type. OAuth in particular can require quite a few parameters, so the Config helps clean that up. But I really like your sketch of doing something like `client = farmOS.create_oauth_session(...)`!
Agreed that shoving all the parameter combinations into a single constructor/factory-method would get really complex - as you noted, I was trying to get around that with a separate static factory-method for each scenario. (I'll add a FAQ entry to the issue description tackling the possible combinatorial explosion of said factory methods.)
That Pydantic settings management functionality looks like a good way to provide the configuration convenience methods we touched on above...
```python
async def create_session_client(url: str, username: str, password: str):
    ...

async def create_session_client_from_settings(settings: FarmOSSessionClientSettings):
    return await create_session_client(**settings.dict())
A good use case for an API Configuration would be using the library to `POST` sensor data. In the case of the simple Sensor Listener asset, the `hostname`, `publickey` and `privatekey` could be saved in config, and read only when a python script is run.
I'm interested to learn more about this use-case, but my initial reaction is that the actual sensor integration and accompanying environment might be out of scope for farmOS.py.
I would imagine the level of support that farmOS.py might need to provide is;
```python
async with farmOS.create_asyncio_sensor(url="https://farmos.local", public_key="xxxxxx", private_key="yyyyyy") as sensor:
    await sensor.publish(data={"temperature": 76.5, "humidity": 60})
    await sensor.publish(data={"temperature": 76.5, "humidity": 60}, datetime=datetime.fromtimestamp(1541519720))
```

or;

```python
sensor_settings = FarmOSSensorSettings.parse_file("/path/to/sensor_settings.json")
async with farmOS.create_asyncio_sensor_from_settings(sensor_settings) as sensor:
    ...
@jgaehring, @paul121
I'm not sure if this speaks to your concern, depending on exactly what you meant by the "transport API"
I mean the REST/JSON/Drupal-entity API represents one abstraction boundary which aligns with what actually travels over the wire - via the http(s) network transport.
In its current incarnation, that "transport" API probably cannot be expected to abstract a wide range of current/future farmOS versions since it is partially an implementation detail of the underlying Drupal framework and likely needs to be able to change in order to allow farmOS itself to grow/evolve.
On the other hand, farmOS.py represents a different kind of API boundary which could provide a stable abstraction to its consumers over a (reasonably) wide range of farmOS versions. To put this more concretely, ideally none of the code in farmOS/farmOS-aggregator or symbioquine/farm-os-area-feature-proxy should need to change to support the D8 version (and be backwards compatible with the D7 version) of farmOS if we do this correctly. (Other than making sure to use a new enough version of farmOS.py.)
but the approach I've been taking with Field Kit is to take the schema data as a partially applied dependency, which could come from any sort of source, whether it's over the wire via http, or whether that schema is stored on disk, or what have you. Then that spits out some functions for creating and updating logs based on that schema. I've got a gist that demonstrates the basic usage I'm imagining (the actual implementation is still in the works): https://gist.github.com/jgaehring/8799af782404fdf8abbaa116b3e97614
I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.
I suppose this is not the purest form of type definition, b/c it still assumes JSON as the data interchange format, but I feel like if we're able to abstract the data model to JSON Schema, that brings us at least a step closer to something that would be truly format agnostic. I'm curious, do you know of any other methods of providing a spec for such data models that would be less coupled to a particular format?
I'm curious about the motivation for decoupling from JSON? This seems like a perfectly fine hard dependency to me - given the ubiquity of JSON and how well it is supported over a wide range of languages/tools. Do you know of any use-case for farmOS.py which cannot use JSON as a transport/interchange format?
In the meantime, I've opened a new issue, farmOS/farmOS#243, specifically for adding JSON Schema to farmOS.
Even in light of what I've said above, I agree that this would be a useful feature.
I have among my long-term aspirations the dream of creating a local-first implementation of farmOS that could support the farmOS data model while not ever requiring connection to a farmOS server.
I'd like to hear more about this. I've been considering something that seems related - that it would be useful to have a "mock/headless farmOS" reference implementation that just keeps the data in memory and hosts the transport API against which libraries like farmOS.py could be tested.
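As a proof of concept, such a mock could start as little more than an in-memory record store that mimics the shape of the transport API. Everything here (class name, methods, record shapes) is invented for the sketch:

```python
class MockFarmOS:
    """Keeps records in memory, keyed by entity type, so a client
    library's test suite can run without a real farmOS server."""

    def __init__(self):
        self._records = {}  # entity type -> {id: record}
        self._next_id = 1

    def create(self, entity_type, record):
        """Store a record and assign it a server-style id."""
        record = dict(record, id=self._next_id)
        self._records.setdefault(entity_type, {})[self._next_id] = record
        self._next_id += 1
        return record

    def query(self, entity_type, **filters):
        """Return all records of a type matching the given field values."""
        records = self._records.get(entity_type, {}).values()
        return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


farm = MockFarmOS()
farm.create("log", {"name": "Seeding", "type": "farm_seeding"})
farm.create("log", {"name": "Harvest", "type": "farm_harvest"})
print(farm.query("log", type="farm_harvest"))  # one record, id=2
```

A real version would sit behind an HTTP server speaking the same JSON the transport API does, but even this much is enough to exercise a client's query logic.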
@paul121 @jgaehring Also I've updated the issue description to add more examples, and add diagrams showing how the dependencies/methods of the parallel async/sync APIs could be organized.
Added a references section, diagrams for the top-level module's methods, and a couple more FAQs.
Also tagging @mstenta for comment since he was active on https://github.com/symbioquine/farm-os-area-feature-proxy/issues/1
@symbioquine Really appreciate all the thought you've put into this!! Overall, I think this proposal is great and would bring some really great features to the python library!
re: recent changes..
Why expose pagination explicitly? ... It would be really challenging or impossible to completely hide the pagination and still provide that flexibility.
I like the idea of exposing pagination via the API. Currently, pagination is "exposed" via the returned response (see #15) and allows the consumer to request specific pages by passing a `page` into the `filters` object (see in code here). Admittedly this is a bit "hidden" and not very intuitive. We implemented this as a means of trying to match the API between the JS and Python libraries - but I think this is a good example of how the library should differ in favor of improving the "Pythonic" API
Why are there no "get all" or "query all" methods? ... This would unnecessarily increase the "surface area" of the API while encouraging unperformant usage patterns that hold all entities/records in memory. Consumers which require the entities/pages as collections can easily do so themselves;
```python
all_areas = list(farm.area.iterate_query(filters={'area_type': 'field'}))
```
The current implementation defaults to a "get all" action unless a `page` filter is provided. This likely became the default action as a result of not having other exposed pagination features... which does encourage unperformant usage patterns. Seeing this example is really quite convincing.. a great example of iterators! :+1:
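A generator along these lines is easy to sketch. `fetch_page` here is a stand-in for whatever transport call would actually hit the server, and the method names follow the proposal rather than any existing API:

```python
def fetch_page(filters, page_num, max_page_size):
    """Stand-in for a real HTTP call; pretend the server holds 5 records."""
    records = [{"id": i} for i in range(5)]
    start = page_num * max_page_size
    return records[start:start + max_page_size]


def iterate_query(filters=None, max_page_size=2):
    """Yield records one at a time, fetching pages lazily as needed,
    so the full result set is never held in memory at once."""
    page_num = 0
    while True:
        page = fetch_page(filters, page_num, max_page_size)
        if not page:
            return
        yield from page
        page_num += 1


all_ids = [record["id"] for record in iterate_query(filters={"area_type": "field"})]
print(all_ids)  # [0, 1, 2, 3, 4]
```

Consumers who want everything can still wrap the iterator in `list(...)`, but the lazy default avoids pulling thousands of records into memory by accident.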
How would the proposed API support retrieving multiple records by id in a single request - i.e. #14? A better approach would be to support this via the filters argument to the query_page, iterate_query_pages, iterate_query methods.
:+1: :+1: Great examples in the FAQ! Another benefit of the `filters` argument is that it could still allow supplying a `page` filter as described above.
I'm still thinking on the best ways to support Settings.. it would be great to avoid the second API method. I think this would work just fine?
```python
# Build settings object from env var or read from file
settings = FarmOSOAuthClientSettings()

async def create_oauth_client(**settings):
    ...
```
Re: JSON Schema....
I think our general motivation for JSON Schema has been to support record validation in the client libraries. The motivation for this comes from different log types having different fields: (screenshot of `/farm.json` output, which documents simple schema specs, but not JSON Schema compliant)

This is important because some farmOS servers could have the `soil_test` module disabled, and thus not support the `farm_soil_test` log type. Similarly, the `farm_quantity` module could be disabled and not add `quantity` fields to the above log types. (correct me if I'm wrong, @mstenta)
I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.
From my understanding, consumers will always be responsible for checking the supported record types on a farmOS server they connect to - which is different than "version compatibility" of the farmOS server. What we're trying to achieve is some additional validation in the library that can adapt to the farmOS server at runtime. I think this is possible!
Pydantic has a cool feature for Dynamic model creation:
There are some occasions where the shape of a model is not known until runtime. For this pydantic provides the create_model method to allow models to be created on the fly.
This is really great because the dynamic model can extend a base model. So..

- Define a base `log` model
- Check `/farm.json` and create dynamic models for additional log/asset/etc record types. The `FarmOSLogClient` class @symbioquine has proposed could return any model that extends the base `log` model
- Check the `log_type` and validate with the specified dynamic model

Note, the dynamic `create_model` method of Pydantic doesn't support JSON Schema, although there is an issue describing this and they are open to supporting that. I don't think it would be required for our use because we could check `/farm.json` (even as is without JSON Schema) for additional log types, and generate dynamic models based on available fields.

These thoughts are pretty conceptual, but I think that it might work. Worst case, the base `log` model could allow other fields and still facilitate sending/retrieving logs of any kind from the server, although with less support for validation.
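The core of that idea can be sketched without pulling in pydantic: plain `type()` plays the role of `create_model` below, and the extra field list stands in for what would really be read from `/farm.json`. All class and function names here are invented for the sketch:

```python
class Log:
    """Base model with the fields every log type shares."""

    field_names = ("id", "name", "timestamp")

    def __init__(self, **fields):
        # Accept only declared fields; missing ones default to None.
        for name in self.field_names:
            setattr(self, name, fields.get(name))


def create_log_model(log_type, extra_fields):
    """Build a subclass of Log at runtime, the way pydantic's
    create_model would - extra_fields would come from /farm.json."""
    return type(log_type, (Log,), {"field_names": Log.field_names + tuple(extra_fields)})


# Pretend /farm.json reported a soil test log type with a 'quantity' field.
SoilTestLog = create_log_model("farm_soil_test", ["quantity"])
log = SoilTestLog(id=1, name="pH reading", quantity=6.8)
print(isinstance(log, Log))  # True - consumers can still treat it as a base Log
print(log.quantity)          # 6.8
```

Because every dynamic model extends the base `Log`, consumers can rely on the shared fields while the server-specific ones are only present where the server actually supports them.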
@paul121 Really appreciate all the thoughts/feedback here!
The current implementation defaults to a "get all" action unless a `page` filter is provided. This likely became the default action as a result of not having other exposed pagination features... which does encourage unperformant usage patterns. Seeing this example is really quite convincing.. a great example of iterators! +1

How would the proposed API support retrieving multiple records by id in a single request - i.e. #14? A better approach would be to support this via the filters argument to the query_page, iterate_query_pages, iterate_query methods.

+1 +1 Great examples in the FAQ! Another benefit of the `filters` argument is that it could still allow supplying a `page` filter as described above.
I think we should disallow controlling the pagination via the `filters` argument since that would yield code that is coupled to how the D7 RestWS module does entity pagination. The `limit` and `page` parameters should have dedicated arguments, separate from the filters that affect which entities get returned.
For example;
```python
page = farm.area.query_page(filters={'area_type': 'field'}, page_num=3, max_page_size=10)
```
Calls farmOS as;
```
http://farmos.local/taxonomy_term.json?bundle=farm_areas&limit=10&page=3
```
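The mapping from the explicit arguments to the RestWS query string is then a small, contained translation. The function and parameter names below follow the sketch in this thread, not an existing API, and the filter-to-parameter mapping is simplified for illustration:

```python
from urllib.parse import urlencode


def build_query_url(base_url, entity_type, bundle, filters=None, page_num=0, max_page_size=100):
    """Translate the explicit pagination arguments into RestWS-style
    limit/page query parameters, keeping entity filters separate."""
    params = dict(filters or {})
    params["bundle"] = bundle
    params["limit"] = max_page_size
    params["page"] = page_num
    return f"{base_url}/{entity_type}.json?{urlencode(params)}"


url = build_query_url("http://farmos.local", "taxonomy_term", "farm_areas",
                      filters={"area_type": "field"}, page_num=3, max_page_size=10)
print(url)  # http://farmos.local/taxonomy_term.json?area_type=field&bundle=farm_areas&limit=10&page=3
```

Keeping this translation in one place means consumers never see the D7-specific `limit`/`page` names, so the D8 transport can map the same arguments differently without breaking callers.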
I'm still thinking on the best ways to support Settings.. it would be great to avoid the second API method. I think this would work just fine?
```python
# Build settings object from env var or read from file
settings = FarmOSOAuthClientSettings()

async def create_oauth_client(**settings):
    ...
```
Seems like a good simplification as long as we document it well enough. My thought was that the method is very cheap to implement and provides a nice top-level place to put that documentation in the code...
Re: JSON Schema....
I think our general motivation for JSON Schema has been to support record validation in the client libraries. The motivation for this comes from different log types having different fields: (screenshot of `/farm.json` output, which documents simple schema specs, but not JSON Schema compliant)...

This is important because some farmOS servers could have the `soil_test` module disabled, and thus not support the `farm_soil_test` log type. Similarly, the `farm_quantity` module could be disabled and not add `quantity` fields to the above log types. (correct me if I'm wrong, @mstenta)
These are good points. I'll need to put a bit more thought into this part...
From my understanding, consumers will always be responsible for checking the supported record types on a farmOS server they connect to - which is different than "version compatibility" of the farmOS server.
I think you're right that there are several connected concerns here.
What we're trying to achieve is some additional validation in the library that can adapt to the farmOS server on runtime. I think this is possible!
Validation makes total sense, we can't/shouldn't get around the fact that some farmOS installations may support a sub/superset of the fields. Especially when writing data to farmOS, consumers of farmOS.py will necessarily be coupled to the abstract feature set of the specific farmOS installation(s) they are working with. e.g. As you said above, you can't write log records with quantities if the `farm_quantity` module isn't available.
I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.
To make this a bit more concrete, as a consumer of data via farmOS.py I would hope I don't have to be defensive about the existence of the `quantities` field on a `FarmOSLog` object.
For example, instead of writing;
```python
for quantity in getattr(log, 'quantities', []):
    print(quantity)
```
I should be able to write;
```python
for quantity in log.quantities:
    print(quantity)
```
If the `farm_quantity` module isn't available then the `quantities` field should just be an empty list - which is a case I already have to deal with since not all logs would necessarily have quantities anyway.
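That default could be expressed directly in the model. A dataclass sketch of the idea (pydantic models support the same kind of default; `FarmOSLog` and its fields are hypothetical names from this discussion):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FarmOSLog:
    """Hypothetical log model: quantities always exists and defaults to
    an empty list, so consumers never need getattr() guards."""
    id: int
    name: str
    quantities: List[dict] = field(default_factory=list)


log = FarmOSLog(id=1, name="Planting")
for quantity in log.quantities:  # safe even when farm_quantity is absent
    print(quantity)
print(log.quantities)  # []
```

With this shape, "module disabled" and "log simply has no quantities" look identical to the consumer, which is exactly the predictability being argued for here.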
These thoughts are pretty conceptual, but I think that it might work. Worst case, the base `log` model could allow other fields and still facilitate sending/retrieving logs of any kind from the server, although with less support for validation.
Agreed that we should try not to make farmOS.py a barrier to working with any of the data in a reasonably canonical farmOS installation.
I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.
Yea, this is exactly what I'm dealing with in my `farmLog` module, and what JSON Schema goes a long ways in solving (in my context at least). Generally speaking, I think @paul121 is right that this is not so much a version compatibility issue, but more a dependency issue: specifically, how do we communicate dependencies and their required configurations across the client-server boundary? The fact that dependencies are going to bleed across that boundary seems unavoidable, and certainly produces its own issues, particularly if one of your goals is to abstract away the network protocols by which those dependencies are communicated.
I guess one way in which we're solving this with Field Kit is that we're mirroring the modularity of the server with similar modularity in the client. As we move further towards the Field Module architecture, this means that Field Kit itself makes fewer and fewer assumptions about the server's configuration, and instead each field module becomes more responsible for declaring its own dependencies explicitly. And again, I'm also allowing for some of that modularity by structuring `farmLog` to take a schema as a runtime dependency, which is a slightly different way of dealing with the modular dependency issue.
But that's application code. I'm not sure if it would be possible to take a similar modular approach with the farmOS.py library, but perhaps it's a thought?
Also, I'm still not sure about this:
I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.
It seems to me that there will always be some details (eg, configuration, modules, dependencies, etc) that need to be shared knowledge in a server-client architecture, and that cannot always be available at build-time. But I also don't see runtime dependencies as a barrier to abstraction. Maybe I'm still missing something though?
@jgaehring Thanks for the inputs!
But that's application code. I'm not sure if it would be possible to take a similar modular approach with the farmOS.py library, but perhaps it's a thought?
Yes, I think this is a key point. Field Kit can safely make certain assumptions and provide behavior which effectively yield an implied interface. The contract for that interface is: "If your farmOS installation behaves like X, these N features will be available in farmOS.app". Users of farmOS.app then presumably dynamically adapt their behavior based on the available features - and (hopefully) that feels natural to them.
An API like farmOS.py is most useful though if it can reduce the number of variables consumers have to deal with - and where that's not possible, provide predictable ways to deal with those variables.
It seems to me that there will always be some details (e.g., configuration, modules, dependencies, etc) that need to be shared knowledge in a server-client architecture, and that cannot always be available at build-time. But I also don't see runtime dependencies as a barrier to abstraction. Maybe I'm still missing something though?
Certain kinds of runtime dependencies are a barrier to abstraction. For example, I'm skeptical that it would be wise to generate the actual model Python classes at runtime since that would create the issue I described above where one has to be defensive when consuming farmOS.py even for the most basic attribute access cases.
Whether given modules are actually enabled and whether the corresponding data can ever be returned or saved from/to a particular farmOS installation is definitely only known at runtime.
The limit and page arguments should have dedicated arguments separate from filters that affect which entities get returned.
```python
page = farm.area.query_page(filters={'area_type': 'field'}, page_num=3, max_page_size=10)
```
Yea, this is nice. More explicit
specifically, how do we communicate dependencies and their required configurations across the client-server boundary?
@symbioquine @jgaehring I think we're finally reaching the nitty-gritty details of this, great discussion!
Field Kit is a good example of why these server "dependencies" should be passed through the client to the consumer:
Users of farmOS.app then presumably dynamically adapt their behavior based on the available features - and (hopefully) that feels natural to them.
In general, I think farmOS.py should operate similarly. But perhaps there are different use cases. The `quantity` use case is a good example:
To make this a bit more concrete, as a consumer of data via farmOS.py I would hope I don't have to be defensive about the existence of the quantities field on a FarmOSLog object.
If the farm_quantity module isn't available then the quantities field should just be an empty list - which is a case I already have to deal with since not all logs would necessarily have quantities anyway.
For some use cases (I'm hesitant to say most) I think this behavior (always returning an empty list/adding other fields to logs) could be problematic. As a consumer, if you're using log `quantities`, wouldn't that mean your software/app/use-case is at least semi-dependent on the `farm_quantity` module being installed? And you should decide how to handle that, not the library? Perhaps you tell the user _"Error: farm_quantity must be enabled to graph crop harvest data"_, OR, if it isn't as critical, you can decide to parse all logs with an empty `quantity` list if that is easier.
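To make the contrast above concrete, here is a hedged sketch of the two consumer styles being debated. The `quantity` key, the record dicts, and the helper names are all illustrative, not actual farmOS.py API:

```python
# Hypothetical records as a consumer might receive them. 'Harvest B'
# models a response from a server where no quantity data exists.
logs = [
    {'name': 'Harvest A', 'quantity': [{'value': 10}, {'value': 5}]},
    {'name': 'Harvest B'},
]

def harvest_total_defensive(logs):
    """The consumer tolerates a missing field itself."""
    total = 0
    for log in logs:
        for quantity in log.get('quantity', []):  # absent key -> empty list
            total += quantity.get('value', 0)
    return total

def harvest_total_trusting(logs):
    """Works only if the library always injects 'quantity': [] --
    at the cost of records no longer mirroring the server."""
    total = 0
    for log in logs:
        for quantity in log['quantity']:  # KeyError if field is absent
            total += quantity.get('value', 0)
    return total

print(harvest_total_defensive(logs))  # → 15
```

The trusting variant raises `KeyError` on the second record unless the library normalizes every log, which is exactly the trade-off discussed above.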
Also, I think it's worth noting that if a log type has a quantity field, it returns an empty list `[]` when no quantity is provided. Adding an empty quantity `[]` to logs of all types seems to defeat the purpose of having log types, as you would then want to set all other possible fields when returning records to the consumer?
Perhaps a good example is the farmOS-Aggregator, which is designed to simply relay the records retrieved by farmOS.py. If farmOS.py added an empty `quantity` to all logs, that might be problematic to consumers of the aggregator if they were to then try and "update" that log with a quantity value. (The aggregator might be somewhat of a unique example, but it also was a motivation for building farmOS.py)
I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.
This seems like a general challenge 3rd parties working with farmOS will have to work around. Most of the time it can be expected that the core modules will be available. But thinking to the future, very custom modules could provide additional fields to farmOS records for specific use cases. The farmOS client libraries should still support these extensions... but that might require consumers to be "defensive" about the structure of the models that are returned. If the library makes decisions that change the models the farmOS server is actually returning, isn't that a problem? 🤔
Regarding creating records, a quick test found that this request...
(creating a `farm_soil_test` log with an illegal `seed_source` property that is used on `farm_seeding` logs)

```python
new_log = farm.log.send({'name': "New test log.", 'type': 'farm_soil_test', 'done': 1, 'seed_source': 'blah'})
```

gets this response from the server: `406 Not Acceptable: Unknown data property field_farm_seed_source.`
(and farmOS.py returns nothing! Not good. The error response from the server should at least be passed through to the consumer.) I think we all agree it would be great if validation in the library could prevent a request like this from being sent to the server.
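One hedged sketch of what that client-side validation could look like, assuming the library somehow knew the allowed properties per log type (for example from ingested JSON Schema, per farmOS/farmOS#231). None of these names are real farmOS.py API, and the field sets are invented for illustration:

```python
# Hypothetical per-log-type schemas; a real implementation might
# fetch these from the server rather than hard-coding them.
ALLOWED_LOG_FIELDS = {
    'farm_soil_test': {'name', 'type', 'done', 'timestamp'},
    'farm_seeding': {'name', 'type', 'done', 'timestamp', 'seed_source'},
}

def validate_log(payload):
    """Reject unknown properties locally instead of letting the server
    reply '406 Not Acceptable: Unknown data property ...'."""
    allowed = ALLOWED_LOG_FIELDS.get(payload.get('type'))
    if allowed is None:
        raise ValueError(f"Unknown log type: {payload.get('type')!r}")
    unknown = set(payload) - allowed
    if unknown:
        raise ValueError(
            f"Unknown properties for {payload['type']}: {sorted(unknown)}")

# Passes: seed_source is legal on farm_seeding logs.
validate_log({'name': 'New test log.', 'type': 'farm_seeding',
              'done': 1, 'seed_source': 'blah'})
```

With something like this in place, the `farm_soil_test` request from the example above would fail fast with a clear local error rather than a server 406.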
Question: are there any use cases where a consumer cannot be defensive? Is that inconvenience worth manipulating the response from the server? Perhaps this could be a configuration for the library, to return "raw" data, or (for lack of a better term) "clean" data?
@symbioquine I must admit that I might be stuck in thinking about only a subset of use-cases for farmOS.py. In general I think of two use cases:
I'm glad you created the forum post to collect more thoughts on what is expected of the library! I might list my thoughts on use-cases there.
Wow @symbioquine this is great! And so dense with ideas and considerations haha! Great job getting into all the details @paul121 and @jgaehring! I can't think of much more to add. There may be a few little details here and there that might need minor correction/clarification, but I think the overall issues remain the same.

For example: the quantity module is a hard dependency of many log types, so the quantity field will not be missing on some sites and there on others. BUT, the "equipment used" field and the new "data" field ARE added dynamically depending on whether or not their module is enabled. Now this in itself is up for debate and I may be swayed to make them not optional if that simplifies things... the data field is an easy one... the "equipment used" field would put a hard dependency on the Equipment module, however, which I'm less sure about.

As you can see we could probably spin off on lots of different tangents with these thoughts. :-)
In general, I love all the goals you laid out @symbioquine - and figuring out the details/considerations to move us towards them, even if that is done iteratively, is worthwhile IMO.
And of course... we should also consider the changing API options in Drupal 8. We will probably be adding the GraphQL module, in addition to the core JSON:API module, which will open up new possibilities... it would be worth playing around with some of those things early in this process as well.
We have some funding to make improvements to farmOS.py and farmOS-aggregator over the next few months. I'm tracking these in a v2 milestone: https://github.com/farmOS/farmOS.py/milestone/1
This issue is a little dated now but still has lots of great info and ideas. Some of these things have already been implemented in farmOS/farmOS.py, but certainly not everything (see the initial issue description for a nice summary). There are a couple of ideas from this issue I'd love to pick up in this next round of work:
I think we could close this issue but will leave that for you @symbioquine - I'd prefer to move forward in some smaller issues but want to make sure we don't dismiss any large topics you mentioned here
I've been digging into possible farmOS.py API structures and (relatively) "recent" improvements in Python's support for async co-routines as prompted by the discussion on https://github.com/symbioquine/farm-os-area-feature-proxy/issues/1.
I plan to use this issue to capture my thoughts and see if we can converge on a design for a next major version of this library that could support a wider range of use-cases more gracefully.
Design goals
API Sketching
Synchronous Usage
Simple login/password session client;
Yields;
OAuth2 client;
Asynchronous Usage
All API usage should be identical except that;

- `farmOS.create_*_client` methods are used to create the client
- `await` is used for calling all methods which directly return a value
- `async with` is used anywhere you would use `with` in synchronous usage
- `async for` is used when iterating over pages of results - or derivatives thereof

Areas
By Id
Yields;
Pagination
or
Yields;
Iteration
Yields;
Creation
Yields;
Updates
Yields;
Deletion
or
Yields;
How to support sync/async in the same code base
The layers that do most of the heavy lifting get written in the async style, but have no concrete external dependencies themselves.
These classes get put into python files named `async_*.py` and the classes are named following the pattern `Async*`. Then we generate the non-async versions from that with some code seen below. The top level of the farmOS library then provides static factory methods for wiring together the various permutations of the sync/async client layers.
Structure
Diagram .png files include embedded draw.io source.
Asynchronous
Synchronous
Asynchronous Sensors
Synchronous Sensors
FAQs
It seems like you are trying to "boil the ocean" with this issue... is all this tractable to tackle as a single "issue"?
Maybe not. I expect the API described here to take a lot of work to implement, document, and test thoroughly.
My purpose as mentioned above is to polish a vision for how the farmOS Python API should be. If we can align on that vision, a branch can be used to track the development against that vision until it can be released as a new major version.
Why expose pagination explicitly?
This gives consumers of the farmOS.py library the flexibility to handle large amounts of data in/from farmOS performantly. It would be really challenging or impossible to completely hide the pagination and still provide that flexibility.
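To illustrate the point, here is a hedged sketch of a consumer processing a large result set one page at a time with bounded memory. `query_page` here is an in-memory stand-in for the proposed `farm.area.query_page(...)`, and the paging semantics are assumptions for illustration:

```python
# Stand-in dataset; a real client would fetch these from the server.
AREAS = [{'id': i} for i in range(25)]

def query_page(page_num, max_page_size):
    """Return one page of results (empty list past the last page)."""
    start = page_num * max_page_size
    return AREAS[start:start + max_page_size]

processed = 0
page_num = 0
while True:
    page = query_page(page_num=page_num, max_page_size=10)
    if not page:
        break
    processed += len(page)  # handle one page, then let it be freed
    page_num += 1

print(processed)  # → 25
```

Hiding pagination entirely would force the library to either buffer everything or invent its own streaming abstraction; exposing it leaves the choice with the consumer.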
Why are there no "get all" or "query all" methods which return collections, only pagination and iteration?
This would unnecessarily increase the "surface area" of the API while encouraging unperformant usage patterns that hold all entities/records in memory.
Consumers which require the entities/pages as collections can easily do so themselves;
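For example, collecting an iterator into a list is a one-liner. `iterate_query` is stubbed here as a plain generator so the snippet runs standalone; in the proposal it would be a method like `farm.area.iterate_query(...)` that fetches pages lazily:

```python
def iterate_query(filters=None):
    # Stand-in for the real client's generator, which would lazily
    # fetch successive pages from the server behind the scenes.
    for i in range(3):
        yield {'id': i, 'area_type': (filters or {}).get('area_type')}

# A consumer that wants a concrete collection just materializes it:
all_areas = list(iterate_query(filters={'area_type': 'field'}))
print(len(all_areas))  # → 3
```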
Will those static factory methods on the top-level module of farmOS explode combinatorially as the library supports more kinds of transports, auth mechanisms, etc?
Possibly, but this also lets us be opinionated about the supported combinations and provide concrete documentation with embedded code examples for each one.
How would the proposed API support retrieving multiple records by id in a single request - i.e. https://github.com/farmOS/farmOS.py/issues/14?
Naively, it would seem that this should be supported by the `get_by_id` methods - e.g. on `FarmOSTaxonomyClient` - however that would probably be a mistake since it would make the return type different depending on whether a single or multiple ids are passed.

A better approach would be to support this via the `filters` argument to the `query_page`, `iterate_query_pages`, and `iterate_query` methods. This conveniently fits into both the proposed python API (which already returns multiple records for those methods) and the existing Drupal entity "transport" API, simply by adding a simple requirement for the handling of the `filters` argument.

Thus the example from https://github.com/farmOS/farmOS.py/issues/14 would be supported by the proposed API as follows;
Calls farmOS as;
Conveniently, this also allows for other kinds of cool queries to execute efficiently from a transport API perspective;
Calls farmOS as;
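A hedged sketch of the `filters` convention described above: a list value means "match any of these ids". The function name is hypothetical, and the wire format (comma-joined values in a single query parameter) is an assumption for illustration, not documented farmOS transport behaviour:

```python
def filters_to_query_params(filters):
    """Translate a proposed-API `filters` dict into query parameters,
    folding list values into one multi-value parameter."""
    params = {}
    for field, value in filters.items():
        if isinstance(value, (list, tuple)):
            # Assumed convention: {'tid': [7, 9, 12]} -> tid=7,9,12
            params[field] = ','.join(str(v) for v in value)
        else:
            params[field] = value
    return params

# Retrieving several taxonomy terms in one request (farmOS.py#14):
print(filters_to_query_params({'tid': [7, 9, 12]}))  # → {'tid': '7,9,12'}
```

Whatever the exact wire format, the key point stands: multi-id retrieval rides on the existing multi-record query methods instead of overloading `get_by_id`.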
References
`async`/`await` keywords