paul121 opened 4 years ago
Something else I realized regarding "cross referencing records": this could be accomplished with a custom query to the DB. Right now we are slightly limited by the RESTws API, but GraphQL might be an alternative solution down the road. Each "tracker" could provide a GraphQL query, run by farmOS.py, that returns the relational & computed fields from the DB.
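For illustration, the query a tracker ships could be as simple as a GraphQL document string that the aggregator hands to the client for each farm. (farmOS doesn't expose GraphQL today, and the schema fields here are invented, just a sketch of the idea:)

```python
# Hypothetical tracker-supplied GraphQL query. The field names are invented;
# the point is that the tracker file, not the aggregator, defines the shape
# of the relational/computed data it needs.
FIRST_PLANTING_QUERY = """
{
  logs(type: "seeding", status: "done") {
    id
    timestamp
    asset { id name }
    area { id name }
  }
}
"""
```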
Very interesting idea! I like the re-usability aspect of it.
Worth noting (@paul121 and I just discussed this yesterday, so this is more for other readers): farmOS has a concept of "metrics" that modules can provide, which include things like total field acreage, animal head counts, etc. I think those will be a good test case to try in the community aggregator.
I'm still a bit on the fence over whether this information should actually be cached in the aggregator database itself, though, or whether that decision should really be left up to the downstream system that's using the aggregator. Caching is an added layer of complexity that needs to be understood and supported with more helper code, and is guaranteed to cause frustration and confusion to someone (just the nature of caching haha).
If the reason for doing this is for performance, then I'd say it would be better to wait until performance is actually an issue before we add complexity. Or take the stance that it's a downstream decision. I think my preference would be to keep the Aggregator itself as thin as possible, but that might change/evolve as we have more real-world deployments and use-cases. Another possibility would be to maintain some standalone plugin libraries that downstream aggregator users could use to build these kinds of decisions into their own apps in standardized/reusable ways. So for example: the caching layer could be built in the community aggregator app, and perhaps done in a standalone library (python or node.js) that could be reused, if that makes sense.
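If a downstream caching layer did get built as a standalone library, it could start as small as a TTL wrapper around whatever fetch call the app already makes. A minimal sketch, assuming nothing about the Aggregator's actual code (the class and its API are hypothetical, not an existing library):

```python
import time

class TTLCache:
    """Hypothetical helper: cache the result of a fetch for ttl_seconds."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (fetched_at, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]  # still fresh, skip hitting the farmOS server
        value = fetch()
        self._store[key] = (time.time(), value)
        return value

# Usage in a downstream app (pull_metrics is a placeholder for whatever
# request the app already makes through the Aggregator):
# cache = TTLCache(ttl_seconds=3600)
# metrics = cache.get_or_fetch(farm_id, lambda: pull_metrics(farm_id))
```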
I've been wanting to brainstorm the "aggregating of records" piece for a while now. Discussing the crowdsource aggregator in farmOS/farmOS#206 got me thinking, so I took the chance and kept going...
Right now the aggregator allows us to push and pull records via the Aggregator API to multiple farmOS servers. This is great, but there seems to be a need to "track" certain types of records across multiple farms. There is also a need to "cache" a subset of the data from farmOS servers so that servers are not constantly supplying data. It would be great if we could do this in a reusable way that doesn't require custom modifications to the farmOS-Aggregator instance.
I'm proposing a way of creating reusable "trackers" for the aggregator. (Note: "Tracker" is the best term I could think of - I think it could be improved!)
As an example: if we want to aggregate "First Planting Dates", we could configure the aggregator backend to cache all seeding logs in the DB. This could be saved in a `tracking` table, with the name "first planting tracker". Then, in the UI, there could be a view generated for each `tracker` in the `tracking` table. The views could visualize this data in different ways (list the records, map geographically, graphs, averages, etc...) depending on the type of record. A different community might aggregate potato harvests in a similar way: they might configure the aggregator to cache all `harvest logs` for `potato crops`, then visualize harvest quantities, harvest dates, harvest photos, etc... A simple tracker could even just save the "Number of Compost Piles" or "Number of Animals" without any additional data.
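To make the table idea concrete, here's a minimal sketch of what a `tracking` table could look like (the SQLAlchemy model and all table/column names are assumptions for illustration, not the Aggregator's actual schema):

```python
# Hypothetical "tracking" table: one row per tracker per farm, with the
# cached records stored as JSONB. Names are illustrative only.
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Tracking(Base):
    __tablename__ = "tracking"

    id = Column(Integer, primary_key=True)
    tracker_name = Column(String, index=True)  # e.g. "first planting tracker"
    farm_id = Column(Integer, index=True)      # which farmOS server the data came from
    updated = Column(DateTime)                 # when this cache entry was last refreshed
    data = Column(JSONB)                       # the cached/computed records themselves
```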
I think `areas` and `assets` could be aggregated similarly. A rule could be added to cache all `field areas` to get total acreage, or even `greenhouse areas` to get the square footage of greenhouse space. Caching farm `assets` might provide animal head counts, number of crops grown, type/quantity of farm equipment, etc...

Another thought: I think a "tracker" could be configured to cross reference records. An example (similar to the produce quality study): track spinach crops & field history / growing practices. The tracker might cross reference a `planting asset` and the `field area` it was grown in: the rule would cache all `seeding logs` and `harvest logs` created for a `planting asset` of `crop == spinach`, and cache all `activity logs` and `input logs` for the field it was grown in. This could be saved in one custom object as a row in the DB table. (Good use case for JSONB? :D)
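For illustration, the JSONB blob for one spinach planting might look something like this (all keys and IDs here are invented):

```python
# Hypothetical shape of one cached cross-reference row for the spinach study.
spinach_tracker_row = {
    "planting_asset": {"id": 42, "crop": "spinach"},
    "field_area": {"id": 7, "name": "North Field"},
    "seeding_logs": [{"id": 101, "timestamp": 1589218000}],
    "harvest_logs": [{"id": 230, "timestamp": 1594218000}],
    "field_history": {
        "activity_logs": [{"id": 310, "name": "Cultivation"}],
        "input_logs": [{"id": 305, "name": "Compost application"}],
    },
}
```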
A cool thing about this approach: each "tracker" could be defined in its own Python file and imported into the aggregator on startup. This means a "tracker" could be shared with other aggregators just by sharing the Python file. The file would basically just define the "rules" that cache records from the farmOS server. Here, the aggregator cron job might call the `get_farm_data` method of the Tracker on a regular schedule, and save the returned data in the Tracker DB table with the `farm id`. I drafted a simple one for "Number of Compost Piles" and a more complicated one for the "Spinach Study" (only proof of concept!): https://gist.github.com/paul121/463fdd02deec767ce5a1374c3e17c303. A simple file for Planting Dates might look like the following:
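A rough sketch, assuming a `Tracker` base class and a `farm_client` wrapper that don't exist in the actual codebase (the farmOS.py call shown is simplified too, so treat this as illustration, not the gist contents):

```python
# trackers/first_planting.py
#
# Hypothetical sketch of a shareable tracker file. The Tracker base class,
# the keep_history flag, and the farm_client argument are assumptions.

class Tracker:
    """Minimal stand-in base class, just so this sketch is self-contained."""
    name = None
    keep_history = False  # keep only the most recent data set per farm


class FirstPlantingTracker(Tracker):
    name = "first planting tracker"

    def get_farm_data(self, farm_client):
        """Called by the aggregator cron job for each farmOS server.

        Returns {planting asset id: earliest seeding timestamp}, which the
        aggregator would save in the tracker DB table with the farm id.
        """
        logs = farm_client.log.get(filters={"type": "farm_seeding"})
        first_planting = {}
        for log in logs:
            timestamp = int(log["timestamp"])
            for ref in log.get("asset") or []:
                asset_id = ref.get("id")
                if asset_id and timestamp < first_planting.get(asset_id, float("inf")):
                    first_planting[asset_id] = timestamp
        return first_planting
```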
Each tracker could be configured to keep the most recent data set from a farmOS server, or track data over time.
Alternatively, the Aggregator could keep more general "cache" tables for `logs`, `areas` and `assets`. Records that are used in a "Tracker" could be saved to the Aggregator's general cache of all farm records, eliminating duplication of cached records. For this, instead of saving all the record data (like above), a "Tracker" would only save an object with IDs linking to the cached records.
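Under that scheme, a tracker row could shrink to just references, something like this (shapes and IDs invented for illustration):

```python
# Hypothetical: instead of embedding full records, the tracker row only
# stores IDs pointing at rows in the general logs/areas/assets cache tables.
spinach_tracker_row = {
    "farm_id": 3,
    "asset_ids": [42],                # cached `assets` rows
    "area_ids": [7],                  # cached `areas` rows
    "log_ids": [101, 230, 305, 310],  # cached `logs` rows
}
```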