Rothamsted-Ecoinformatics / farm_rothamsted

Custom farmOS features for Rothamsted Research.
GNU General Public License v2.0
6 stars 1 forks source link

Data Extraction: Integrate FarmOS with a Field Database Module #97

Open aislinnpearson opened 2 years ago

aislinnpearson commented 2 years ago

Following a discussion with the Farm about the reporting requirements from FarmOS, there are no standard queries that the farm runs when querying the farm data – the data needs to be queried regularly, but the actual data queries are not standardised. They vary widely depending on what the farm is doing and who is asking for what information. Ideally the farm would be able to run “access like queries” on the farm data, building their own queries depending on their needs or getting help from the data team to do so. The problem with this though is that FarmOS isn’t a database – instead the API pulls from a JSON/ java script type object.

One solution is for the farm team to push data from FarmOS into a database on a regular basis (every fortnight as a minimum). The farm can then query that database themselves or if the query is especially complex ask Alberto for help. Alberto mentioned the software Beekeeper Studio which would allow users to query the database without changing or accessing the actual content of the database, if for example we wanted someone on the DT team to automate this update instead of relying on the farm team. This field database would be separate from the archive database, as ideally the archive database would only holds only clean, archived data, while for reasons of computational efficiency the entire field database would replaced every time the farm team push to it. The current field season database would need to hold two years of data though as sometimes the farm team need to access these records into the following field season.

The reason we suggested every fortnight is because ideally the farm manager and the trials manager would check the data before it is pushed in to the database to catch any errors. This would address a separate Github issue (#79) of how we check the data before it is archived. This is essential as even Helen mentioned they are finding a lot of errors in the FarmOS data. But that in turn raised the point of how we correct errors in the database – as ideally they would be corrected first in FarmOS and then pushed/ overwritten in the database so the data is consistent in both places.

This is linked to a couple of other issues (neither of which are on Github as they are essentially internal issues):

1. How many years of data do we store in FarmOS? Currently it is just this field season but ideally it would at least be the last two years – more perhaps if Alberto’s visualisation tool pulls directly from the FarmOS API and not the archive database.

2. Current visualisation tool being build by Alberto: This is very useful for scientists, researchers and even the farm team when the want to view the data, but less useful for the complex queries that Helen wants to run. It will need to be updated though for the 2.x API and the experiments module. If we were to priorities one of the two tasks, it should probably be the database access but Helen asked if we could make this available too simply because it is an easier way of visualising the data entered.

aislinnpearson commented 2 years ago

In today's meeting it was agreed that this task would be assigned to Richard. We will start with a PostGRES database, and train the farm team in writing SQL queries. If this becomes too complex, we can add an Access front end, as that is what both Bruce and Helen are familiar with. A python scheduler will push changes regularly (daily?) as the database for querying needs to be kept up to date.

The database will be available for querying, but not editing. Edits (incorrect information) will need to be corrected via FarmOS to ensure the data is the same in both locations. FarmOS also has better functionality for recording changes. This issue is worth discussing further with Octavio Duarte, who has had similar issues working with the Real Food Campaign (part of OpenTEAM)