Rothamsted-Ecoinformatics / farm_rothamsted

Custom farmOS features for Rothamsted Research.
GNU General Public License v2.0
6 stars 1 forks source link

Experiment module: require matching plot numbers and plot ids for uploads #632

Closed aislinnpearson closed 6 months ago

aislinnpearson commented 6 months ago

Following a fairly serious upload error, where the plot number on the geoJSON didn't match the plot number on the variables, we'd like to Require that users must include both the plot number and plot id when uploading data to experiments (geoJSON's, plot variables, plus any future uploads).

That this occurs in generally simple human error but we'd like to build in the functionality to ensure it doesn't happen again (or at least makes that error less likely) as ultimately it means records and variables are allocated to the wrong plots (see below example).

Once this is complete, I need to add it to the documentation!

Plot number and plot ID incorrectly allocated in the csv vs. the geoJSON: image

This is what that looks like in FarmOS: image

With errors in record allocations to individual plots that look like this: image

aislinnpearson commented 6 months ago

Following the conversation with @paul121 there would need to be an extra piece of logic where the first upload (either plots or variables) sets the expectations for the second.

paul121 commented 6 months ago

@aislinnpearson I've added this validation requiring matching plot IDs when uploading plot geometries.

Using the Broadbalk data as a test I found the following "errors" for plot_ids in the geojson file:

paul121 commented 6 months ago

When (re)-uploading plot variables... I'm curious how we want to handle this validation. I don't think we can always require that the plot_id and plot_numbers match the existing values because there may be a situation when you need to correct the previously uploaded variables.

In fact, this may always be the case because when we initially create plots we set the plot_id equal to the plot_number, so everything is sequential from 1 to total number of plots. Unless an experiment plan uses sequential plot_id and plot_number, they will need to override these values on the first variable upload.

In this first pass I'll add an optional checkbox that IS checked by default: "Validate Plot IDs" - "Only uncheck this box if you are confident that you are uploading attributes for correct plot number and plot ID pairs." I think people will be careful enough and not abuse this. If they get an error message they will likely try and figure out the problem rather than turning off validation. What do you think @aislinnpearson ?

Screenshot from 2024-03-19 14-41-17

aislinnpearson commented 6 months ago

This looks great but can we please add the things discussed on our call:

  1. A button to reset the plot geometry
  2. Force the upload of variables first by hiding the upload button for geometry until the variables have been added
aislinnpearson commented 6 months ago

For Aislinn to add to the documentation:

After discussion with @paul121 we agreed conventions for this. A few key points that need to be added to the documentation:

  1. The unique plot identifier will always be the FarmOS asset number for the plot.
  2. The plot_number is always linked to the plot asset number (the user cannot edit this for traceability reasons)
  3. You can only assign a new plot_id by reuploading the variables (and for this reason the plot variables need to be uploaded before the geometry - even if it is only a list of plot_ids with a single variable)
  4. The geometry stays on the asset/ plot_number assigned to it if the variables are re-uploaded (we have added a button so this can be reassigned)
  5. None of the above will be traceable if you delete the plot all together (thereby deleting the FarmOS asset) so the system has been set up to prevent this (e.g. you can't delete a plot that has assets or logs associated with it, and only data admins have the delete permissions)

As an aside, we agreed that the ideal way to do this from a data perspective would be to upload the geoJSON with the variables, but this assumes additional GIS expertise which the user creating the variables (usually a statistician) might not have.

We agreed this issue may need refining as we go along, and this should come out of user testing, but for now the functionality is correct and reduces the potential for error

paul121 commented 6 months ago

This looks great but can we please add the things discussed on our call:

  1. A button to reset the plot geometry
  2. Force the upload of variables first by hiding the upload button for geometry until the variables have been added

Okay, I've added both of these! I will close this issue now

paul121 commented 6 months ago

Add description to "Validate Plot IDs" to the effect of "you will need to uncheck the first time you upload because plots IDs default to being sequential"

paul121 commented 6 months ago

Also, fully hide the "Upload geometries" buttons until variables are uploaded

paul121 commented 6 months ago

Edit error message to show the count of how many incorrect mismatches have been found

paul121 commented 6 months ago

AND make sure the geometry validation is actually validated on submission !

paul121 commented 6 months ago

I've made these above changes

aislinnpearson commented 6 months ago

Amazing. Thanks @paul121

Just to check, are the changes integrated with the current platform.sh site? If so I'll have a go at testing, including with errors.

paul121 commented 5 months ago

Sorry @aislinnpearson I had forgot to update the site - I've just pushed these changes now

aislinnpearson commented 5 months ago

HI @paul121,

So I tested this and you've done a great job. The functionality works perfectly, although there were three things I wanted to ask after testing:

  1. Can we have the 'Add variables' button on the plots tab prior to the first upload
  2. Can we suppress the 'validate plot id's' and 'reset plot ids' button on the first upload to reduce confusion?
  3. I wonder if we can improve the error message a little bit more, as well as the names and descriptions for 'Validate PlotIDs' and 'Reset plot geometries'. I am sure I can write something a bit more intuitive - just need to catch my brain on a good day!

Testing process:

Test 1: plot numbers and id's were correct in both the variables and the geometries Outcome: Correct. Everything uploaded perfectly.

Test 2: The plot id's in the geometry were incorrect. Outcome: Correct. I got the below error

image

Test 3: The plot id's in the variables were incorrect, but correct in the geoJSON Outcome: Correct. I got the below error. Once I reuploaded the correct variables, the geoJSON uploaded perfectly.

image

Test 4: The plot id's were incorrect in both files Outcome: Correct. The upload went fine (we cannot validate for this error). The error could be corrected by re-uploading the files and unchecking the 'validate plot id' box, but importantly the incorrect files are still attached to the experiment.

Test 5: The plot numbers are incorrect in the geoJSON Outcome: Correct. I got the below error message, which disappeared when I tried to upload the correct version of the geoJSON.

Test 6: The plot numbers are incorrect in the variables: _Outcome: I thought I would have outsmarted you here @paul121 but I could not. My test file includes two errors: one with two numbers swapped around, and one with an additional missing number. Even when I switched the 'validate plotids' off, it wouldn't allow me to upload the variables if I was missing a number (but it would allow me to upload if all the numbers were present, but the order wasn't consecutive).

image

Test 7: The plot numbers are incorrect in both files: Outcome: not applicable as this would hit the same issue as in test 6, and everything can be resolved from there

I was going to run some additional tests to see what happens if the plot number and plot ids were incorrect in (1) the GeoJSON, (2) the variables and (3) both files but this is obviously unnecessary as (1) would create the same error as test 2, while (2) and (3) would create the same error as in test 6.

The files I used for testing are here: FarmOS plot id validation tests.zip

paul121 commented 5 months ago

The functionality works perfectly, although there were three things I wanted to ask after testing:

Can we have the 'Add variables' button on the plots tab prior to the first upload Can we suppress the 'validate plot id's' and 'reset plot ids' button on the first upload to reduce confusion? I wonder if we can improve the error message a little bit more, as well as the names and descriptions for 'Validate PlotIDs' and 'Reset plot geometries'. I am sure I can write something a bit more intuitive - just need to catch my brain on a good day!

The challenge is that determining the "first upload" is challenging right now... I think there are a few things we could do to make this easier. I'll open a new issue with my ideas: #645

I thought I would have outsmarted you here @paul121 but I could not.

Thank you for testing so thoroughly!!