BiologicalRecordsCentre / ABLE

Assessing ButterfLies in Europe project repository

Moth occurrence download includes training samples #454

Open JimBacon opened 2 years ago

JimBacon commented 2 years ago

Chris said:

I am doing some initial analysis of the moth count data for SPRING. I downloaded the sample and occurrence data for all records from the website. It took me some time to work out that the following Sample IDs, which have occurrences, have no sample in the download: 19001278, 19001280, 19001281, 19001282, 19001283.

David replied:

I’ve checked the database for this problem and found that these ‘missing’ samples are marked as ‘Training’, which is why they are excluded from the Samples download file. However, the component occurrences are not marked as Training in the associated Occurrences download file. I suspect this is a bug in the way the app submits data when in ‘Training’ mode. I will ask Karolis about this. In the meantime, can you update the Occurrences download report to exclude records where the sample is marked as Training = True?

DavidRoy commented 2 years ago

@kazlauskis can you confirm how samples and occurrences are submitted when the app is in training mode?

JimBacon commented 2 years ago

The occurrence download comes from the Elasticsearch occurrence index, which contains only occurrence.training and not the sample.training (a.k.a. trial) field. Short of adding the field to the index, which seems like the wrong solution, I see no way to filter out these occurrences.

What I can do is correct the data, if these are confirmed to be training records.
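
A correction of this kind could be sketched as below. This runs against a toy in-memory SQLite database, not the warehouse: the table and column names (`samples`, `occurrences`, `sample_id`, `training`) are assumptions for illustration, and the real schema, plus any cache tables or indexes that would need refreshing afterwards, may differ.

```python
import sqlite3

# Toy schema with assumed names; not the real warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (id INTEGER PRIMARY KEY, training INTEGER NOT NULL);
    CREATE TABLE occurrences (
        id INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES samples(id),
        training INTEGER NOT NULL
    );
    INSERT INTO samples VALUES (1, 1), (2, 0);
    INSERT INTO occurrences VALUES (10, 1, 0), (11, 1, 1), (12, 2, 0);
""")

# Mark occurrences as training wherever their parent sample is a training sample.
cur = conn.execute("""
    UPDATE occurrences
    SET training = 1
    WHERE training = 0
      AND sample_id IN (SELECT id FROM samples WHERE training = 1)
""")
conn.commit()
fixed_count = cur.rowcount  # number of occurrences corrected

# Re-run the mismatch check; it should now find nothing.
remaining = conn.execute("""
    SELECT count(*)
    FROM occurrences o
    JOIN samples s ON s.id = o.sample_id
    WHERE s.training = 1 AND o.training = 0
""").fetchone()[0]
```

Restricting the update to rows where `training = 0` keeps the affected-row count meaningful, so it can be compared with the number of mismatches found beforehand.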

DavidRoy commented 2 years ago

Thanks for investigating. I think we should take two actions once Karolis confirms my suspicion about how data is currently submitted in training mode:

  1. Run a query to correct the data, setting occurrence.training = True where sample.training = True
  2. Karolis to correct the data submission so that both samples and occurrences are set with training = True when the app is in this mode

JimBacon commented 2 years ago

I've done a check to see if the issue is more widespread than just the moth recording. Across the whole of warehouse1, here are the counts of occurrences having training = false while their sample has training = true.

| Website | Survey | Input form | Count |
| --- | --- | --- | --- |
| EBMS | EBMS 15 minute counts | enter-app-record | 2741 |
| EBMS | EBMS 15 minute counts | mydata/samples/edit | 561 |
| EBMS | EBMS 15 minute single species counts | enter-app-record | 10 |
| EBMS | EBMS 15 minute single species counts | mydata/samples/edit | 1 |
| EBMS | EBMS fixed moth trap | | 140 |
| EBMS | EBMS Transects | ebms-input-data | 3 |
| EBMS | EBMS Transects | | 312 |
| FRDBI | Advanced fungal record | record/advanced | 1864 |
| iRecord | Asian Hornet Watch | enter-app-record | 3 |
| iRecord | iRecord Butterflies | | 2 |
| iRecord | iRecord Import | | 47 |
| National Plant Monitoring Scheme | Indicator survey | indicator-recording-form-2015 | 7 |
| National Plant Monitoring Scheme | Inventory survey | inventory-recording-form-2015 | 9 |
| National Plant Monitoring Scheme | Wildflower survey | wildflower-recording-form-2015 | 19 |

Obtained with the query:


SELECT website_title, survey_title, s.input_form, count(*)
FROM cache_samples_functional s
JOIN cache_samples_nonfunctional snf ON snf.id = s.id
JOIN cache_occurrences_functional o ON o.sample_id = s.id
WHERE s.training = true AND o.training = false
GROUP BY website_title, survey_title, s.input_form
ORDER BY website_title, survey_title, s.input_form;

JimBacon commented 2 years ago

If it is considered invalid for a training sample to have non-training occurrences then it would be proper for the warehouse to reject such a submission or mend it. This would prevent the issue recurring.
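
As a rough illustration of the two options (reject or mend), here is a minimal sketch. The `Sample` and `Occurrence` classes, the exception, and the `mend` switch are all hypothetical and do not reflect the warehouse's actual submission-handling code.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    taxon: str
    training: bool = False


@dataclass
class Sample:
    training: bool = False
    occurrences: list[Occurrence] = field(default_factory=list)


class InconsistentTrainingFlag(ValueError):
    """Raised when a training sample carries non-training occurrences."""


def check_training_flags(sample: Sample, mend: bool = False) -> int:
    """Reject or mend a training sample that has non-training occurrences.

    Returns the number of occurrences mended (0 if nothing needed fixing).
    """
    if not sample.training:
        return 0  # only training samples are constrained here
    stale = [o for o in sample.occurrences if not o.training]
    if stale and not mend:
        raise InconsistentTrainingFlag(
            f"{len(stale)} occurrence(s) not marked as training"
        )
    for occ in stale:
        occ.training = True  # mend: inherit the flag from the sample
    return len(stale)
```

With `mend=False` an inconsistent submission raises an error the client can act on; with `mend=True` it is silently corrected, which is more forgiving of older app versions but hides the client-side bug.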

DavidRoy commented 2 years ago

@johnvanbreda might have a view on this, including the FRDBI data - he built their site

johnvanbreda commented 2 years ago

The FRDBI issues are associated occurrences, typically hosts for fungi records. There was a bug in the code that meant that associated occurrences were not picking up the training flag. I've fixed this, applied the fix to FRDBI and updated the occurrences.

kazlauskis commented 2 years ago

Since a top sample-level training attribute was introduced, we have stopped setting it on child occurrences or sub-samples. We will fix this ASAP; I have created a new ticket for the app.
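
For illustration only, copying the top-level flag down to every sub-sample and occurrence before submission might look like the sketch below. The submission is modelled as nested dictionaries, and the keys `training`, `samples`, and `occurrences` are assumptions, not the app's real data model. It also assumes the top-level sample is the single source of truth for the flag.

```python
def propagate_training(sample: dict) -> dict:
    """Return a copy of a nested submission in which the top-level training
    flag is copied onto every sub-sample and occurrence."""

    def apply(node: dict, training: bool) -> dict:
        return {
            **node,
            "training": training,
            # Copy each occurrence so the caller's data is left untouched.
            "occurrences": [
                {**occ, "training": training}
                for occ in node.get("occurrences", [])
            ],
            # Recurse into sub-samples, passing the same flag down.
            "samples": [apply(sub, training) for sub in node.get("samples", [])],
        }

    return apply(sample, bool(sample.get("training", False)))
```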