Open JimBacon opened 2 years ago
@kazlauskis can you confirm how samples and occurrences are submitted when the app is in training mode?
The occurrence download comes from the ElasticSearch occurrence index which does not contain the sample.training (a.k.a. trial) field, only occurrence.training. Short of adding the field to the index, which seems like the wrong solution, I see no way to filter out these occurrences.
What I can do is correct the data, if these are confirmed to be training records.
Thanks for investigating. I think we should take two actions once Karolis confirms my suspicion about the current data submission approach for training mode
I've done a check to see if the issue is more widespread than just the moth recording. Across the whole of warehouse1, here are the counts of occurrences having training = false while their sample has training = true.
Website | Survey | Input form | Count |
---|---|---|---|
EBMS | EBMS 15 minute counts | enter-app-record | 2741 |
EBMS | EBMS 15 minute counts | mydata/samples/edit | 561 |
EBMS | EBMS 15 minute single species counts | enter-app-record | 10 |
EBMS | EBMS 15 minute single species counts | mydata/samples/edit | 1 |
EBMS | EBMS fixed moth trap | | 140 |
EBMS | EBMS Transects | ebms-input-data | 3 |
EBMS | EBMS Transects | | 312 |
FRDBI | Advanced fungal record | record/advanced | 1864 |
iRecord | Asian Hornet Watch | enter-app-record | 3 |
iRecord | iRecord Butterflies | | 2 |
iRecord | iRecord Import | | 47 |
National Plant Monitoring Scheme | Indicator survey | indicator-recording-form-2015 | 7 |
National Plant Monitoring Scheme | Inventory survey | inventory-recording-form-2015 | 9 |
National Plant Monitoring Scheme | Wildflower survey | wildflower-recording-form-2015 | 19 |
Obtained with the query:
SELECT website_title, survey_title, s.input_form, count(*)
FROM cache_samples_functional s
JOIN cache_samples_nonfunctional snf ON snf.id = s.id
JOIN cache_occurrences_functional o ON o.sample_id = s.id
WHERE s.training = true AND o.training = false
GROUP BY website_title, survey_title, s.input_form
ORDER BY website_title, survey_title, s.input_form
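For anyone wanting to try the check and a possible correction away from the warehouse, here is a minimal sketch in Python using an in-memory SQLite database. The `samples` and `occurrences` tables, their columns and the function names are simplified stand-ins I have assumed for illustration; they are not the real warehouse schema, and a real correction would also need to keep the cache tables and the Elasticsearch index in step.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a toy database: hypothetical tables, not the warehouse schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE samples (
            id INTEGER PRIMARY KEY,
            training INTEGER NOT NULL          -- 1 = training, 0 = real
        );
        CREATE TABLE occurrences (
            id INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES samples(id),
            training INTEGER NOT NULL
        );
        -- Sample 1 is a training sample, sample 2 is a real one.
        INSERT INTO samples VALUES (1, 1), (2, 0);
        -- Occurrence 10 is the mismatch: training sample, non-training record.
        INSERT INTO occurrences VALUES (10, 1, 0), (11, 1, 1), (12, 2, 0);
        """
    )
    return conn


def count_mismatches(conn: sqlite3.Connection) -> int:
    """Count non-training occurrences that belong to a training sample."""
    row = conn.execute(
        """
        SELECT count(*)
        FROM samples s
        JOIN occurrences o ON o.sample_id = s.id
        WHERE s.training = 1 AND o.training = 0
        """
    ).fetchone()
    return row[0]


def mend_mismatches(conn: sqlite3.Connection) -> int:
    """Copy the training flag down from training samples; return rows changed."""
    cur = conn.execute(
        """
        UPDATE occurrences
        SET training = 1
        WHERE training = 0
          AND sample_id IN (SELECT id FROM samples WHERE training = 1)
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo()
    print("mismatches before:", count_mismatches(demo))
    print("rows mended:", mend_mismatches(demo))
    print("mismatches after:", count_mismatches(demo))
```

The update only touches occurrences whose parent sample is flagged as training, so genuine records (occurrence 12 in the toy data) are left alone.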
If it is considered invalid for a training sample to have non-training occurrences then it would be proper for the warehouse to reject such a submission or mend it. This would prevent the issue recurring.
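To make the "reject or mend" option concrete, here is a rough sketch in Python of what such a check could look like at submission time. The `Sample` and `Occurrence` classes, `TrainingMismatchError` and `enforce_training_flag` are all hypothetical names made up for this example; the warehouse's actual submission code is not shown in this thread and may be structured quite differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    taxon: str
    training: bool = False


@dataclass
class Sample:
    training: bool
    occurrences: List[Occurrence] = field(default_factory=list)
    sub_samples: List["Sample"] = field(default_factory=list)


class TrainingMismatchError(ValueError):
    """Raised when a training sample contains a non-training record."""


def enforce_training_flag(
    sample: Sample, mend: bool = True, inherited: bool = False
) -> int:
    """Make children of a training sample carry the training flag.

    With mend=True the flags are corrected and the number of corrections is
    returned. With mend=False the first mismatch raises TrainingMismatchError.
    """
    # A sub-sample counts as training if it or any ancestor is flagged.
    effective = sample.training or inherited
    fixed = 0

    if effective and not sample.training:
        if not mend:
            raise TrainingMismatchError("sub-sample is missing the training flag")
        sample.training = True
        fixed += 1

    for occ in sample.occurrences:
        if effective and not occ.training:
            if not mend:
                raise TrainingMismatchError(
                    f"occurrence {occ.taxon!r} is missing the training flag"
                )
            occ.training = True
            fixed += 1

    for child in sample.sub_samples:
        fixed += enforce_training_flag(child, mend=mend, inherited=effective)

    return fixed
```

Mending keeps existing app versions working without changes on the client side, while rejecting would surface the problem to the client sooner; which is preferable is a design decision for the warehouse.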
@johnvanbreda might have a view on this, including the FRDBI data, as he built their site.
The FRDBI issues are associated occurrences, typically hosts for fungi records. There was a bug in the code that meant that associated occurrences were not picking up the training flag. I've fixed this, applied the fix to FRDBI and updated the occurrences.
Since the top-level sample training attribute was introduced, we have stopped setting the training flag on child occurrences and sub-samples. We will fix this ASAP; I have created a new ticket for the app.