aodn / nrmn-application

A web application for collation, validation, and storage of all data obtained during surveys conducted by the NRMN
GNU General Public License v3.0

Job138 - Omitted rows in ingested sheet #1350

Open bpasquer opened 6 months ago

bpasquer commented 6 months ago

Opening an issue to further investigate the problem raised by Lizzie in an email (2024-04-10) regarding Job 138 (2022 RLS MPA TEMPLATE WITH INVERT SIZE AND ALGAE_SA_Amended JB.xlsx):

Lizzie: For Job 138, 20 rows were omitted from ingest. Many were for the species Austrolabrus maculatus, but we are unsure why. Some of the “omitted” rows are actually true duplicates that are supposed to be summed on ingest. This still needs to be checked, though, as it looks like the totals of these duplicates are not being summed on ingest. The user should receive a warning for duplicate observations with the same site/date/depth/method/block/species combination, but if the user chooses to ignore this warning, the data are to be summed. The original staged data and the ingested data are attached; the rows highlighted in yellow need investigating (green highlights are expected omissions).

Bene: The Job 138 issue requires further investigation, as we haven't yet identified a clear pattern to explain why the highlighted rows weren't ingested. Regarding your assessment of duplicate processing, I can confirm that the current software version does:

  • flag duplicate rows

  • sum "true" duplicates in the endpoints
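For readers unfamiliar with the intended behaviour, here is a minimal sketch of "flag, then sum" for true duplicates. It is not the application's actual code: the field names in `KEY_FIELDS`, the `total` column, and the function `sum_duplicates` are hypothetical, chosen only to mirror the site/date/depth/method/block/species combination mentioned in the email.

```python
# Hypothetical key fields, mirroring the combination named in the email.
KEY_FIELDS = ("site", "date", "depth", "method", "block", "species")


def sum_duplicates(rows):
    """Return (merged_rows, flagged_keys).

    Rows sharing the same key are flagged as duplicates and their
    'total' values are summed into a single merged row.
    """
    totals = {}  # key -> summed total (insertion order is preserved)
    counts = {}  # key -> number of staged rows with that key
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        totals[key] = totals.get(key, 0) + row["total"]
        counts[key] = counts.get(key, 0) + 1

    flagged_keys = [key for key, n in counts.items() if n > 1]
    merged_rows = [dict(zip(KEY_FIELDS, key), total=total)
                   for key, total in totals.items()]
    return merged_rows, flagged_keys


# Made-up example rows: the first two are true duplicates.
base = {"site": "S1", "date": "2022-03-01", "depth": 5, "method": 1, "block": 1}
rows = [
    {**base, "species": "Austrolabrus maculatus", "total": 2},
    {**base, "species": "Austrolabrus maculatus", "total": 3},
    {**base, "species": "Other species", "total": 1},
]
merged, flagged = sum_duplicates(rows)
```

With this example input, the two Austrolabrus maculatus rows collapse into one row with a total of 5, and their key is returned in `flagged` so a warning can be shown to the user.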

2022 RLS MPA TEMPLATE WITH INVERT SIZE AND ALGAE_SA_Amended JB_Missing highlighted.xlsx

bpasquer commented 5 months ago

Toni reported, after testing the ingest of 2022 RLS MPA TEMPLATE WITH INVERT SIZE AND ALGAE_SA (email 2024-05-23), that the current software version does not have the issue. Because the original error cannot be replicated, investigating its cause is not possible. Nonetheless, I think it could be beneficial to verify whether other sheets ingested with the same version of the software have also been affected. I propose we discuss the method for conducting this check (manual or automated; values/fields to check) once the list of sheets is established.

bpasquer commented 4 months ago

A manual check was done on a couple of sheets ingested during the same period by Toni, to see if data had been omitted from the ingest. After our discussions during the catch-up on 06/06/2024, it was acknowledged that implementing a solution to automate the check is complex. This would involve comparing rows in the original datasheet with records in the database and reporting on the differences. Given that we can assume the issue has been resolved, it is not considered worthwhile to invest further effort into this.

An enhancement to the ingested sheet summary report was proposed: to include the number of rows that were not ingested (i.e., the difference between staged and ingested rows).
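To make the comparison and the proposed summary count concrete, here is a rough sketch under stated assumptions. It assumes staged and ingested rows are already available as lists of dicts, and it reuses the hypothetical key fields and `total` column from the email's description; `compare_ingest` is an illustrative name, not an existing function in the application. Because true duplicates are summed on ingest, the sketch compares totals per key as well as raw row counts, so that an expected merge is not reported as a missing observation.

```python
from collections import defaultdict

# Hypothetical key fields, mirroring the combination named in the email.
KEY_FIELDS = ("site", "date", "depth", "method", "block", "species")


def totals_by_key(rows):
    """Sum the 'total' column per key combination."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[field] for field in KEY_FIELDS)] += row["total"]
    return dict(totals)


def compare_ingest(staged_rows, ingested_rows):
    """Report the row-count difference plus keys that are missing or mismatched."""
    staged = totals_by_key(staged_rows)
    ingested = totals_by_key(ingested_rows)
    return {
        "staged_rows": len(staged_rows),
        "ingested_rows": len(ingested_rows),
        # The figure proposed for the summary report.
        "row_difference": len(staged_rows) - len(ingested_rows),
        # Keys present in the staged sheet but absent after ingest.
        "missing_keys": [k for k in staged if k not in ingested],
        # Keys present in both, but whose summed totals disagree.
        "mismatched_totals": [k for k in staged
                              if k in ingested and staged[k] != ingested[k]],
    }


# Made-up example: two duplicates were summed correctly, one row was dropped.
base = {"site": "S1", "date": "2022-03-01", "depth": 5, "method": 1, "block": 1}
staged_example = [
    {**base, "species": "Austrolabrus maculatus", "total": 2},
    {**base, "species": "Austrolabrus maculatus", "total": 3},
    {**base, "species": "Other species", "total": 1},
]
ingested_example = [
    {**base, "species": "Austrolabrus maculatus", "total": 5},
]
report = compare_ingest(staged_example, ingested_example)
```

In this example the row difference is 2, but only one key is truly missing; the other "omitted" row is a duplicate that was summed as intended. A raw staged-minus-ingested count would therefore need to be read alongside the number of merged duplicates.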