Closed jacobcook1995 closed 2 years ago
I think this is an edge case caused by "Formicidae" being recorded twice in the index: once as a parent of lower taxa and once explicitly as "Formicidae 1". As far as I can see, https://github.com/ImperialCollegeLondon/safedata_validator/blob/629fccb99bda005f92fd037f25f42909322bf4db/safedata_validator/zenodo.py#L575 has no means of handling this edge case.
I made a start on extending taxon_index_to_html to allow it to remove repeated entries (identical bar worksheet name). However, it's not a particularly straightforward thing to do. I'm also wondering if it would be cleaner to overwrite hierarchy taxa (those with None as a worksheet name) at the taxa.py level?
That said, I've also found another duplication case (see Rhodoplanes here) where multiple unknown species are defined at the same genus level. This case can't be handled by changing how the index is generated in taxa.py (as both have non-None worksheet names), so we might have to alter the tree-level functions regardless.
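To make the two duplication cases concrete, here is a rough sketch of what merging at the tree level could look like. It is not the safedata_validator implementation: the `IndexEntry` layout, the `merge_repeated_entries` helper, the taxon IDs and the Rhodoplanes worksheet names are all hypothetical, chosen only for illustration. The idea is to group entries that are identical bar worksheet name and keep the worksheet names as labels on a single node.

```python
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    """Hypothetical index row; the real index layout may differ."""

    worksheet_name: Optional[str]  # None marks a hierarchy-only taxon
    taxon_id: int
    parent_id: Optional[int]
    name: str
    rank: str


def merge_repeated_entries(index):
    """Collapse entries that are identical apart from worksheet name.

    Returns a list of (taxon_fields, worksheet_names) pairs, one per taxon,
    in order of first appearance.
    """
    merged = {}
    for entry in index:
        key = entry[1:]  # every field except the worksheet name
        merged.setdefault(key, []).append(entry.worksheet_name)

    # Drop the None placeholders so hierarchy-only rows add no label.
    return [
        (key, sorted(n for n in names if n is not None))
        for key, names in merged.items()
    ]


# Made-up data mirroring the two cases discussed above.
example_index = [
    IndexEntry(None, 1, None, "Formicidae", "family"),
    IndexEntry("Formicidae 1", 1, None, "Formicidae", "family"),
    IndexEntry("Rhodoplanes sp. A", 2, None, "Rhodoplanes", "genus"),
    IndexEntry("Rhodoplanes sp. B", 2, None, "Rhodoplanes", "genus"),
]

for fields, labels in merge_repeated_entries(example_index):
    print(fields[2], labels)
```

Under these assumptions both cases reduce to one node per taxon: Formicidae keeps a single label, and Rhodoplanes keeps both worksheet names on one genus node rather than appearing twice.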
All in all I think this is best paused until we get an opportunity to discuss it.
Closed by #23
Datasets uploaded using the develop branch seem to duplicate branches of the taxa tree (e.g. here Formicidae is duplicated). As far as I can tell, previous uploads do not show this duplication (e.g. the same (or very similar) dataset uploaded in 2019). This problem persists with datasets created using my current feature branch see, so it is an outstanding issue.