NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Digi App taxonomic spine documentation should have more detail #432

Closed jlegind closed 8 months ago

jlegind commented 9 months ago

What is the issue ?

The process for compiling the Digi App taxonomy is not sufficiently described. There are a few notes here and there in the N:\SCI-SNM-DigitalCollections\DaSSCo\Digi App directories. We should have something more comprehensive.

Detailed description of the issue.

The creation of the taxonomic backbone is not trivial and deserves its own guidelines. Let us begin with the Herbarium taxonomy since it applies generally. The taxonomies contributing to the spine are :

Why is it needed/relevant ?

The taxonomic spine provides all the possible names to choose from and is vital to the digitization effort.

Estimate level of effort required.

Difficult

What is the expected acceptable result.

A comprehensive list of names that can be discovered in the Danish collections. 100% coverage is not realistic, but we can get close.

How to approach it?

There needs to be a discussion of which taxonomic sources should have priority. I suggest that locally used taxonomies get high priority, since these names are likely to be used the most. Most of the work on the current operational taxonomy was done in a Postgres database, and the ease and speed of this tool were satisfactory.
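To make the priority idea concrete, here is a minimal sketch in Python of merging name lists so that the highest-priority source wins when the same name appears in several sources. The source labels (`local_herbarium`, `global_checklist`), the function name `build_spine`, and the example names are illustrative assumptions, not the actual sources or tooling used for the spine (the existing work was done in Postgres).

```python
def build_spine(sources, priority):
    """Merge name lists; the first (highest-priority) source to supply a name wins.

    sources:  dict mapping a source label to a list of name strings
    priority: list of source labels, highest priority first
    """
    spine = {}
    for source in priority:
        for name in sources.get(source, []):
            cleaned = " ".join(name.split())   # collapse stray whitespace
            key = cleaned.casefold()           # case-insensitive comparison key
            # setdefault keeps the record from the higher-priority source
            spine.setdefault(key, {"name": cleaned, "source": source})
    return sorted(spine.values(), key=lambda record: record["name"])


# Hypothetical input: a local list and a broader checklist.
example_sources = {
    "local_herbarium": ["Bellis perennis", "Quercus robur"],
    "global_checklist": ["bellis  perennis", "Taraxacum officinale"],
}
example_spine = build_spine(example_sources, ["local_herbarium", "global_checklist"])
```

Recording the winning source on each record should make it easier to revisit the priority order later without rebuilding everything by hand.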

What could be the challenges ?

Cleaning the source spines and the final product. There will be duplicates and they need to be cleaned out.
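As a rough illustration of the cleaning step, the sketch below groups names that differ only in case, whitespace, or Unicode form, so that each group can be reviewed and merged. It only catches these trivial variants; the normalisation rules and the function names are assumptions for illustration and will not detect synonyms or spelling variants.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Return a comparison key: Unicode-normalised, single-spaced, case-folded."""
    text = unicodedata.normalize("NFKC", name)
    return " ".join(text.split()).casefold()


def find_duplicate_groups(names):
    """Map each comparison key to its spelling variants, keeping only real duplicates."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical input with a doubled space and a non-breaking space.
candidates = ["Quercus robur", "quercus  robur", "Quercus\u00a0robur", "Bellis perennis"]
duplicates = find_duplicate_groups(candidates)
```

Returning the groups instead of silently dropping rows leaves the decision about which spelling to keep with a person.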

Is there a potential risk to this.

We could end up with a taxonomy where certain names are missing. Users might be annoyed that their favorite names are not there or that the accepted name does not match their evaluation. After all, taxonomy is a battleground.

What documentation required?

A recipe-like document describing how a taxonomy is built from multiple sources.

PipBrewer commented 8 months ago

As mentioned in the one-to-one meeting with JKL and BS, we are not building a complete taxon spine in one go. We approach it one collection at a time, based on the next collection to be digitised. Ahead of time, we ask the collection manager and curators whether a complete list of taxa in that collection exists and, if not, where or how we could get a list that includes almost all of them. We then gather that list and import it into Specify. Workbench helps us merge duplicates. Some semantic duplicates aren't picked up, and FS wrote a script to deal with them after they have been imported. We then extract the result and use it in the Digi App. If we need to add authors, years etc., we can use the GBIF API.
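For the last step, a minimal sketch of looking a name up through the GBIF species match endpoint (`https://api.gbif.org/v1/species/match`) might look like the following. The helper names are hypothetical, and the code assumes the response carries `matchType` and `scientificName` fields, with authorship included in `scientificName`; check the response for a few known names before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(name):
    """Build the lookup URL for a single name string."""
    return f"{GBIF_MATCH_URL}?{urlencode({'name': name})}"


def fetch_match(name, timeout=10):
    """Call the GBIF match endpoint and return the decoded JSON response."""
    with urlopen(build_match_url(name), timeout=timeout) as response:
        return json.load(response)


def name_with_author(match):
    """Return the matched scientific name, or None when GBIF reports no match.

    Assumption: 'scientificName' in the response includes the authorship.
    """
    if match.get("matchType") == "NONE":
        return None
    return match.get("scientificName")
```

Usage would be along the lines of `name_with_author(fetch_match("Bellis perennis"))`, which needs network access; fuzzy or higher-rank matches should probably be reviewed by hand before being written back to the spine.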

PipBrewer commented 8 months ago

Based on the information in the above comment, @bhsi-snm can you advise what else is needed for this ticket or whether it can be closed?

bhsi-snm commented 8 months ago

@PipBrewer, I think it would be nice to have a document referred to here describing the procedure for building the spine (at least the part we import into DigiApp) for each collection, so that it is documented for reference, or in case we want to do it from scratch or do an update. From the above, it sounds like there is a procedure involving several steps, and it might be a good idea to mention somewhere when it is relevant, for example when updating the spine.

PipBrewer commented 8 months ago

@bhsi-snm You mean including how to get it out of Specify (you mention updating)? That may involve Fedor.

bhsi-snm commented 8 months ago

Sure, I don't think we need to include the Specify part of it, just what we need to do on our end in DigiApp, plus the steps where we simply mention that Fedor will do the needful?

bhsi-snm commented 8 months ago

okie then we can just close it 👍