NHMDenmark / DaSSCo-Transcription

Work on transcription of specimen data from images as part of mass digitisation workflows and pipelines

Transcription v1 workflows #18

Open PipBrewer opened 1 week ago

PipBrewer commented 1 week ago

I've sketched out the preparation and transcription workflow for version 1.

Image

Image

@joaquimsantos1978 and @AstridBVW Can you have a look and let me know if you spot any mistakes/issues?

I need to know what the stages are for georeferencing so I can do the post-processing workflow. @AstridBVW would you be able to advise on this? I am assuming that after everything is transcribed, we search for all location records created as part of this workflow that are marked for validation (we should think about what goes in the audit log of Specify when we absorb data from the transcription platform). We export the data... and then?

AstridBVW commented 1 week ago

@PipBrewer I am uncertain whether or not a project will be available if all the specimens have been transcribed. @joaquimsantos1978 Do the projects close automatically or is this process manually performed by the project administrator?

About georeferencing, I was under the impression that the collection managers/curators would perform this task. They could do this directly in Specify, either via the GeoLocate plugin or by entering lat/long manually. They would be able to find the localities that need to be validated through a query, by searching for the status "Not validated"; they can also add the source "DaSSCo transcription". Both are part of the UI adaptations.
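As a rough illustration of that query, here is a minimal Python sketch that filters exported locality records by the two criteria. It assumes the records are available as plain dictionaries; the keys `status` and `source` and the function name are hypothetical, and this is not the Specify query interface.

```python
def localities_needing_validation(localities, source="DaSSCo transcription"):
    """Return localities that are not yet validated and came from the given source.

    Assumption: each locality is a dict with hypothetical keys
    "status" and "source"; the real Specify field names may differ.
    """
    return [
        loc
        for loc in localities
        if loc.get("status") == "Not validated" and loc.get("source") == source
    ]


# Illustrative data only, not real records.
example = [
    {"id": 1, "status": "Not validated", "source": "DaSSCo transcription"},
    {"id": 2, "status": "Validated", "source": "DaSSCo transcription"},
    {"id": 3, "status": "Not validated", "source": "Manual entry"},
]
pending = localities_needing_validation(example)
```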

joaquimsantos1978 commented 1 week ago

@AstridBVW Projects remain listed on the projects page, with the progress bar at 100%. The landing page only shows projects with progress under 100%. Keep in mind that even when progress is under 100%, a user might not be able to contribute if they have already submitted all the possible fields for their user level.
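The two conditions described here can be sketched as follows. This is only an illustration of the logic as stated in this comment: the `Project` class, its field names, and the numeric user levels are hypothetical and not taken from the transcription platform.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    total_fields: int
    completed_fields: int
    # Hypothetical: number of fields still open to each user level.
    open_fields_by_level: dict = field(default_factory=dict)

    @property
    def progress(self) -> float:
        """Progress as a percentage; an empty project counts as complete."""
        if self.total_fields == 0:
            return 100.0
        return 100.0 * self.completed_fields / self.total_fields


def on_landing_page(project: Project) -> bool:
    """The landing page only shows projects with progress under 100%."""
    return project.progress < 100.0


def can_contribute(project: Project, user_level: int) -> bool:
    """A user can contribute only if fields remain open at their level."""
    return (
        on_landing_page(project)
        and project.open_fields_by_level.get(user_level, 0) > 0
    )
```

Under these assumptions, a project can appear on the landing page while offering nothing to a given user, which is the case flagged above.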

AstridBVW commented 1 week ago

@joaquimsantos1978 @PipBrewer Ok, then maybe the step "Are there any specimens left to transcribe?" (Yes/No) in the Transcribing workflow is not relevant? If we assume that users will always access projects from the landing page, they will only ever have access to projects where there are still fields left to be transcribed that match their user level. So the answer will never be "No", because if the answer is "No", the project will not be on the landing page.

PipBrewer commented 2 days ago

@AstridBVW The idea is to export them and run a script to georeference them, similar, I think, to something Zsuzs and Hans from NHMA have done in the past, then check them and re-import them to Specify. The CMs can then check them.

The tool used previously by Zsuzs and Hans is GeoLocate. It is apparently easy to use and can process records in bulk. However, it is necessary to go through and validate the records afterwards. We should see how much validation is required. The first pass for obvious mistakes could be done by us and then the records pushed to CMs for final validation (either before import to Specify or after?).
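A first pass for obvious mistakes could look something like the sketch below. It assumes the georeferenced records come back as dictionaries with hypothetical `latitude` and `longitude` keys holding decimal degrees; the checks are generic sanity checks, not anything specific to GeoLocate output.

```python
def first_pass_issues(record):
    """Return a list of obvious problems with a georeferenced record.

    Assumption: "latitude" and "longitude" are hypothetical keys holding
    decimal degrees (or None when georeferencing produced no result).
    """
    lat = record.get("latitude")
    lon = record.get("longitude")
    if lat is None or lon is None:
        return ["missing coordinates"]

    issues = []
    if not -90 <= lat <= 90:
        issues.append("latitude out of range")
    if not -180 <= lon <= 180:
        issues.append("longitude out of range")
    if lat == 0 and lon == 0:
        # 0,0 is a common placeholder rather than a real collecting site.
        issues.append("coordinates are 0,0")
    return issues


def split_for_review(records):
    """Separate records that pass the first pass from those flagged for us."""
    passed, flagged = [], []
    for record in records:
        issues = first_pass_issues(record)
        if issues:
            flagged.append((record, issues))
        else:
            passed.append(record)
    return passed, flagged
```

Records in `passed` would then go to the CMs for final validation, while `flagged` records stay with us for correction first.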

Joaquim recommends using Geopick. However, this system seems a lot more manual and perhaps not suitable for mass digitisation. It is worth testing it though.