gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Bionomia roundtripping meeting - 6 December #166

Closed rukayaj closed 6 months ago

rukayaj commented 8 months ago

We will discuss round tripping on 6 Dec (2pm - 6pm) with the Bionomia community. We have a meeting to talk about this internally next week, but some thoughts:

rukayaj commented 8 months ago

An example of a record with more data in Bionomia than we have published on GBIF: https://www.gbif.org/occurrence/1262651272

Note the Bionomia browser extension works well to actually display this on GBIF:

[Image: Screenshot 2023-10-26 at 09 48 39]

In order to view all attributions on Bionomia which are not on GBIF, go to e.g. https://bionomia.net/dataset/26f5b360-8770-4d54-9c2d-397798a5e513 > Frictionless Data > Missing Attributions

rukayaj commented 8 months ago

We should also think about how we want curators to import attributions back into MUSIT (actually round-tripping). It makes sense to me that they look over the records and do this, but then we need a discussion with the MUSIT team. BP has done it a few times, but without involvement from our side.

rukayaj commented 8 months ago

Notes from the meeting 2023-11-03

BP did all the manual checks of entries in Bionomia. For vascular plants he entered the corrections/ids manually, as there were not many. For the other collections we had a semi-automated ingestion process, with the help of the collection management technical staff. We should describe this a bit more.

Different "flavours" we should be aware of:

  1. Attributions missing in our CMS
  2. Attributions incorrect in our CMS
  3. Attributions incorrect on Bionomia
  4. "False negatives" - attributions marked as incorrect on our CMS, often because they're wikidata qids rather than orcids
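One rough way to triage the flavours above, as a minimal sketch only: compare the person identifiers held in the CMS with those on Bionomia for a single record. The `Flavour` names, the `classify` function and the URL-style identifiers are assumptions made for illustration, not anything taken from MUSIT or Bionomia. Code alone cannot tell flavour 2 from flavour 3, so both end up as a conflict for a curator to review.

```python
from enum import Enum


class Flavour(Enum):
    NOTHING_ON_BIONOMIA = "nothing on Bionomia to compare"
    MATCH = "CMS and Bionomia agree"
    MISSING_IN_CMS = "attribution missing in CMS"
    ID_TYPE_MISMATCH = "possible false negative (different identifier types)"
    CONFLICT = "conflict, needs curator review"


def id_type(identifier: str) -> str:
    """Guess the identifier type from its URL (assumes URL-style identifiers)."""
    if "orcid.org" in identifier:
        return "orcid"
    if "wikidata.org" in identifier:
        return "wikidata"
    return "other"


def classify(cms_ids: set[str], bionomia_ids: set[str]) -> Flavour:
    """Classify one record by comparing its identifiers in the CMS and on Bionomia."""
    if not bionomia_ids:
        return Flavour.NOTHING_ON_BIONOMIA
    if not cms_ids:
        return Flavour.MISSING_IN_CMS
    if cms_ids == bionomia_ids:
        return Flavour.MATCH
    # e.g. the CMS only holds ORCIDs while Bionomia only holds Wikidata QIDs:
    # the values differ, but that does not mean either side is wrong.
    cms_types = {id_type(i) for i in cms_ids}
    bionomia_types = {id_type(i) for i in bionomia_ids}
    if cms_types.isdisjoint(bionomia_types):
        return Flavour.ID_TYPE_MISMATCH
    # Same identifier type but different values: either side could be wrong.
    return Flavour.CONFLICT
```

A record flagged as `ID_TYPE_MISMATCH` is a candidate for flavour 4 and would still need a lookup (or a curator) to confirm that the ORCID and the QID refer to the same person.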

Feedback on Bionomia - We need a clearer description of what the frictionless data packages actually are, especially the differences between: Users unresolved, Unascribed, Missing attributions. This could be a simple pop-over or similar.

We also discussed problems with living people who don't have orcids. What happens when someone dies? Also, there is additional info in wikidata which is not in orcid, so we always need both anyway.

Rukaya to: 1. make a set of editable slides for Eirik (or anyone) to present on 6 Dec to the Bionomia community, 2. start work on an "alert" or "summary" type email service for our users, something like what is shown in the mockup:

[Image: Screenshot 2023-11-03 at 13 20 57]

("Warning" -> "Congratulations" or similar)

BP to clarify on the importing/people process and MUSIT stuff.

Next meeting: After the BioDATA Advanced workshop in South Africa (week starting 27 Nov maybe).

dagendresen commented 8 months ago

Love the "Your GBIF dataset update" idea!! (Sorry I needed to swap over to another meeting halfway)

rukayaj commented 7 months ago

Hmm I just realised something. We actually have a two way round tripping problem 😅

Suppose BP downloads the frictionless data file which contains 100 name fixes. He decides that 30 of those are incorrect, so he discards them, and passes the file on to MUSIT. Next year he's going to have exactly the same problem and need to filter out exactly the same 30 incorrect attributions. It's going to become an annoying and unmanageable task, and we will turn people off it.

We actually need our curators to go onto Bionomia and fix the attributions there. Then when they're happy with them, they should download the frictionless data file and apply it straight onto their Collections Management System without any additional vetting.

I think 99% of the attributions which our curators decide are incorrectly attributed on Bionomia will be genuine mistakes by an eager volunteer who was a bit too fast to click. And then they SHOULD get corrected in Bionomia.

rukayaj commented 7 months ago

Silly thought exercise: What happens when there's two collectors with the same name (and no orcids/qids) who are on the same collecting trip and collect the same specimen at the same time? I bet this has happened at least once in the universe to some J. Smiths or similar 😀

dagendresen commented 7 months ago

If we treat volunteer statements in Bionomia as annotations and mint/assign a PID to this annotation it becomes possible to discuss each volunteer statement. And somebody such as the collection curator can decide to discard a particular statement (by annotating the annotation as discarded). The roundtripping routine at the museum collection side will then be able to remember that the given volunteer statement (with a PID) already has a decision from the collection curator, and the same volunteer statement does not need to be presented to the collection curator all over again.
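A minimal sketch of how the museum side could "remember" curator decisions, assuming each volunteer statement carries a PID. The `pid` field, the verdict strings and the JSON file used as the decision store are all hypothetical; the thread does not say how such PIDs would be minted or exposed.

```python
import json
from pathlib import Path

VERDICTS = {"accepted", "discarded"}  # hypothetical verdict vocabulary


def load_decisions(path: Path) -> dict[str, str]:
    """Load earlier curator decisions (PID -> verdict), or start empty."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_decisions(path: Path, decisions: dict[str, str]) -> None:
    """Persist decisions so next year's roundtrip can reuse them."""
    path.write_text(json.dumps(decisions, indent=2), encoding="utf-8")


def record_decision(decisions: dict[str, str], pid: str, verdict: str) -> None:
    """Store the curator's verdict for one volunteer statement."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    decisions[pid] = verdict


def pending_statements(statements: list[dict], decisions: dict[str, str]) -> list[dict]:
    """Return only the statements the curator has not decided on yet."""
    return [s for s in statements if s["pid"] not in decisions]
```

With something like this, the 30 statements a curator discarded last year would be filtered out by `pending_statements` and never shown again, unless the statement itself changes and gets a new PID.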

Furthermore, there are MANY more possible distributed external sources of information about a specimen that can be aggregated! Thus something like the Extended Digital Specimen could emerge to provide a larger more generic solution here. A solution that is much larger than only roundtripping volunteer statements made in Bionomia.

Thanks for engaging in a deeper more holistic solution process! :-)

dagendresen commented 7 months ago

> Silly thought exercise: What happens when there's two collectors with the same name (and no orcids/qids) who are on the same collecting trip and collect the same specimen at the same time? I bet this has happened at least once in the universe to some J. Smiths or similar 😀

Normally two people have some different metadata even if they share the same name. And if some of that metadata is possible to find, one could disentangle who is who and assign/create a new QID to formalise this discovery of who is who.

However, what if the two collectors have the VERY same metadata....? Born on the same day in the same city to the same mother. Grew up in the very same house, slept in the very same bed, studied at the very same university, look exactly the same, etc! And nobody knows how to separate the two people by metadata. How will we then ever be able to know which of them collected each specimen? :-D

When two collectors with the same name participate in the same collecting event, the new proposed TDWG TG on Wikidata-based Research Expeditions can contribute and be useful. If we create an end-point (with a PID) for aggregating descriptions of the research expedition, we can record the two people with the same name as participants of that PID-identified expedition. And make it at least easier to disentangle. One step at a time...

rukayaj commented 7 months ago

> If we treat volunteer statements in Bionomia as annotations and mint/assign a PID to this annotation it becomes possible to discuss each volunteer statement. And somebody such as the collection curator can decide to discard a particular statement (by annotating the annotation as discarded). The roundtripping routine at the museum collection side will then be able to remember that the given volunteer statement (with a PID) already has a decision from the collection curator, and the same volunteer statement does not need to be presented to the collection curator all over again.

Yes that would work! It could also perhaps be done on date? Click here to download a list of all recently added Bionomia annotations (since Jan 2023).
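The date-based variant could be sketched roughly like this. The `created` field and its `YYYY-MM-DD` format are assumptions; I have not checked what date information the frictionless data download actually includes.

```python
from datetime import date


def added_since(statements: list[dict], cutoff: date) -> list[dict]:
    """Keep only statements created on or after the cutoff date.

    Assumes each statement has a hypothetical 'created' field
    holding an ISO date string such as '2023-01-15'.
    """
    return [s for s in statements if date.fromisoformat(s["created"]) >= cutoff]
```

This only works if the cutoff is the date of the previous review, so everything older has already been looked at. Unlike the PID approach, it keeps no record of what was actually decided.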

I think I'll make it more of a manual process first time round and ask them to send me the frictionless data download and mark the corrections that they want added into MUSIT, and we can see how many people engage with it.

rukayaj commented 7 months ago

The MUSIT team would like the following for bulk imports:

[Image: Screenshot 2023-11-29 at 20 34 02]

Tab separated.
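A minimal sketch of producing a tab-separated file with Python's standard `csv` module. The column names in `COLUMNS` are placeholders only; the real columns are the ones in the screenshot above and are not reproduced here.

```python
import csv
import io

# Hypothetical column names, for illustration only.
# Replace with the columns the MUSIT team actually asked for.
COLUMNS = ["catalogNumber", "recordedBy", "recordedByID"]


def to_tsv(rows: list[dict]) -> str:
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop keys that are not in COLUMNS
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Missing keys are written as empty cells, so a row with no identifier still produces a well-formed line.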

rukayaj commented 7 months ago

Draft slides: https://docs.google.com/presentation/d/1CxRZEICXvR9Et07VAwivq6QDPb_7u16f0mPZG_g8v5Q/edit?usp=sharing

rukayaj commented 6 months ago

We had this meeting, and I sent out the email. I'll open a new issue for anything that comes of it.