gbif / model-tests

Exploration of sample models

Arctos: Identifications and citations #13

Open MortenHofft opened 2 years ago

MortenHofft commented 2 years ago

Again, I am just trying to recreate https://arctos.database.museum/guid/DMNS:Mamm:11098 and scribbling down notes as I go along.

query {
  # the entity can have many IDs, so we need to ask for the entity through an entity identifiers table
  specimensIDs: allEntityIdentifiers(condition: {
    entityIdentifier: "https://arctos.database.museum/guid/DMNS:Mamm:11098"
  }) {
    # there should only be one entity with this ID
    nodes {
      entityId
      entityIdentifier
      entityIdentifierType
      specimen: entityByEntityId {
        entityId
        entityType

        # get identifications
        materialEntityByMaterialEntityId {
          materialEntityType
          identificationMaterialsByMaterialEntityId {
            totalCount
            nodes {
              identificationByIdentificationId {
                taxonIdentificationsByIdentificationId {
                  totalCount
                  nodes {
                    taxonByTaxonId {
                      scientificName
                      kingdom
                      phylum
                      class
                      order
                      family
                      subfamily
                      # what happened to genus?
                      genericName
                      specificEpithet
                      infraspecificEpithet
                      scientificNameAuthorship
                      parentTaxonId # this should be linked in the graph as well
                      # The UI has a longer classification, but I assume that is just because they use a different taxonomy?
                    }
                  }
                }

                verbatimIdentification
                # vernacular name: Colorado chipmunk is not in the data
                dateIdentified
                identificationAgentRolesByIdentificationId {
                  totalCount
                  nodes {
                    agentId
                    identificationAgentRole
                    agentByAgentId {
                      preferredAgentName
                    }
                  }
                }

                identificationType
                identificationVerificationStatus
                identificationRemarks

                taxaFormula # value: A. I do not know what it means, but it may be meaningful
                isAcceptedIdentification

                # Citations section. I'm surprised this is sitting on the Identification. I would have thought it was attached to the specimen
                identificationCitationsByIdentificationId {
                  nodes {
                    citationType
                    citationPageNumber
                    citationRemarks
                    # The citations reference the species name, but since they hang off the Identification it must come from there.
                    referenceByReferenceId {
                      referenceType
                      referenceDoi # I do not know enough about this, but are DOIs the only thing used to link?
                      bibliographicCitation # The UI shows something like "Bell et al. 2015", but I assume that is just something that parses the full string?
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
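
For reading the result, here is a minimal Python sketch of walking the nested response down to the taxon names. It is not part of the original notes. It assumes the server returns the standard GraphQL `{"data": ...}` envelope with keys named after the aliases used in the query above (`specimensIDs`, `specimen`). The `sample` payload is hand-written and `Aus bus` is a placeholder name, not Arctos data.

```python
from typing import Any, Iterator


def iter_scientific_names(response: dict[str, Any]) -> Iterator[str]:
    """Yield every scientificName reachable from the query's response."""
    for id_node in response["data"]["specimensIDs"]["nodes"]:
        material = id_node["specimen"]["materialEntityByMaterialEntityId"]
        links = material["identificationMaterialsByMaterialEntityId"]["nodes"]
        for link in links:
            identification = link["identificationByIdentificationId"]
            taxon_links = identification["taxonIdentificationsByIdentificationId"]
            for taxon_link in taxon_links["nodes"]:
                yield taxon_link["taxonByTaxonId"]["scientificName"]


# Hand-written stand-in for a server response (placeholder values only).
sample = {
    "data": {
        "specimensIDs": {
            "nodes": [{
                "specimen": {
                    "materialEntityByMaterialEntityId": {
                        "identificationMaterialsByMaterialEntityId": {
                            "nodes": [{
                                "identificationByIdentificationId": {
                                    "taxonIdentificationsByIdentificationId": {
                                        "nodes": [
                                            {"taxonByTaxonId": {"scientificName": "Aus bus"}}
                                        ]
                                    }
                                }
                            }]
                        }
                    }
                }
            }]
        }
    }
}

names = list(iter_scientific_names(sample))
```

The four levels of nesting between the specimen and its taxon mirror the join tables in the model (entity identifier, identification material, taxon identification), which is why the traversal is this deep.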

tucotuco commented 2 years ago

The correct term in Darwin Core for the genus level in a classification is genericName. That is in the taxon.csv file.

Arctos actually has a very full and sophisticated taxonomic classification backbone that rivals CoL. I did not try to reproduce it. I opted to map and build only one classification using the existing Darwin Core terms. I assumed that in practice the same machinery would be used to populate taxa as is currently done in the pipeline.

The parentTaxonID term should have a self-referential foreign key to taxonID, but in practice here it is not used because I opted to "publish" using the flat Darwin Core classification method. Again, I would expect GBIF to use the backbone for taxonomy, not the published classifications.
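
As a rough illustration of what the self-referential key would allow if it were populated, the sketch below walks `parentTaxonID` links up to the root and derives the same flat classification from them. The rows, the `t1`..`t3` identifiers and the `Aus`/`Aus bus` names are all hypothetical placeholders, not values from the Arctos export.

```python
from typing import Optional

# Hypothetical taxon rows keyed by taxonID (placeholder data).
TAXA = {
    "t1": {"scientificName": "Animalia", "taxonRank": "kingdom", "parentTaxonID": None},
    "t2": {"scientificName": "Aus", "taxonRank": "genus", "parentTaxonID": "t1"},
    "t3": {"scientificName": "Aus bus", "taxonRank": "species", "parentTaxonID": "t2"},
}


def classification(taxon_id: str, taxa: dict) -> list[tuple[str, str]]:
    """Follow parentTaxonID links to the root; return (rank, name) pairs, root first."""
    path = []
    seen = set()
    current: Optional[str] = taxon_id
    while current is not None:
        if current in seen:
            # A self-referential key permits cycles unless the data forbids them.
            raise ValueError(f"cycle in parentTaxonID chain at {current}")
        seen.add(current)
        row = taxa[current]
        path.append((row["taxonRank"], row["scientificName"]))
        current = row["parentTaxonID"]
    return list(reversed(path))


lineage = classification("t3", TAXA)
flat = dict(lineage)  # the flat, one-column-per-rank form
```

With the flat publishing method, only the equivalent of `flat` is present on each taxon row and the parent links are left empty.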

We were not given vernacular names. I would expect this to come from a backbone, not from a specimen data publisher.

The field taxaFormula is correctly mapped. The value A just means that there is a single scientificName (the one linked) and no other variation on that for the identification. Arctos uses taxaFormula to be able to link real (potentially multiple) taxa in an identification, to cover hybrids, uncertain identifications, multiple taxa in a single object, etc. without "polluting" taxon names.
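
To make the placeholder idea concrete, here is a small Python sketch of expanding a formula into a display string. Only the value `A` is confirmed in this thread. The `A x B` hybrid formula, the function name and the `Aus bus` / `Aus cus` names are assumptions made for illustration, and this is not how Arctos implements it.

```python
import re


def render_identification(taxa_formula: str, names: dict[str, str]) -> str:
    """Replace each standalone capital letter (A, B, ...) with its linked taxon name."""
    def substitute(match: re.Match) -> str:
        return names[match.group(0)]

    # One pass over the formula, so inserted names are never re-scanned.
    return re.sub(r"\b[A-Z]\b", substitute, taxa_formula)


single = render_identification("A", {"A": "Aus bus"})
hybrid = render_identification("A x B", {"A": "Aus bus", "B": "Aus cus"})
```

The taxon rows stay clean names in both cases; the qualifier (here the hybrid `x`) lives only in the formula on the identification.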

Citations are one of the Common Models, they can be attached to anything. In this case I instantiated an IdentificationCitation because what is being cited is the specimen-as-example-of-taxon based on an Identification. Arctos models it internally this way as well.

I do not understand the issue for referenceDOI. What constraint are you referring to?

MortenHofft commented 2 years ago

referenceDOI

I do not understand the issue for referenceDOI. What constraint are you referring to?

I just mean that DOIs seem to be the only way one can share a link to a citation. Do papers always have a DOI and is that always something the publishers have? It might well be the case - I do not know. So with constraint I mean - you can only link a reference if you have a DOI.

Vernacular name: I'm happy to leave it out. I'm just used to seeing it included from dwc.

Citations: Ah nice - I thought they only applied to Identifiers based on https://github.com/timrobertson100/model-tests/blob/master/arctos/arctos.png

timrobertson100 commented 2 years ago

Again, I would expect GBIF to use the backbone for taxonomy, not the published classifications.

I am not sure I would. Firstly, we need higher taxonomy from the source to be able to align to the backbone sensibly (e.g. to handle variation in the data, like common misspellings, and to disambiguate homonyms). Secondly, I'd think an "if aligned to the COL/GBIF this would appear as ..." note would appear as an annotation on a material page.

tucotuco commented 2 years ago

For our purposes now, a classification is provided. That classification includes kingdom, so you could align it to the backbone if you chose to. We'll have to have deeper discussions about what to do with Taxonomy as we go forward. For now, it is just a placeholder in the model, and for the purpose of this case study it just follows the publishing model of Darwin Core.

referenceDOI

I do not understand the issue for referenceDOI. What constraint are you referring to?

I just mean that DOIs seem to be the only way one can share a link to a citation. Do papers always have a DOI and is that always something the publishers have? It might well be the case - I do not know. So with constraint I mean - you can only link a reference if you have a DOI.

I hadn't viewed it as a constraint, I viewed it as a way to share the DOI if you have it. This is what Arctos has, so I added it to the References class for that purpose. There may well be a better way, but it hasn't been elaborated in any use case yet. We could replace referenceDOI with referenceURI to be more general. Happy to do that if there is agreement.
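
A minimal sketch of what the more general field could hold, assuming a hypothetical helper `reference_uri` that prefers a DOI (rendered through the `https://doi.org/` resolver) and falls back to any other link. The DOI `10.1234/example` and the `example.org` URL are made-up placeholders.

```python
from typing import Optional

# Prefixes a DOI is commonly supplied with; stripped before building the URI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def reference_uri(doi: Optional[str] = None, url: Optional[str] = None) -> Optional[str]:
    """Return one resolvable URI for a reference, preferring the DOI when present."""
    if doi:
        doi = doi.strip()
        for prefix in _DOI_PREFIXES:
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        return f"https://doi.org/{doi}"
    # No DOI: fall back to whatever other link the publisher supplied, if any.
    return url


from_doi = reference_uri(doi="doi:10.1234/example")
from_url = reference_uri(url="https://example.org/paper")
```

Under this reading, a reference without a DOI can still be linked, and one with a DOI loses nothing.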

mdoering commented 2 years ago

The correct term in Darwin Core for the genus level in a classification is genericName. That is in the taxon.csv file.

It is exactly the other way around. genus is the classification, while genericName is the genus part of the name. For accepted names both are the same, but for synonyms the accepted genus might differ from the genus part of the synonym's scientificName.

genus: The full scientific name of the genus in which the taxon is classified.
genericName: The genus part of the scientificName without authorship.
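
The difference is easiest to see on a synonym. The sketch below uses placeholder names only (`Aus bus` as an accepted name, `Xus bus` as a hypothetical synonym of it). The `TaxonRecord` class is an illustration, not part of the model, and it assumes the scientific name is given without authorship.

```python
from dataclasses import dataclass


@dataclass
class TaxonRecord:
    scientific_name: str   # dwc:scientificName (authorship omitted here)
    genus: str             # dwc:genus: the genus the taxon is classified in
    taxonomic_status: str  # dwc:taxonomicStatus

    @property
    def generic_name(self) -> str:
        """dwc:genericName: the genus part of the name string itself."""
        return self.scientific_name.split()[0]


accepted = TaxonRecord("Aus bus", genus="Aus", taxonomic_status="accepted")
synonym = TaxonRecord("Xus bus", genus="Aus", taxonomic_status="synonym")

same_for_accepted = accepted.generic_name == accepted.genus
differs_for_synonym = synonym.generic_name != synonym.genus
```

For the accepted name the two values coincide, while the synonym keeps `Xus` as its genericName but is classified in `Aus`.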