ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

identifications - flat (and GBIF) #7695

Closed by dustymc 7 months ago

dustymc commented 7 months ago

From @ccicero

Typically, we have a field/prep ID to genus + species, and then we do a second ID to genus + species + subspecies when cataloging the material. So both of those IDs are acceptable, just that the latter is more 'accepted' than the former. According to the Arctos Handbook 'how to' documentation, "you can also assign any number above 1 to mean accepted, but not as much as 1." My specific questions are: 1) Would you recommend using 1 and 2 in these cases? 2) What gets exported to the VertNet IPT, is it just identification order = 1? That is what we'd want in data aggregators.
  1. Yes, absolutely. Anticipated precision aside, "we don't reject either of these two IDs, but we like one of them better" seems like a very useful thing to say for all sorts of reasons.
  2. I think what I'm doing now will not make DWC terribly happy, but maybe we can do better.

I am currently aggregating "accepted" IDs. From eg

[Screenshot 2024-04-19 at 11:12:30]

I get...


 select getFlatTaxonomy(collection_object_id) from flat where guid='DMNS:Inv:40450';

{
  "v_kingdom": "Animalia; Animalia",
  "v_full_taxon_name": "Biota, Animalia, Mollusca, Gastropoda, Caenogastropoda, Neogastropoda, Turbinelloidea, Columbariidae, Coluzea, Coluzea eastwoodae; Biota, Animalia, Mollusca, Gastropoda, Caenogastropoda, Neogastropoda, Turbinelloidea, Columbariidae, Columbarium, Columbarium eastwoodae",
  "v_phylclass": "Gastropoda; Gastropoda",
  "v_phylum": "Mollusca; Mollusca",
  "v_superorder": null,
  "v_phylorder": "Neogastropoda; Neogastropoda",
  "v_suborder": null,
  "v_superfamily": "Turbinelloidea; Turbinelloidea",
  "v_family": "Columbariidae; Columbariidae",
  "v_genus": "Columbarium; Coluzea",
  "v_species": "Columbarium eastwoodae; Coluzea eastwoodae",
  "v_subspecies": null,
  "v_author_text": null,
  "v_nomenclatural_code": null,
  "v_infraspecific_rank": null,
  "v_formatted_scientific_name": "<i>Columbarium eastwoodae</i>; <i>Coluzea eastwoodae</i>",
  "v_subfamily": null,
  "v_tribe": null,
  "v_subtribe": null,
  "v_taxon_rank": null,
  "v_scientificnameid": "urn:lsid:marinespecies.org:taxname:463651; urn:lsid:marinespecies.org:taxname:463613"
}

... so eg flat.genus gets the value "Columbarium; Coluzea" - a list of all "accepted" information.

I think that it's probably correct to include all the data-bits in all the information-pigeonholes, but I'm not sure that's actually what anyone wants.

Would it be better if I just flattened one "best" identification?

I feel like that's acceptable because anyone who wants the details probably isn't going to get them from the weird "traditional taxonomy" columns anyway, they'll just use the full structured data included with every record (example below).

At this moment, it seems like this would simplify some things (DWC, reports, etc.) without much detriment, and possibly it would even encourage better data management.

Thoughts?

arctosprod@arctos>> select previousidentifications from flat  where guid='DMNS:Inv:40450';
[
  {
    "idby": "Shirley Sanders",
    "made_date": "2023-10-05",
    "concept_label": null,
    "short_citation": null,
    "scientific_name": "Coluzea eastwoodae",
    "sensu_publication": null,
    "identification_taxa": [
      {
        "taxon": {
          "ftn": "Biota, Animalia, Mollusca, Gastropoda, Caenogastropoda, Neogastropoda, Turbinelloidea, Columbariidae, Coluzea, Coluzea eastwoodae",
          "name": "Coluzea eastwoodae",
          "ctrms": [
            {
              "psn": 1,
              "typ": "superdomain",
              "term": "Biota"
            },
            {
              "psn": 2,
              "typ": "kingdom",
              "term": "Animalia"
            },
            {
              "psn": 3,
              "typ": "phylum",
              "term": "Mollusca"
            },
            {
              "psn": 4,
              "typ": "class",
              "term": "Gastropoda"
            },
            {
              "psn": 5,
              "typ": "subclass",
              "term": "Caenogastropoda"
            },
            {
              "psn": 6,
              "typ": "order",
              "term": "Neogastropoda"
            },
            {
              "psn": 7,
              "typ": "superfamily",
              "term": "Turbinelloidea"
            },
            {
              "psn": 8,
              "typ": "family",
              "term": "Columbariidae"
            },
            {
              "psn": 9,
              "typ": "genus",
              "term": "Coluzea"
            },
            {
              "psn": 10,
              "typ": "species",
              "term": "Coluzea eastwoodae"
            }
          ],
          "nctrms": [
            {
              "typ": "AphiaID",
              "term": "463613"
            },
            {
              "typ": "authority",
              "term": "(Kilburn, 1971)"
            },
            {
              "typ": "citation",
              "term": "MolluscaBase eds. (2022). MolluscaBase. Coluzea eastwoodae (Kilburn, 1971). Accessed through: World Register of Marine Species at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=463613 on 2022-04-23"
            },
            {
              "typ": "display_name",
              "term": "<i>Coluzea eastwoodae</i>"
            },
            {
              "typ": "isMarine",
              "term": "1"
            },
            {
              "typ": "lsid",
              "term": "urn:lsid:marinespecies.org:taxname:463613"
            },
            {
              "typ": "match_type",
              "term": "exact"
            },
            {
              "typ": "modified",
              "term": "2011-10-15T11:20:42.523Z"
            },
            {
              "typ": "parentNameUsageID",
              "term": "456389"
            },
            {
              "typ": "rank",
              "term": "Species"
            },
            {
              "typ": "scientificname",
              "term": "Coluzea eastwoodae"
            },
            {
              "typ": "scientific_name",
              "term": "Coluzea eastwoodae"
            },
            {
              "typ": "status",
              "term": "accepted"
            },
            {
              "typ": "taxonRankID",
              "term": "220"
            },
            {
              "typ": "url",
              "term": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=463613"
            },
            {
              "typ": "valid_AphiaID",
              "term": "463613"
            },
            {
              "typ": "valid_authority",
              "term": "(Kilburn, 1971)"
            },
            {
              "typ": "valid_name",
              "term": "Coluzea eastwoodae"
            }
          ],
          "source": "WoRMS (via Arctos)",
          "classification_id": "https://arctos.database.museum/name/Coluzea eastwoodae#WoRMSviaArctos"
        },
        "taxon_id": "https://arctos.database.museum/name/Coluzea eastwoodae",
        "variable": "A"
      }
    ],
    "identification_order": 1,
    "identification_agents": [
      {
        "agent_name": "Shirley Sanders",
        "agent_identifier": "https://arctos.database.museum/agent/21313528",
        "identifier_order": 1
      }
    ],
    "identification_remarks": "Preferred name per WoRMS.",
    "identification_attributes": [
      {
        "agent_name": "Shirley Sanders",
        "attribute_type": "nature of identification",
        "attribute_units": null,
        "attribute_value": "revised taxonomy",
        "determined_date": "2023-10-05",
        "agent_identifier": "https://arctos.database.museum/agent/21313528",
        "attribute_remark": null,
        "determination_method": null
      }
    ]
  },
  {
    "idby": "unknown",
    "made_date": null,
    "concept_label": null,
    "short_citation": null,
    "scientific_name": "Columbarium eastwoodae",
    "sensu_publication": null,
    "identification_taxa": [
      {
        "taxon": {
          "ftn": "Biota, Animalia, Mollusca, Gastropoda, Caenogastropoda, Neogastropoda, Turbinelloidea, Columbariidae, Columbarium, Columbarium eastwoodae",
          "name": "Columbarium eastwoodae",
          "ctrms": [
            {
              "psn": 1,
              "typ": "superdomain",
              "term": "Biota"
            },
            {
              "psn": 2,
              "typ": "kingdom",
              "term": "Animalia"
            },
            {
              "psn": 3,
              "typ": "phylum",
              "term": "Mollusca"
            },
            {
              "psn": 4,
              "typ": "class",
              "term": "Gastropoda"
            },
            {
              "psn": 5,
              "typ": "subclass",
              "term": "Caenogastropoda"
            },
            {
              "psn": 6,
              "typ": "order",
              "term": "Neogastropoda"
            },
            {
              "psn": 7,
              "typ": "superfamily",
              "term": "Turbinelloidea"
            },
            {
              "psn": 8,
              "typ": "family",
              "term": "Columbariidae"
            },
            {
              "psn": 9,
              "typ": "genus",
              "term": "Columbarium"
            },
            {
              "psn": 10,
              "typ": "species",
              "term": "Columbarium eastwoodae"
            }
          ],
          "nctrms": [
            {
              "typ": "AphiaID",
              "term": "463651"
            },
            {
              "typ": "authority",
              "term": "Kilburn, 1971"
            },
            {
              "typ": "citation",
              "term": "MolluscaBase eds. (2022). MolluscaBase. Columbarium eastwoodae Kilburn, 1971. Accessed through: World Register of Marine Species at: https://www.marinespecies.org/aphia.php?p=taxdetails&id=463651 on 2022-04-23"
            },
            {
              "typ": "display_name",
              "term": "<i>Columbarium eastwoodae</i>"
            },
            {
              "typ": "isMarine",
              "term": "1"
            },
            {
              "typ": "lsid",
              "term": "urn:lsid:marinespecies.org:taxname:463651"
            },
            {
              "typ": "match_type",
              "term": "exact"
            },
            {
              "typ": "modified",
              "term": "2011-10-15T11:20:42.523Z"
            },
            {
              "typ": "parentNameUsageID",
              "term": "196943"
            },
            {
              "typ": "rank",
              "term": "Species"
            },
            {
              "typ": "scientificname",
              "term": "Columbarium eastwoodae"
            },
            {
              "typ": "scientific_name",
              "term": "Columbarium eastwoodae"
            },
            {
              "typ": "status",
              "term": "unaccepted"
            },
            {
              "typ": "taxonRankID",
              "term": "220"
            },
            {
              "typ": "unacceptreason",
              "term": "original combination"
            },
            {
              "typ": "url",
              "term": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=463651"
            },
            {
              "typ": "valid_AphiaID",
              "term": "463613"
            },
            {
              "typ": "valid_authority",
              "term": "(Kilburn, 1971)"
            },
            {
              "typ": "valid_name",
              "term": "Coluzea eastwoodae"
            }
          ],
          "source": "WoRMS (via Arctos)",
          "classification_id": "https://arctos.database.museum/name/Columbarium eastwoodae#WoRMSviaArctos"
        },
        "taxon_id": "https://arctos.database.museum/name/Columbarium eastwoodae",
        "variable": "A"
      }
    ],
    "identification_order": 2,
    "identification_agents": [
      {
        "agent_name": "unknown",
        "agent_identifier": "https://arctos.database.museum/agent/0",
        "identifier_order": 1
      }
    ],
    "identification_remarks": "Legacy ID from dealer label (syn).",
    "identification_attributes": null
  }
]
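For anyone reading the structured data rather than the flat columns, the ranked terms can be pulled out of each identification's `ctrms` list. A minimal Python sketch, assuming the JSON shape shown above; the helper names `classification_terms` and `first_unranked_term` are hypothetical, and the sample is a trimmed copy of the first identification:

```python
import json

# Trimmed copy of the first identification above; only the keys used here are kept.
identification = json.loads("""
{
  "scientific_name": "Coluzea eastwoodae",
  "identification_order": 1,
  "identification_taxa": [
    {
      "taxon": {
        "name": "Coluzea eastwoodae",
        "ctrms": [
          {"psn": 8, "typ": "family", "term": "Columbariidae"},
          {"psn": 9, "typ": "genus", "term": "Coluzea"},
          {"psn": 10, "typ": "species", "term": "Coluzea eastwoodae"}
        ],
        "nctrms": [
          {"typ": "lsid", "term": "urn:lsid:marinespecies.org:taxname:463613"},
          {"typ": "status", "term": "accepted"}
        ]
      },
      "variable": "A"
    }
  ]
}
""")


def classification_terms(taxon):
    """Map rank -> term from the ranked `ctrms` list, in `psn` order."""
    ranked = sorted(taxon.get("ctrms") or [], key=lambda t: t["psn"])
    return {t["typ"]: t["term"] for t in ranked}


def first_unranked_term(taxon, typ):
    """Return the first `nctrms` term of the given type, or None if absent."""
    for t in taxon.get("nctrms") or []:
        if t["typ"] == typ:
            return t["term"]
    return None


taxon = identification["identification_taxa"][0]["taxon"]
ranks = classification_terms(taxon)
print(ranks["genus"], "|", first_unranked_term(taxon, "lsid"))
```

Sorting on `psn` keeps the ranks in classification order even if the list arrives shuffled.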

Also there seems to be a bug (I think from the latest DB upgrade) so the data in Arctos are inconsistent. I'll make sure to update anything that needs it when we get this worked out.

Jegelewicz commented 7 months ago

AWG today - Use order 1 for flat/aggregators (concatenate if more than one).
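
The rule above can be sketched in Python. This is an illustration only, not the code behind getFlatTaxonomy: the function name `flatten_order_one` is hypothetical, the input is assumed to be shaped like the identification JSON earlier in this thread, and the "; " separator is copied from the existing flat output.

```python
def flatten_order_one(identifications, sep="; "):
    """Join the scientific names of all identification_order = 1 IDs.

    Returns None when no identification has order 1.
    """
    names = [
        ident["scientific_name"]
        for ident in identifications
        if ident.get("identification_order") == 1
    ]
    return sep.join(names) if names else None


# The two identifications from DMNS:Inv:40450, reduced to the keys used here.
thread_ids = [
    {"scientific_name": "Coluzea eastwoodae", "identification_order": 1},
    {"scientific_name": "Columbarium eastwoodae", "identification_order": 2},
]

# Hypothetical variant in which both identifications are ranked 1.
both_first = [dict(ident, identification_order=1) for ident in thread_ids]

print(flatten_order_one(thread_ids))
print(flatten_order_one(both_first))
```

With one order-1 ID the flat value is a single name; concatenation only happens when several IDs share order 1.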

dustymc commented 7 months ago

AWG meeting

(Do we need a report of records that don't have at least one identification with identification_order = 1?)

dustymc commented 7 months ago

Records with no identification_order=1 ID:

temp_no_one.csv.zip

Summary:


 guid_prefix | count 
-------------+-------
 UAM:Arc     |   111
 UAM:Inv     |     1
 MMNH:Mamm   |     5
 NMMNH:Paleo |     5
 UAM:Herb    |     1
 MSB:Fish    |    10
 UMZM:Egg    |     2
 MSB:Herp    |    22
 CHAS:Teach  |     7
 MVZ:Fish    |     1
 BYU:Edu     |    32
 CHAS:Herb   |     3
 MVZ:Bird    |    40
 MSB:Para    |     2
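
A summary like this can also be reproduced outside the database. A rough Python sketch, assuming an export that maps each GUID to the list of its identification_order values; the sample data and function names are hypothetical, and the prefix is taken to be everything before the last colon of the GUID:

```python
from collections import Counter


def guid_prefix(guid):
    """'MVZ:Bird:123' -> 'MVZ:Bird' (everything before the last colon)."""
    return guid.rsplit(":", 1)[0]


def missing_order_one(orders_by_guid):
    """Return the sorted GUIDs whose identifications include no order 1."""
    return sorted(g for g, orders in orders_by_guid.items() if 1 not in orders)


# Hypothetical export: GUID -> identification_order values on that record.
orders_by_guid = {
    "MVZ:Bird:1": [2, 3],
    "MVZ:Bird:2": [1, 2],
    "MSB:Herp:7": [0],
    "MSB:Herp:8": [],
    "UAM:Inv:5": [1],
}

missing = missing_order_one(orders_by_guid)
summary = Counter(guid_prefix(g) for g in missing)

for prefix, count in sorted(summary.items()):
    print(f"{prefix:<12}| {count:>5}")
```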

Ping

@msbparasites @campmlc @StefanieBond @jtgiermakowski @barke042 @jandreslopez @adhornsby @sjshirar @ccicero @mkoo @droberts49 @camwebb @wellerjes @Nicole-Ridgwell-NMMNHS

adhornsby commented 7 months ago

UMZM & MMNH cleaned up.

campmlc commented 7 months ago

Sorry to miss the discussion today. So what is needed for these records whose ID rankings have no "1" value? I thought I could search on all MSB records and use Tools -> Manage Identifications to fix in bulk, but that does not provide a means of converting existing identification order to anything other than "0". Can we add an option to set the identification order for a list of records to some other numeric value?

dustymc commented 7 months ago

Taxonomy-flattener/cacher rebuilt per https://github.com/ArctosDB/arctos/issues/7695#issuecomment-2077992235, I've got it running on UAM:Ento without kingdom. Ping me if I need to prioritize anything else, otherwise everything should catch up in a couple weeks.

DerekSikes commented 7 months ago

"should catch up in a couple of weeks" - shouldn't all Arctos users be alerted that any searches involving higher taxonomy could be unreliable for this week (in case any reports have been based on such searches, or loans, etc) and a couple more weeks? Seems like a banner alert or something is in order? ack!

ccicero commented 7 months ago

@dustymc MVZ:Bird and MVZ:Fish records in your list fixed. Can you please double check that I didn't miss any, i.e., all MVZ records should have one ID order = 1, and I don't expect that there should be any records with more than one ID order = 1 but please check. Thanks!

dustymc commented 7 months ago

@ccicero I think this form is safe-ish for writeSQL:

select
  guid,
  guid_prefix
from
  flat
  -- anti-join: keep only records that have no identification_order=1 row
  left outer join identification
    on flat.collection_object_id=identification.collection_object_id
    and identification_order=1
where
  identification.identification_id is null
order by
  guid

There are no MVZ, you're good, thanks!
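
The query above covers the "no order 1" case only. For the other half of the request (records with more than one order = 1), here is a small Python sketch that could be run on exported data. The input shape (GUID mapped to a list of identification_order values), the sample values, and the function name are all hypothetical:

```python
def multiple_order_one(orders_by_guid):
    """Map GUID -> count of order-1 identifications, for GUIDs with more than one."""
    result = {}
    for guid, orders in orders_by_guid.items():
        n = orders.count(1)
        if n > 1:
            result[guid] = n
    return result


# Hypothetical export: GUID -> identification_order values on that record.
sample = {
    "MVZ:Bird:1": [1, 2],
    "MVZ:Bird:2": [1, 1, 2],
    "MVZ:Fish:3": [2],
}

print(multiple_order_one(sample))
```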

wellerjes commented 7 months ago

Fixed for CHAS records

ccicero commented 7 months ago

@dustymc thanks!

Nicole-Ridgwell-NMMNHS commented 7 months ago

Fixed for NMMNH:Paleo records

mkoo commented 7 months ago

Completed

camwebb commented 6 months ago

UAM:Herb:51396 fixed