globalbioticinteractions / mcz

Configuration to index Museum of Comparative Zoology, Harvard University.

Create a name report #3

Open seltmann opened 2 years ago

seltmann commented 2 years ago

Automatically extract the names from a list or a Darwin Core Archive (DwC-A) and align them against a catalog (e.g., catalogOfLife).
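A minimal sketch of the extraction half of this idea, assuming the archive's occurrence.txt is tab-separated with a header row that includes a `scientificName` column. The function name `extract_names` and the two-column sample are hypothetical, and the alignment against a catalog is deliberately left out:

```python
import csv
import io


def extract_names(occurrence_text, column="scientificName"):
    """Yield (line_number, name) for every non-empty name in a tab-separated table.

    Line numbers are 1-based and count the header as line 1 (an assumption).
    """
    reader = csv.DictReader(io.StringIO(occurrence_text), delimiter="\t")
    for row in reader:
        name = (row.get(column) or "").strip()
        if name:
            # reader.line_num is the number of source lines read so far
            yield reader.line_num, name


# Hypothetical excerpt; a real occurrence.txt has many more columns.
sample = "catalogNumber\tscientificName\nR-147033\tAnolis festae\nR-000000\t\n"
names = list(extract_names(sample))
print(names)
```

Keeping the line number next to each name makes it possible to trace an aligned name back to the row it came from.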

For example, given this record from MCZ:

id modified rightsHolder references institutionID datasetID institutionCode collectionCode ownerInstitutionCode basisOfRecord informationWithheld dynamicProperties occurrenceID catalogNumber recordNumber recordedBy recordedByID individualCount sex lifeStage georeferenceVerificationStatus occurrenceStatus preparations disposition associatedMedia associatedOccurrences associatedSequences associatedTaxa otherCatalogNumbers occurrenceRemarks fieldNumber eventDate startDayOfYear year month day verbatimEventDate habitat samplingProtocol higherGeographyID higherGeography continent waterBody islandGroup island country countryCode stateProvince county locality verbatimLocality minimumElevationInMeters maximumElevationInMeters verbatimElevation minimumDepthInMeters maximumDepthInMeters locationRemarks decimalLatitude decimalLongitude geodeticDatum coordinateUncertaintyInMeters verbatimLatitude verbatimLongitude verbatimCoordinateSystem georeferencedBy georeferenceProtocol georeferenceSources georeferenceRemarks earliestEraOrLowestErathem latestEraOrHighestErathem earliestPeriodOrLowestSystem latestPeriodOrHighestSystem earliestEpochOrLowestSeries latestEpochOrHighestSeries earliestAgeOrLowestStage latestAgeOrHighestStage lithostratigraphicTerms group formation member bed identificationQualifier typeStatus identifiedBy identifiedByID dateIdentified identificationRemarks taxonID scientificNameID scientificName higherClassification kingdom phylum class order family genus genericName specificEpithet infraspecificEpithet taxonRank verbatimTaxonRank scientificNameAuthorship nomenclaturalCode
MCZ:Herp:R-147033 2021-12-02 21:30:28 President and Fellows of Harvard College http://mczbase.mcz.harvard.edu/guid/MCZ:Herp:R-147033 b4640710-8e03-11d8-b956-b8a03c50a862 MCZ Herp Museum of Comparative Zoology, Harvard University PreservedSpecimen {} MCZ:Herp:R-147033 R-147033 MCZ FS-F20823 Kenneth I. Miyata 1 unverified present whole animal (ethanol) unknown collector number=MCZ FS-F20823; muse location number=ZR147033 1975-08-12 224 1975 08 12 12/8/1975-12/8/1975 http://vocab.getty.edu/tgn/1001487 South America: Ecuador: Pichincha South America Ecuador EC Pichincha Ecuador: Pichincha: 41 km from Santo Domingo de los Colorados on rd to Quevedo Ecuador: Pichincha Prov.: 41 km from Santo Domingo de los Colorados on rd to Quevedo -0.53900391 -79.37394714 WGS84 3994 decimal degrees Elisa Bonaccorso MaNIS/HerpNET/ORNIS Georeferencing Guidelines Map SA17-3 (Santo Domingo de los Colorados) map 1:250,000. IGM Ecuador Catalog Anolis festae Animalia Chordata Reptilia Lepidosauromorpha Squamata Sauria Iguanidae Anolis festae Animalia Chordata Reptilia Squamata Iguanidae Anolis Anolis festae ICZN

The report would align the taxon name Anolis festae with the Catalogue of Life, resulting in this output:

Anolis festae HAS_ACCEPTED_NAME COL:675NP Anolis festae species Biota | Animalia | Chordata | Reptilia | Squamata | Iguania | Dactyloidae | Anolis | Anolis festae COL:5T6MX | COL:N | COL:CH2 | COL:RP | COL:45C | COL:87BW7 | COL:8Y8 | COL:WQP | COL:675NP unranked | kingdom | phylum | class | order | superfamily | family | genus | species https://www.catalogueoflife.org/data/taxon/675NP
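The three pipe-delimited paths in that output (names, identifiers, ranks) run in parallel, so a report could pair them up position by position. A rough sketch, assuming the three paths have already been separated into strings; the helper name `zip_classification` is hypothetical:

```python
def zip_classification(names, ids, ranks):
    """Pair up three parallel pipe-delimited paths into (rank, id, name) tuples."""
    columns = [
        [part.strip() for part in path.split("|")]
        for path in (ranks, ids, names)
    ]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("names, ids and ranks must have the same number of entries")
    return list(zip(*columns))


# Values copied from the example output above.
names = ("Biota | Animalia | Chordata | Reptilia | Squamata | Iguania | "
         "Dactyloidae | Anolis | Anolis festae")
ids = ("COL:5T6MX | COL:N | COL:CH2 | COL:RP | COL:45C | COL:87BW7 | "
       "COL:8Y8 | COL:WQP | COL:675NP")
ranks = ("unranked | kingdom | phylum | class | order | superfamily | "
         "family | genus | species")

lineage = zip_classification(names, ids, ranks)
for rank, taxon_id, name in lineage:
    print(f"{rank}\t{taxon_id}\t{name}")
```

Laid out this way, differences from the source record are easy to spot: the catalog places the species in family Dactyloidae, whereas the MCZ record lists Iguanidae.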

jhpoelen commented 2 years ago

@seltmann suggested prepending the exact location of the row from which the name was extracted:

e.g.,

line:zip:hash://sha256/05b2680a060f4bccce78b001243b5b6a579fd7c67db10978d14bfac486d38ed5!/occurrence.txt!/L9 | Anolis festae HAS_ACCEPTED_NAME COL:675NP Anolis festae species Biota | Animalia | Chordata | Reptilia | Squamata | Iguania | Dactyloidae | Anolis | Anolis festae COL:5T6MX | COL:N | COL:CH2 | COL:RP | COL:45C | COL:87BW7 | COL:8Y8 | COL:WQP | COL:675NP unranked | kingdom | phylum | class | order | superfamily | family | genus | species https://www.catalogueoflife.org/data/taxon/675NP
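To illustrate how such a location prefix could be taken apart again, here is a small sketch. The pattern is inferred from this single example only (a `line:zip:` prefix, a sha256 content hash, an entry path inside the zip, and an `L<number>` line reference), so it is an assumption rather than a specification; `parse_locator` is a hypothetical helper:

```python
import re

# Pattern inferred from the one example in this thread; not an official grammar.
LOCATOR = re.compile(
    r"^line:zip:(?P<content_id>hash://sha256/[0-9a-f]+)"
    r"!/(?P<entry>.+)!/L(?P<line>\d+)$"
)


def parse_locator(locator):
    """Split a row locator into (content_id, zip_entry, line_number)."""
    match = LOCATOR.match(locator.strip())
    if match is None:
        raise ValueError(f"unrecognized locator: {locator!r}")
    return match["content_id"], match["entry"], int(match["line"])


example = (
    "line:zip:hash://sha256/"
    "05b2680a060f4bccce78b001243b5b6a579fd7c67db10978d14bfac486d38ed5"
    "!/occurrence.txt!/L9"
)
content_id, entry, line = parse_locator(example)
print(content_id, entry, line)
```

With the content hash, entry name, and line number separated, a reader of the report can look up the original row in the archive.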

jhpoelen commented 2 years ago

I suggest using ucsb-izc as a smaller example to try out the concept, then moving to MCZ to get a sense of how the method performs.