globalbioticinteractions / mcz

Configuration to index Museum of Comparative Zoology, Harvard University.

associatedOccurrences format #1

Open jhpoelen opened 3 years ago

jhpoelen commented 3 years ago

MCZ contains associatedOccurrences in the following format:

$ unzip -p mcz.zip occurrence.txt | cut -f30 | grep -P "collection_object_id" | grep "paras" | head
parasitically found on/in          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=5197872"> MCZ IZ ECH-8358</a>
parasitically found on/in          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=1563036"> MCZ Ich 47556</a>
brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3151006"> MCZ Orn 356498</a>
brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=5259116"> MCZ Orn 364962</a>
brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3150971"> MCZ Orn 356528</a>
brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=5256459"> MCZ Orn 364961</a>
brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3151012"> MCZ Orn 357077</a>
parasitically found on/in          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3223947"> MCZ IZ CRI-117</a>
parasitically found on/in          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=2930518"> MCZ IZ CRU-8019</a>
parasitically found on/in          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=1446101"> MCZ Ich 722</a>

When a field lists several related specimens, the entries are separated by semicolons.

This is slightly different from the Arctos notation of (has parasite) [some occurrenceId], so GloBI does not yet understand how to link these records.
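To make the format concrete, here is a minimal Python sketch that splits one associatedOccurrences value on semicolons and pulls out the relationship phrase, the collection_object_id, and the anchor text. The function name, the regular expression, and the dictionary keys are illustrative assumptions, not part of GloBI or MCZbase; the sketch also assumes that neither the URLs nor the labels contain a semicolon, which holds for the samples shown here.

```python
import re

# One entry looks like:
#   <relationship phrase>   <a href="...collection_object_id=N"> <label></a>
# The pattern is a hypothetical helper derived from the samples in this issue.
ENTRY = re.compile(
    r'^\s*(?P<relation>.*?)\s*'
    r'<a\s+href="(?P<url>[^"]*collection_object_id=(?P<id>\d+))"\s*>'
    r'\s*(?P<label>.*?)\s*</a>\s*$'
)


def parse_associated_occurrences(value):
    """Return one dict per semicolon-separated entry that matches ENTRY."""
    parsed = []
    for entry in value.split(";"):
        match = ENTRY.match(entry)
        if match is None:
            # Skip entries that do not follow the MCZ anchor format.
            continue
        parsed.append({
            "relation": match.group("relation"),
            "collection_object_id": match.group("id"),
            "url": match.group("url"),
            "label": match.group("label"),
        })
    return parsed


if __name__ == "__main__":
    sample = (
        'brood parasitized by          <a href="http://mczbase.mcz.harvard.edu/'
        'SpecimenDetail.cfm?collection_object_id=3151006"> MCZ Orn 356498</a>'
    )
    for item in parse_associated_occurrences(sample):
        print(item["relation"], "|", item["label"], "|", item["collection_object_id"])
```

Entries that do not match are dropped silently in this sketch; a real indexer would probably want to log them so that new relationship formats are noticed.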

jhpoelen commented 3 years ago

More examples include:

$ unzip -p mcz.zip occurrence.txt | cut -f30 | grep -P "collection_object_id" | head
duplicate recataloged as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=596800"> MCZ Herp R-84811</a>
bad duplicate of          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3881801"> MCZ HerpOBS 1</a>
embryo of          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=4674447"> MCZ Herp A-152570</a>
from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666604"> MCZ Mamm 3186</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666606"> MCZ Mamm 3187</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666608"> MCZ Mamm 3188</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666610"> MCZ Mamm 3190</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666612"> MCZ Mamm 3192</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=678406"> MCZ Mamm 3191</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=730482"> MCZ Mamm 3189</a>
from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666610"> MCZ Mamm 3190</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666612"> MCZ Mamm 3192</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=678406"> MCZ Mamm 3191</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=730482"> MCZ Mamm 3189</a>
from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=666612"> MCZ Mamm 3192</a>; from same lot as          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=678406"> MCZ Mamm 3191</a>
split into          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=3354186"> MCZ Mamm 6085</a>
cloned from record          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=677308"> MCZ Mamm 61712</a>
parent of          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=757389"> MCZ Mamm 35744</a>
cloned from record          <a href="http://mczbase.mcz.harvard.edu/SpecimenDetail.cfm?collection_object_id=677308"> MCZ Mamm 61712</a>

jhpoelen commented 3 years ago

After various changes to how GloBI links occurrences across and within datasets, the following sample was extracted from the results of elton interactions globalbioticinteractions/mcz:

(one record, shown as field: value; fields with empty values are omitted)

argumentTypeId: https://en.wiktionary.org/wiki/support
sourceOccurrenceId: MCZ:Mamm:61296
sourceCatalogNumber: 61296
sourceCollectionCode: Mamm
sourceInstitutionCode: MCZ
sourceTaxonName: Grampus griseus
sourceTaxonPath: Animalia | Chordata | Mammalia | Cetacea | Delphinidae | Grampus | Grampus griseus
sourceTaxonPathNames: kingdom | phylum | class | order | family | genus | species
sourceSexName: male
interactionTypeId: http://purl.obolibrary.org/obo/RO_0008506
interactionTypeName: coOccursWith
targetOccurrenceId: MCZ:Mamm:61297
targetCatalogNumber: 61297
targetCollectionCode: Mamm
targetInstitutionCode: MCZ
targetTaxonName: Grampus griseus
targetTaxonPath: Animalia | Chordata | Mammalia | Cetacea | Delphinidae | Grampus | Grampus griseus
targetTaxonPathNames: kingdom | phylum | class | order | family | genus | species
targetSexName: male
basisOfRecordName: PreservedSpecimen
http://rs.tdwg.org/dwc/terms/eventDate: 1993-11-20T00:00:00Z
decimalLatitude: 41.795942
decimalLongitude: -70.016129
localityName: Orleans, Skaket Beach
referenceUrl: http://mczbase.mcz.harvard.edu/guid/MCZ:Mamm:61296
referenceCitation: http://mczbase.mcz.harvard.edu/guid/MCZ:Mamm:61296
namespace: local
citation: Museum of Comparative Zoology, Harvard University - Version 162.259
archiveURI: http://digir.mcz.harvard.edu/ipt/archive.do?r=mczbase
eltonVersion: 0.3.5-SNAPSHOT

This record supports the claim that two Risso's dolphins (Grampus griseus, http://mczbase.mcz.harvard.edu/guid/MCZ:Mamm:61296 and http://mczbase.mcz.harvard.edu/guid/MCZ:Mamm:61297) co-occurred at Orleans, Skaket Beach on 1993-11-20. On closer inspection, the relations document a mass stranding of dolphins at Cape Cod on the Massachusetts coast.
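The record above identifies specimens as MCZ:Mamm:61296, while the anchor text in associatedOccurrences reads like MCZ Mamm 3186. One way to bridge the two is sketched below, under the assumption that the anchor text always consists of institution code, collection code, and catalog number separated by whitespace. That assumption is only inferred from the samples in this issue, and the function names are hypothetical; this is not a description of how GloBI resolves the links.

```python
# Prefix taken from the guid URLs quoted in this issue.
GUID_PREFIX = "http://mczbase.mcz.harvard.edu/guid/"


def label_to_occurrence_id(label):
    """Turn anchor text such as 'MCZ Mamm 3186' into 'MCZ:Mamm:3186'.

    Assumes (unverified) that the label is always
    '<institution> <collection> <catalog number>'.
    """
    parts = label.split(None, 2)  # split on whitespace, at most three parts
    if len(parts) != 3:
        raise ValueError("unexpected label format: %r" % label)
    return ":".join(parts)


def occurrence_id_to_guid_url(occurrence_id):
    """Build a guid URL of the form seen in the referenceUrl field."""
    return GUID_PREFIX + occurrence_id


if __name__ == "__main__":
    occurrence_id = label_to_occurrence_id(" MCZ Mamm 3186")
    print(occurrence_id)
    print(occurrence_id_to_guid_url(occurrence_id))
```

Labels that do not split into three parts raise an error rather than producing a guessed identifier, so records with an unexpected format surface early.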