ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Chronometric Age Extension for Darwin Core #3958

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 3 years ago

@Jegelewicz @dustymc @lkvoong This collection caught my eye given all of the work that has gone into the new and improved ChronometricAge extension to DwC on the IPT. https://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/chrono/terms/ChronometricAge

If UAF or any other ZooArch or paleo collection wishes to use the extension we should probably know that before data starts moving into Arctos for two reasons: (1) it may require some specific fields in Arctos, and (2) we will probably need to get @dustymc to generate a new SQL table that contains the relevant data so that we can map it to the extension when we prepare to publish.

Originally posted by @dbloom in https://github.com/ArctosDB/new-collections/issues/358#issuecomment-926090389

Nicole-Ridgwell-NMMNHS commented 3 years ago

Do we want to generalize the radiometric date attribute to chronometric age?

Jegelewicz commented 3 years ago

Will we lose information (method)? or can we now put something in the attribute method for that? (I have no idea what I am talking about!)

Nicole-Ridgwell-NMMNHS commented 3 years ago

I think we could map the attribute method to chronometricAgeProtocol, but we would be leaving out some of the fields in the extension, such as uncertainty, conversion protocol, etc. We should find out if we have collections that would want to use those other fields. For NMMNH Paleo, we don't use that field a lot, and I'm fine with linking the publication and just sending the basic attribute information to GBIF. Other collections might feel differently.
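
A minimal sketch of what that mapping could look like. The input keys (`attribute_value`, `attribute_units`, `attribute_method`, `determiner`) are hypothetical stand-ins for illustration, not the actual Arctos schema, and linking the row by occurrenceID is an assumption. Only three extension terms are filled; the uncertainty and conversion protocol fields are deliberately left out.

```python
def attribute_to_chrono_row(occurrence_id, attribute):
    """Map one hypothetical radiometric-date attribute to ChronometricAge terms."""
    value = str(attribute.get("attribute_value", "")).strip()
    units = str(attribute.get("attribute_units", "")).strip()
    row = {
        # Assumption: the extension row links back to the core by occurrenceID.
        "occurrenceID": occurrence_id,
        "verbatimChronometricAge": f"{value} {units}".strip(),
        "chronometricAgeProtocol": attribute.get("attribute_method", ""),
        "chronometricAgeDeterminedBy": attribute.get("determiner", ""),
    }
    # Drop terms with no data so the published row only carries what we have.
    return {term: val for term, val in row.items() if val}
```

Any extension term we don't map is simply absent from the row, which fits the point that an extension does not have to be used in full.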

dbloom commented 3 years ago

Hey all, I raised this point because of the UAF ZooArch collection and other non-Arctos collections I've helped to publish with the extension recently. Like any extension, it is not required that you use it at all, nor is it required that you use every field available - you use the fields that help you to present your data to the world (and the portals) in the cleanest possible way. It's just an option for current and future collections, should those data be present. It is also possible to publish much of these data via dynamicProperties and other fields, if preferred. No rush on it in any case.
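
For the dynamicProperties route, a rough sketch, assuming the chrono values are already in a dictionary keyed by term name (the function name is made up for illustration). Darwin Core recommends a key:value encoding such as JSON for dynamicProperties, so this just serializes the non-empty values.

```python
import json


def chrono_to_dynamic_properties(chrono):
    """Serialize non-empty chrono values as a JSON string for dynamicProperties."""
    kept = {term: value for term, value in chrono.items() if value not in ("", None)}
    # sort_keys keeps the output stable from one publication run to the next.
    return json.dumps(kept, sort_keys=True)
```

The trade-off is that portals receive one opaque string per record instead of separately indexed extension fields.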


Jegelewicz commented 2 years ago

@dbloom just want to make sure that these are the fields we are looking at for this - https://tools.gbif.org/dwca-validator/extension.do?id=http://rs.tdwg.org/chrono/terms/ChronometricAge

tucotuco commented 2 years ago

I can confirm that is the most up-to-date GBIF extension. The Quick Reference Guide (https://tdwg.github.io/chrono/terms/) is part of the Darwin Core standard.


Jegelewicz commented 2 years ago

Thanks @tucotuco

So - if we are NOT currently sending the Chrono extension, how is this information getting to GBIF?

[image] (from https://www.gbif.org/occurrence/2990918315)

I can see that we include it in our "IPT publishing table" as follows:

getGeologyForDWC(locality.locality_id,'Erathem/Era') as earliestEraOrLowestErathem,
getGeologyForDWC(locality.locality_id,'Eon/Eonothem') as earliestEonOrLowestEonothem,
getGeologyForDWC(locality.locality_id,'Series/Epoch') as earliestEpochOrLowestSeries,
getGeologyForDWC(locality.locality_id,'Stage/Age') as earliestAgeOrLowestStage,
getGeologyForDWC(locality.locality_id,'System/Period') as earliestPeriodOrLowestSystem,
getGeologyForDWC(locality.locality_id,'formation') as formation,
getGeologyForDWC(locality.locality_id,'group') as "group",
getGeologyForDWC(locality.locality_id,'member') as member,

What I don't get is that this information is apparently not part of the "occurrence" core (as downloaded in this template), yet it is somehow getting published?

Jegelewicz commented 2 years ago

Also, just putting this here for now - the extension terms with info

| Extension term | Term label | Definition | Best practice | Examples | Term type |
| --- | --- | --- | --- | --- | --- |
| ChronometricAge | Chronometric Age | An approximation of a temporal position (in the sense conveyed by https://www.w3.org/TR/owl-time/#time:TemporalPosition) that is supported via evidence. | The age of a specimen and how this age is known, whether by a dating assay, a relative association with dated material, or legacy collections information. | An age range associated with a specimen derived from an AMS dating assay applied to an oyster shell in the same stratum; An age range associated with a specimen derived from a ceramics analysis based on other materials found in the same stratum; A maximum age associated with a specimen derived from K-Ar dating applied to a proximal volcanic tuff found stratigraphically below the specimen; An age range of a specimen based on its biostratigraphic context; An age of a specimen based on what is reported in legacy collections data. | http://www.w3.org/2000/01/rdf-schema#Class |
| chronometricAgeConversionProtocol | Chronometric Age Conversion Protocol | The method used for converting the uncalibratedChronometricAge into a chronometric age in years, as captured in the earliestChronometricAge, earliestChronometricAgeReferenceSystem, latestChronometricAge, and latestChronometricAgeReferenceSystem fields. | For example, calibration of conventional radiocarbon age or the currently accepted age range of a cultural or geological period. | INTCAL13, sequential 6 phase Bayesian model and IntCal13 calibration | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeDeterminedBy | Chronometric Age Determined By | A list (concatenated and separated) of names of people, groups, or organizations who determined the ChronometricAge. | Recommended best practice is to separate the values in a list with space vertical bar space ( \| ). | Michelle LeFebvre \| Neill Wallis | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeDeterminedDate | Chronometric Age Determined Date | The date on which the ChronometricAge was determined. | Recommended best practice is to use a date that conforms to ISO 8601-1:2019. | 1963-03-08T14:07-0600 (8 Mar 1963 at 2:07pm in the time zone six hours earlier than UTC). 2009-02-20T08:40Z (20 February 2009 8:40am UTC). 2018-08-29T15:19 (3:19pm local time on 29 August 2018). 1809-02-12 (some time during 12 February 1809). 1906-06 (some time in June 1906). 1971 (some time in the year 1971). | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeID | Chronometric Age ID | An identifier for the set of information associated with a ChronometricAge. | May be a global unique identifier or an identifier specific to the dataset. This can be used to link this record to another repository where more information about the dataset is shared. | https://www.canadianarchaeology.ca/samples/70673 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeProtocol | Chronometric Age Protocol | A description of or reference to the methods used to determine the chronometric age. | | radiocarbon AMS, K-Ar dates for the lower most marker tuff, historic documentation, ceramic seriation | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeReferences | Chronometric Age References | A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the ChronometricAge. | Recommended best practice is to separate the values in a list with space vertical bar space ( \| ). | Pluckhahn, Thomas J., Neill J. Wallis, and Victor D. Thompson. 2020 The History and Future of Migrationist Explanation in the Archaeology of the Eastern Woodlands: A Review and Case Study of the Woodland Period Gulf Coast. Journal of Archaeological Research. https://doi.org/10.1007/s10814-019-09140-x | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeRemarks | Chronometric Age Remarks | Notes or comments about the ChronometricAge. | | Beta Analytic number: 323913 \| One of the Crassostrea virginica right valve specimens from North Midden Feature 17 was chosen for AMS dating, but it is unclear exactly which specimen it was. | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeUncertaintyInYears | Chronometric Age Uncertainty In Years | The temporal uncertainty of the earliestChronometricAge and latestChronometricAge in years. | The expected unit for this field is years. The value in this field is number of years before and after the values given in the earliest and latest chronometric age fields within which the actual values are estimated to be. | 100 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| chronometricAgeUncertaintyMethod | Chronometric Age Uncertainty Method | The method used to generate the value of chronometricAgeUncertaintyInYears. | | 2-sigma calibrated range, Half of 95% confidence interval | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| earliestChronometricAge | Earliest Chronometric Age | The maximum/earliest/oldest possible age of a specimen as determined by a dating method. | The expected unit for this field is years. This field, if populated, must have an associated earliestChronometricAgeReferenceSystem. | 100 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| earliestChronometricAgeReferenceSystem | Earliest Chronometric Age Reference System | The reference system associated with the earliestChronometricAge. | Recommended best practice is to use a controlled vocabulary. | kya, mya, BP, AD, BCE, ka, Ma, Ga | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| latestChronometricAge | Latest Chronometric Age | The minimum/latest/youngest possible age of a specimen as determined by a dating method. | The expected unit for this field is years. This field, if populated, must have an associated latestChronometricAgeReferenceSystem. | 27 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| latestChronometricAgeReferenceSystem | Latest Chronometric Age Reference System | The reference system associated with the latestChronometricAge. | Recommended best practice is to use a controlled vocabulary. | kya, mya, BP, AD, BCE, ka, Ma, Ga | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| materialDated | Material Dated | A description of the material on which the chronometricAgeProtocol was actually performed, if known. | | Double Tuff, Charcoal found in Stratum V, charred wood, tooth | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| materialDatedID | Material Dated ID | An identifier for the MaterialSample on which the chronometricAgeProtocol was performed, if applicable. | | dwc:materialSampleID: https://www.ebi.ac.uk/metagenomics/samples/SRS1930158 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| materialDatedRelationship | Material Dated Relationship | The relationship of the materialDated to the subject of the ChronometricAge record, from which the ChronometricAge of the subject is inferred. | Recommended best practice is to use a controlled vocabulary. | sameAs (cases where the subject material was completely destructively subsampled to get the ChronometricAge), subsampleOf (cases where part of the original specimen was extracted as the material used to determine the ChronometricAge), inContextWith (cases where the ChronometricAge is inferred from materialDated, such as sediments or cultural objects, in related temporal context), stratigraphicallyCorrelatedWith (cases where the ChronometricAge is inferred from materialDated in a stratigraphically correlated context) | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| uncalibratedChronometricAge | Uncalibrated Chronometric Age | The output of a dating assay before it is calibrated into an age using a specific conversion protocol. | | 1510 +/- 25 14C yr BP, 16.26 Ma +/- 0.016 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |
| verbatimChronometricAge | Verbatim Chronometric Age | The verbatim age for a specimen, whether reported by a dating assay, associated references, or legacy information. | For example, this could be the radiocarbon age as given in an AMS dating report. This could also be simply what is reported as the age of a specimen in legacy collections data. | 27 BC to 14 AD, stratigraphically pre-1104 | http://www.w3.org/1999/02/22-rdf-syntax-ns#Property |

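
Two of the rules in the term list are easy to check mechanically: an earliest or latest age, if populated, must come with its reference system, and list-valued terms are joined with space vertical bar space. A minimal sketch, assuming each extension row is a plain dictionary keyed by term name (the function names are made up for illustration):

```python
def check_chrono_record(record):
    """Return a list of problems found in one ChronometricAge row."""
    problems = []
    for bound in ("earliest", "latest"):
        age = record.get(f"{bound}ChronometricAge")
        ref = record.get(f"{bound}ChronometricAgeReferenceSystem")
        # The age terms must be paired with a reference system when populated.
        if age not in (None, "") and not ref:
            problems.append(
                f"{bound}ChronometricAge is populated but "
                f"{bound}ChronometricAgeReferenceSystem is missing"
            )
    return problems


def join_list(values):
    """Join names or references with ' | ', skipping blank entries."""
    return " | ".join(v.strip() for v in values if v and v.strip())
```

For example, `join_list(["Michelle LeFebvre", "Neill Wallis"])` gives the chronometricAgeDeterminedBy example value from the term list.
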
tucotuco commented 2 years ago

> So - if we are NOT currently sending the Chrono extension, how is this information getting to GBIF?
>
> [image] (from https://www.gbif.org/occurrence/2990918315)
>
> I can see that we include it in our "IPT publishing table" as follows:
>
> getGeologyForDWC(locality.locality_id,'Erathem/Era') as earliestEraOrLowestErathem, getGeologyForDWC(locality.locality_id,'Eon/Eonothem') as earliestEonOrLowestEonothem, getGeologyForDWC(locality.locality_id,'Series/Epoch') as earliestEpochOrLowestSeries, getGeologyForDWC(locality.locality_id,'Stage/Age') as earliestAgeOrLowestStage, getGeologyForDWC(locality.locality_id,'System/Period') as earliestPeriodOrLowestSystem, getGeologyForDWC(locality.locality_id,'formation') as formation, getGeologyForDWC(locality.locality_id,'group') as "group", getGeologyForDWC(locality.locality_id,'member') as member,
>
> what I don't get is that this information is apparently not part of the "occurrence" core (as downloaded in this template), yet it is somehow getting published?

All of those are properties from the GeologicalContext class in Simple Darwin Core. They are also in the Occurrence Core (https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence#GeologicalContext) and mappable in the IPT under that Core type. The Chrono terms are much more specifically about dating and the processes used to arrive at those dates. This will allow one to publish both the date of collection (in eventDate) and the date of the context of placement (provenience).
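
To make the distinction concrete, here is a rough sketch, assuming a flat record held as a dictionary keyed by term name. GeologicalContext terms such as formation stay in the Occurrence core row, while the Chrono terms are pulled out into a separate extension row. The function name is illustrative, and linking the rows by occurrenceID is an assumption.

```python
# Chrono extension property terms, as listed in the extension definition.
CHRONO_TERMS = {
    "chronometricAgeConversionProtocol", "chronometricAgeDeterminedBy",
    "chronometricAgeDeterminedDate", "chronometricAgeID",
    "chronometricAgeProtocol", "chronometricAgeReferences",
    "chronometricAgeRemarks", "chronometricAgeUncertaintyInYears",
    "chronometricAgeUncertaintyMethod", "earliestChronometricAge",
    "earliestChronometricAgeReferenceSystem", "latestChronometricAge",
    "latestChronometricAgeReferenceSystem", "materialDated",
    "materialDatedID", "materialDatedRelationship",
    "uncalibratedChronometricAge", "verbatimChronometricAge",
}


def split_core_and_chrono(flat_record, id_field="occurrenceID"):
    """Split one flat record into an Occurrence core row and a Chrono row."""
    core = {k: v for k, v in flat_record.items() if k not in CHRONO_TERMS}
    chrono = {k: v for k, v in flat_record.items() if k in CHRONO_TERMS}
    if chrono:
        # Assumption: the extension row points back to its core row by ID.
        chrono[id_field] = flat_record[id_field]
    return core, chrono
```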

Jegelewicz commented 2 years ago

@tucotuco thanks again. I am still confused about Darwin Core vs. Darwin Core Archive and what exactly is accepted by IPT and aggregators. If GBIF is telling me "here is your template for IPT" and it doesn't include https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence#GeologicalContext, how would I ever know I could include it?

I guess what I am asking is: where do I find the complete list of terms accepted in a Darwin Core occurrence archive? Why is that different from https://dwc.tdwg.org/terms/#occurrence? And how would anyone know what else to include?

dbloom commented 2 years ago

@Jegelewicz "Darwin Core" is a data sharing standard, officially The Darwin Core Standard. A Darwin Core Archive (DwC-A) is a package that is created by the IPT (and some other software packages) for publication. I suggest you read this as a start for clarification: https://www.gbif.org/darwin-core. See below for more links to explain and demonstrate DwC.

All of the DwC terms are here: https://dwc.tdwg.org/terms/ - this is also known as the Simple Darwin Core. All of the terms contained within accepted/registered extensions to the Darwin Core Standard are here (along with some works in progress): https://tools.gbif.org/dwca-validator/extensions.do

The terms listed under https://dwc.tdwg.org/terms/#occurrence are one cluster of related terms within the larger "bag of terms" that is the DwC Standard.

I suggest that you take a look at some of the webinars that have been presented on the topic (https://github.com/tdwg/dwc-qa/wiki/Webinars), in particular Chapters 0-5. Just get some popcorn and a soda and go to town.

Jegelewicz commented 2 years ago

Sigh - I am just going around in circles.

tucotuco commented 2 years ago

> @tucotuco thanks again. I am still confused about Darwin Core vs. Darwin Core Archive and what exactly is accepted by IPT and aggregators.

A Darwin Core Archive is an implementation of the Darwin Core Standard as text, as covered by the Darwin Core Text Guide (https://dwc.tdwg.org/text/). That guide gives the specification; the Darwin Core Archive is how GBIF has chosen to implement it and build tools to support it (such as the IPT).

The Darwin Core Occurrence Core (https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence) is a specification created by GBIF (now with support from the Darwin Core Maintenance Group) for how to share Occurrence data in a Darwin Core Archive.

The Darwin Core Occurrence Class (https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence) has properties deemed to be most directly related to an Occurrence rather than to any of the other Darwin Core classes. If there were a table for Occurrences, the terms organized in that class would be good candidates to be fields in that table. But a Darwin Core Occurrence in a Darwin Core Archive using the Darwin Core Occurrence Core is one big flat table containing all the terms from all the classes of Darwin Core that it reasonably can (that is, not MeasurementOrFacts, ResourceRelationships, or ChronometricAges, all of which we call extensions, because they wouldn't work in one flat table or spreadsheet with the rest of the Occurrence record).

GBIF accepts Darwin Core Archives based on three distinct "Cores", of which Occurrence is one. In addition to the data in the core, GBIF also accepts data in extensions in Darwin Core Archives. The list of Cores and Extensions supported in production IPTs can be seen at https://tools.gbif.org/dwca-validator/extensions.do, along with extensions that are in development and which can be used in IPTs in Test mode.
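
As an illustration of how a core and an extension sit side by side in one archive, here is a rough sketch that builds the `meta.xml` descriptor defined by the Darwin Core Text Guide. The file names (`occurrence.csv`, `chronometricage.csv`), the column layout with the identifier in column 0, and the function name are assumptions for the example; in practice the IPT generates this file for you.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
CHRONO = "http://rs.tdwg.org/chrono/terms/"


def build_meta(core_fields, chrono_fields):
    """Build a meta.xml string for an Occurrence core plus a Chrono extension."""
    archive = ET.Element("archive", xmlns="http://rs.tdwg.org/dwc/text/")

    core = ET.SubElement(archive, "core", rowType=DWC + "Occurrence",
                         encoding="UTF-8", fieldsTerminatedBy=",",
                         ignoreHeaderLines="1")
    ET.SubElement(ET.SubElement(core, "files"), "location").text = "occurrence.csv"
    ET.SubElement(core, "id", index="0")  # column 0 holds the record identifier
    # Simplification: every core column is treated as a dwc: term here; terms
    # such as license or modified actually live in the Dublin Core namespace.
    for i, name in enumerate(core_fields, start=1):
        ET.SubElement(core, "field", index=str(i), term=DWC + name)

    ext = ET.SubElement(archive, "extension", rowType=CHRONO + "ChronometricAge",
                        encoding="UTF-8", fieldsTerminatedBy=",",
                        ignoreHeaderLines="1")
    ET.SubElement(ET.SubElement(ext, "files"), "location").text = "chronometricage.csv"
    ET.SubElement(ext, "coreid", index="0")  # column 0 points back to the core id
    for i, name in enumerate(chrono_fields, start=1):
        ET.SubElement(ext, "field", index=str(i), term=CHRONO + name)

    return ET.tostring(archive, encoding="unicode")
```

The GeologicalContext terms would go in `core_fields`, because they belong to the flat Occurrence table, while the Chrono terms can only appear in the extension file.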

> If GBIF is telling me - here is your template for IPT and it doesn't include https://tools.gbif.org/dwca-validator/extension.do?id=dwc:Occurrence#GeologicalContext how would I ever know I could include it?

The Occurrence Core includes every field except the ones exclusively for extensions, including all the GeologicalContext terms. They are all available to map in the IPT when that Occurrence Core is chosen. I'm curious why you think they are not included.

I hope that helps. If not, keep the questions coming.

Jegelewicz commented 2 years ago

> I'm curious why you think they are not included.

Because when I go to get a template - the only fields included are these:

occurrenceID basisOfRecord eventDate endDayOfYear year month day verbatimEventDate eventRemarks scientificName higherClassification kingdom phylum class order family genus specificEpithet infraspecificEpithet taxonRank identifiedBy dateIdentified nomenclaturalCode decimalLatitude decimalLongitude geodeticDatum coordinateUncertaintyInMeters verbatimCoordinates verbatimCoordinateSystem georeferencedBy georeferencedDate georeferenceProtocol georeferenceSources georeferenceVerificationStatus higherGeography continent islandGroup island country countryCode stateProvince county locality verbatimLocality locationAccordingTo type modified language license references institutionID institutionCode collectionCode catalogNumber occurrenceRemarks recordNumber recordedBy organismID individualCount organismQuantity organismQuantityType establishmentMeans preparations otherCatalogNumbers previousIdentifications
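
A quick sketch of the comparison, using the template field list above and the GeologicalContext terms we already populate in the IPT publishing table (the variable and function names are just for illustration):

```python
# Field names from the spreadsheet template, as pasted above.
TEMPLATE_FIELDS = """
occurrenceID basisOfRecord eventDate endDayOfYear year month day
verbatimEventDate eventRemarks scientificName higherClassification kingdom
phylum class order family genus specificEpithet infraspecificEpithet taxonRank
identifiedBy dateIdentified nomenclaturalCode decimalLatitude decimalLongitude
geodeticDatum coordinateUncertaintyInMeters verbatimCoordinates
verbatimCoordinateSystem georeferencedBy georeferencedDate georeferenceProtocol
georeferenceSources georeferenceVerificationStatus higherGeography continent
islandGroup island country countryCode stateProvince county locality
verbatimLocality locationAccordingTo type modified language license references
institutionID institutionCode collectionCode catalogNumber occurrenceRemarks
recordNumber recordedBy organismID individualCount organismQuantity
organismQuantityType establishmentMeans preparations otherCatalogNumbers
previousIdentifications
""".split()

# GeologicalContext terms that the publishing table already fills.
GEOLOGY_TERMS = [
    "earliestEraOrLowestErathem", "earliestEonOrLowestEonothem",
    "earliestEpochOrLowestSeries", "earliestAgeOrLowestStage",
    "earliestPeriodOrLowestSystem", "formation", "group", "member",
]


def missing_from_template(terms, template_fields):
    """Return the terms that do not appear among the template's columns."""
    present = set(template_fields)
    return [term for term in terms if term not in present]


print(missing_from_template(GEOLOGY_TERMS, TEMPLATE_FIELDS))
```

Every one of the eight geology terms comes back as missing from the template, even though they are being published.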

Jegelewicz commented 2 years ago

Thank you very much @tucotuco

It seems that the template offered up by GBIF (for those who want to use a "spreadsheet") is not just incomplete but misleading: if you don't know any better, it looks like that is all you can include.

That was very confusing to me! But now I think I get it.

tucotuco commented 2 years ago

Yes, that makes sense. That page is in serious need of a LOT of revisions, not just for the confusion it caused you. I'm glad we've got it straight now.
