ecotaxa / ecotaxa_front

Front end of the EcoTaxa application

List of equivalence between common terms in EcoTaxa, DarwinCore and BODC #621

Status: Open. jiho opened this issue 3 years ago.

jiho commented 3 years ago

Ideally, EcoTaxa should become increasingly standards-compliant. Below is a list of terms that are meaningful in EcoTaxa and could be standardized, because equivalents exist in standard references/vocabularies.

| EcoTaxa term | DwC or BODC equivalent | Unit |
|---|---|---|
| lat | DwC: https://dwc.tdwg.org/list/#dwc_decimalLatitude; BODC "Latitude North": http://vocab.nerc.ac.uk/collection/P01/current/ALATZZ01/ | degrees |
| lon | DwC: https://dwc.tdwg.org/list/#dwc_decimalLongitude; BODC "Longitude East": http://vocab.nerc.ac.uk/collection/P01/current/ALONZZ01/ | degrees |
| depth_min | https://dwc.tdwg.org/list/#dwc_minimumDepthInMeters | |
| depth_max | https://dwc.tdwg.org/list/#dwc_maximumDepthInMeters | |
| date | https://dwc.tdwg.org/list/#dwc_eventDate | |
| time | https://dwc.tdwg.org/list/#dwc_eventTime | |
| time of day | | |
| taxonomic class | https://dwc.tdwg.org/list/#dwc_scientificNameID and https://dwc.tdwg.org/list/#dwc_scientificName | |
| annotator (person) | https://dwc.tdwg.org/list/#dwc_identifiedBy | |
| annotation date | https://dwc.tdwg.org/list/#dwc_dateIdentified | |
| subsampling coefficient | http://vocab.nerc.ac.uk/collection/P01/current/SSAMPC01/1/ | |
| subsampling protocol | http://vocab.nerc.ac.uk/collection/Q01/current/Q0100006/ | |
| water volume sampled | http://vocab.nerc.ac.uk/collection/P01/current/VOLWBSMP/ | m³ |
| area | http://vocab.nerc.ac.uk/collection/P01/current/APIXBI01/ | pixels |
| major | http://vocab.nerc.ac.uk/collection/P01/current/LGPIXEL1/ | pixels |
| minor | http://vocab.nerc.ac.uk/collection/P01/current/WDPIXEL1/ | pixels |
| pixel size | http://vocab.nerc.ac.uk/collection/P01/current/WDPIXEL2/ and https://vocab.nerc.ac.uk/collection/P01/current/HTPIXEL2/ | mm/pixel |
| individual volume (of object) | http://vocab.nerc.ac.uk/collection/P01/current/CVOLZZ01/ | mm³ |
| abundance/count | | dimensionless |
| concentration | http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL01/ | n/m³ |
| biovolume | http://vocab.nerc.ac.uk/collection/P01/current/CVOLUKNB/ | mm³/m³ |
| instrument | BODC "Sampling instrument name": http://vocab.nerc.ac.uk/collection/Q01/current/Q0100002/ | |
| mesh size | | |
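As a rough illustration of how such a list could be used programmatically, here is a minimal Python sketch. The dictionary `ECOTAXA_TO_STANDARD` and the helper functions are hypothetical, not part of EcoTaxa; the keys are the short names used in this list and may not match actual export column names. The `concentration` and `biovolume` helpers assume the subsampling coefficient is a multiplicative factor (e.g. 4 when a quarter of the sample was imaged), which this issue does not specify.

```python
# Hypothetical lookup built from a subset of the list above:
# short EcoTaxa term -> (standard URI, unit). Where both DwC and BODC
# exist, the BODC URI is used here.
ECOTAXA_TO_STANDARD = {
    "lat": ("http://vocab.nerc.ac.uk/collection/P01/current/ALATZZ01/", "degrees"),
    "lon": ("http://vocab.nerc.ac.uk/collection/P01/current/ALONZZ01/", "degrees"),
    "area": ("http://vocab.nerc.ac.uk/collection/P01/current/APIXBI01/", "pixels"),
    "concentration": ("http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL01/", "n/m3"),
    "biovolume": ("http://vocab.nerc.ac.uk/collection/P01/current/CVOLUKNB/", "mm3/m3"),
}


def standardize(record: dict) -> dict:
    """Re-key a record by standard URI, dropping terms with no known mapping."""
    return {
        ECOTAXA_TO_STANDARD[key][0]: value
        for key, value in record.items()
        if key in ECOTAXA_TO_STANDARD
    }


def area_mm2(area_px: float, pixel_size_mm: float) -> float:
    """An area in pixels scales with the square of the pixel size (mm/pixel)."""
    return area_px * pixel_size_mm ** 2


def concentration(count: int, subsampling_coef: float, volume_m3: float) -> float:
    """Individuals per m3. Assumption: the coefficient is a multiplicative factor."""
    return count * subsampling_coef / volume_m3


def biovolume(individual_volumes_mm3, subsampling_coef: float, volume_m3: float) -> float:
    """Summed object volume (mm3), scaled to the whole sample, per m3 of water."""
    return sum(individual_volumes_mm3) * subsampling_coef / volume_m3
```

For instance, `standardize({"lat": 43.68, "lon": 7.32, "mesh size": 200})` keeps the two coordinates under their BODC URIs and drops `mesh size`, which has no mapping yet.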
adyork commented 2 years ago

@jiho thanks for pointing out this mapping. It will be very useful to us.

It would be great if you could include your definition of each term and the type of relationship (Match Type) to the vocabulary term, such as skos:exactMatch; these are the match types we are currently using at BCO-DMO. For example:

| EcoTaxa Term | EcoTaxa Definition | Match Type | Term URI |
|---|---|---|---|
| instrument | your definition | skos:exactMatch | http://vocab.nerc.ac.uk/collection/Q01/current/Q0100002/ |
| concentration | your definition | not sure? skos:broadMatch? | http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL01/ |
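To make rows like these machine-readable, one option is to serialize each one as an RDF statement. Below is a minimal Python sketch that writes a mapping row as an N-Triples line using the W3C SKOS mapping properties. The namespace `https://example.org/ecotaxa/` is a placeholder, since EcoTaxa terms have no official URIs in this thread, and `concentration` is written as `broadMatch` only following the tentative "skos:broadMatch?" in the table.

```python
# W3C SKOS namespace and its five mapping properties.
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}

# Placeholder namespace for EcoTaxa terms (hypothetical, not an official URI).
ECOTAXA_NS = "https://example.org/ecotaxa/"


def mapping_triple(ecotaxa_term: str, match_type: str, term_uri: str) -> str:
    """Serialize one mapping row as an N-Triples statement."""
    if match_type not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {match_type}")
    return f"<{ECOTAXA_NS}{ecotaxa_term}> <{SKOS}{match_type}> <{term_uri}> ."


rows = [
    ("instrument", "exactMatch", "http://vocab.nerc.ac.uk/collection/Q01/current/Q0100002/"),
    ("concentration", "broadMatch", "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL01/"),
]
for row in rows:
    print(mapping_triple(*row))
```

Rejecting anything outside the five SKOS mapping properties keeps uncertain entries such as "?" from silently ending up in the output.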

We are working on updating our term mappings at BCO-DMO too. I'll report back on which vocabularies and terms we are considering for terms like biovolume, as well as for other terms we have started receiving in data from EcoTaxa-related datasets, such as ZooProcess-generated terms like object_feret and object_esd. Many of those terms are currently mapped in our system to a broad term, "image_analysis", but we want finer-grained term mappings.

Examples from dataset https://www.bco-dmo.org/dataset/857891:

| Dataset Term | Dataset Term Definition | Match Type | Term URI |
|---|---|---|---|
| object_esd | Object Equivalent Spherical Diameter | skos:broadMatch? | https://vocab.nerc.ac.uk/collection/S06/current/S0600260/ |
| object_feret | Object maximum Feret diameter, i.e., the longest distance between any two points along the object boundary, in pixels | ? | ? |
| object_major | Object major axis length in pixels | skos:exactMatch? | http://vocab.nerc.ac.uk/collection/P01/current/LGPIXEL1/ |
| object_minor | Object minor axis length in pixels | skos:exactMatch? | http://vocab.nerc.ac.uk/collection/P01/current/WDPIXEL1/ |
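For the two rows with uncertain matches, here is a Python sketch of common definitions of these quantities, under stated assumptions: ESD is taken as the diameter of the disc with the same area as the object, and the maximum Feret diameter as the largest distance between any two boundary points (brute force). ZooProcess's exact formulas are not given in this thread and may differ, for example in how pixel edges are handled. `px_to_mm` uses the pixel size in mm/pixel from the first list; all function names are hypothetical.

```python
import math
from itertools import combinations


def esd_from_area(area_px: float) -> float:
    """Diameter of the disc whose area equals the object's area (pixels)."""
    return 2.0 * math.sqrt(area_px / math.pi)


def feret_max(boundary) -> float:
    """Longest distance between any two boundary points (O(n^2) brute force)."""
    return max(math.dist(p, q) for p, q in combinations(boundary, 2))


def px_to_mm(length_px: float, pixel_size_mm: float) -> float:
    """Convert a length from pixels to mm using the pixel size (mm/pixel)."""
    return length_px * pixel_size_mm
```

For a 3 × 4 rectangle with corners at (0, 0) and (3, 4), `feret_max` returns its 5-pixel diagonal, a length that neither the major axis nor the ESD reports directly.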
jiho commented 2 years ago

Regarding the closeness of the match, the mapping between EcoTaxa terms and BODC terms may be somewhat loose, made at a rather "general concept" level.

To give you a concrete example, "length in pixels" (LGPIXEL1 at BODC) could be mapped to the "major axis length" measured in a ZooProcess/ZooScan pipeline, to the "major axis length" measured with a different algorithm in an IFCB pipeline, but also to the "feret diameter" in a ZooProcess pipeline. All are approximations of the length of the organism. The fact that two of them have "major axis" in their name does not necessarily make them more similar to each other than "major" and "feret" are: there are at least three implementations (that I know of) for fitting an ellipse to an object and measuring its major axis length, and they are all quite different.
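To illustrate how much two "lengths" of the same object can differ, here is a minimal Python sketch. It compares the major axis of the ellipse with the same second-order central moments (one common ellipse-fitting convention, not necessarily the one used by ZooProcess or IFCB) with the maximum Feret diameter computed between pixel centres. The rectangle is hypothetical test data.

```python
import math
from itertools import combinations


def rectangle_pixels(width: int, height: int):
    """Integer coordinates of the pixels of a filled, axis-aligned rectangle."""
    return [(x, y) for x in range(width) for y in range(height)]


def moment_major_axis(pixels) -> float:
    """Major axis length of the ellipse with the same second central moments."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Largest eigenvalue of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # A uniform ellipse with semi-major axis a has variance a^2 / 4 along
    # that axis, so the full axis length is 2a = 4 * sqrt(lam).
    return 4 * math.sqrt(lam)


def feret_max(points) -> float:
    """Largest distance between any two points (brute force, O(n^2))."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


pixels = rectangle_pixels(20, 10)
print(f"moment-based major axis: {moment_major_axis(pixels):.1f} px")
print(f"maximum Feret diameter:  {feret_max(pixels):.1f} px")
```

For this 20 × 10 pixel rectangle, the moment-based major axis is about 23.1 pixels and the Feret diameter about 21.0 pixels. Both are reasonable "lengths" of the same object, yet they differ by roughly 10%.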

Still, this is better than nothing at all: large-scale patterns will remain valid when mixing datasets of different origins, and inherent biological variability will likely outweigh the variability introduced by inconsistencies across image processing pipelines. Hence our decision.

In SKOS terms, it therefore seems to me that all matches would be skos:broadMatch: the EcoTaxa term is one way to measure the BODC concept. Let me know what you think.
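In SKOS, skos:narrowMatch is the inverse of skos:broadMatch. Declaring every pipeline-specific measurement as a broadMatch of a BODC concept therefore also states which measurements fall under each concept and can be pooled at that level. Here is a minimal Python sketch of that inversion. The (pipeline, measurement) entries are hypothetical examples drawn from the discussion in this thread, not an existing mapping file.

```python
from collections import defaultdict

# Hypothetical mappings: (pipeline, measurement) skos:broadMatch BODC concept.
BROAD_MATCHES = [
    (("ZooProcess", "major"), "LGPIXEL1"),
    (("IFCB", "major"), "LGPIXEL1"),
    (("ZooProcess", "feret"), "LGPIXEL1"),
    (("ZooProcess", "minor"), "WDPIXEL1"),
]


def narrow_matches(broad_matches):
    """Invert broadMatch links: each BODC concept -> measurements it is broader than."""
    by_concept = defaultdict(list)
    for measurement, concept in broad_matches:
        by_concept[concept].append(measurement)
    return dict(by_concept)


for concept, measurements in narrow_matches(BROAD_MATCHES).items():
    print(concept, "skos:narrowMatch", measurements)
```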