gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

Measurements for materials and more #103

Open lhmarsden opened 1 year ago

lhmarsden commented 1 year ago

The extendedMeasurementOrFact extension is a very useful way to record measurements or facts related to an occurrence or event in a standardised, potentially machine-readable way.

However, one might have measurements or facts related to a range of different things. For example, I work with many biologists who take measurements of materials or samples they are logging in a Material Sample Extension.

In this pull request, I am suggesting that the relatedResourceID term is added to the extendedMeasurementOrFact extension - after some discussion with @dagendresen. https://github.com/gbif/rs.gbif.org/pull/102

One could use this to record measurements related not only to material samples, but anything else, without the need of a resourceRelationship extension.

timrobertson100 commented 1 year ago

Thanks for opening this, @lhmarsden. I'll ping the OBIS group to comment.

albenson-usgs commented 1 year ago

One other thing to consider: if we make this change to EMoF, should we also make the same change to MoF?

albenson-usgs commented 1 year ago

Haven't thought it through, just thinking out loud, would this be a workable solution for this request https://github.com/tdwg/dwc/issues/362 ?

tucotuco commented 1 year ago

> Haven't thought it through, just thinking out loud, would this be a workable solution for this request tdwg/dwc#362 ?

This request [tdwg/dwc#362] has passed public review and is being prepared for an Executive decision.

pieterprovoost commented 1 year ago

This sounds like a generalization of occurrenceID in ExtendedMeasurementOrFact. If we add this, then maybe occurrenceID should be retired or deprecated? I agree with @albenson-usgs regarding https://github.com/tdwg/dwc/issues/362 but I'm not sure how to reconcile the two proposals.

tucotuco commented 1 year ago

If there are doubts, I urge you to jump in and question https://github.com/tdwg/dwc/issues/362 before it goes to the ratification process. In the Unified Model, we are proposing to allow Assertions on anything by declaring both the type of thing the Assertion is about (which "table") and the key for the record for that type (the equivalent of relatedResourceID).
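
To illustrate the idea of an Assertion that names both the type of thing it is about and the key of the record, here is a minimal Python sketch. The field names (`target_type`, `target_id`), the `sample_001` record, and the measurement values are hypothetical and used only for illustration; they are not taken from the Unified Model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    target_type: str      # hypothetical name: which "table" the assertion is about
    target_id: str        # hypothetical name: key of the record in that table
    assertion_type: str
    assertion_value: str


# Toy tables keyed by record identifier (illustrative data only).
tables = {
    "Occurrence": {"occ_001": {"scientificName": "Pachyptila belcheri"}},
    "MaterialSample": {"sample_001": {"preparations": "regurgitate content"}},
}


def resolve_target(assertion, tables):
    """Return the record an assertion is about, or None if it cannot be found."""
    return tables.get(assertion.target_type, {}).get(assertion.target_id)


# A made-up assertion about a material sample rather than an occurrence.
a = Assertion("MaterialSample", "sample_001", "wet weight", "12.5")
print(resolve_target(a, tables))
```

Because the target is identified by a (type, key) pair, the same structure could point at an occurrence, an event, a material sample, or anything else that has a table.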

ymgan commented 1 year ago

AntOBIS supports this proposal.

We have some use cases. For example, stomach content of a predator in an Occurrence is assessed to determine the fraction of the predator diet that a prey type made up (by weight). Having this term in emof will allow us to establish predator-prey relationship in an easier manner by having the occurrenceID of the prey as relatedResourceID for the Measurement of the predator. So, we might still need occurrenceID for emof here (I think), unless the dataset has to be published as Occurrence core.

edit: after talking to @pieterprovoost, the relationship probably should be established at Occurrence level (e.g. associatedTaxa or associatedOccurrence)

dagendresen commented 1 year ago

dwc:ResourceRelationship dwc:resourceID is the subject, and dwc:relatedResourceID is the object

Would the resource in a measurement (or fact) be the subject or the object? Would the eMoF document something the resource is doing or something done to the resource?

I have been thinking of the occurrenceID resource of the eMoF as the subject of the measurement and thus better replaced by adding the resourceID term to the eMoF extension?

ymgan commented 1 year ago

For AntOBIS example:

So

I think it is nicer to specify it here than putting a list of prey occurrences under predator's occurrence (associatedOccurrence). And of course, alternatively, we can use resource relationship extension.

I hope our example makes sense?

lhmarsden commented 1 year ago

Hi,

Do you know how long it is likely to be before I can use (if accepted) resourceID in the emof extension? I have some data to publish, and am wondering if I should proceed with a resourceRelationship extension instead.

Thanks!

pieterprovoost commented 1 year ago

@ymgan Would you mind writing out an example? I'm not clear on how the predator/prey problem relates to this proposal. This is how I interpret the current proposal:

| | subject | predicate | object |
| --- | --- | --- | --- |
| ResourceRelationship | resourceID | relationshipOfResourceID | relatedResourceID |
| eMoF | occurrenceID | measurementTypeID | measurementValueID |
| eMoF change proposal | resourceID | measurementTypeID | measurementValueID |

lhmarsden commented 1 year ago

I think your interpretation of the proposal is correct, @pieterprovoost. A way of recording measurements related to a materialSample or any other resource.
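
The subject/predicate/object reading discussed above can be sketched in Python as follows. This is only an illustration under stated assumptions: `to_triple` is a hypothetical helper, the fallback from `resourceID` to `occurrenceID` is one possible way to handle existing records, and the identifier values in the usage lines are placeholders.

```python
def to_triple(row, kind):
    """Map a record to a (subject, predicate, object) tuple."""
    if kind == "ResourceRelationship":
        return (
            row["resourceID"],
            row["relationshipOfResourceID"],
            row["relatedResourceID"],
        )
    if kind == "eMoF":
        # Under the proposal, resourceID would identify the subject;
        # fall back to occurrenceID so existing records still resolve.
        subject = row.get("resourceID") or row.get("occurrenceID")
        return (subject, row["measurementTypeID"], row["measurementValueID"])
    raise ValueError(f"unknown record kind: {kind}")


# Placeholder identifiers, for illustration only.
existing = {"occurrenceID": "occ_001", "measurementTypeID": "type_x", "measurementValueID": "value_y"}
proposed = {"resourceID": "sample_001", "measurementTypeID": "type_x", "measurementValueID": "value_y"}

print(to_triple(existing, "eMoF"))
print(to_triple(proposed, "eMoF"))
```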

ymgan commented 1 year ago

occurrence

| occurrenceID | scientificName | associatedOccurrences |
| --- | --- | --- |
| occ_001 | Pachyptila belcheri | "predator of" : ["occ_002", "occ_003"] |
| occ_002 | Crustacea | |
| occ_003 | Euphausia vallentini | |

eMoF

| occurrenceID | relatedResourceID | measurementType | measurementValue |
| --- | --- | --- | --- |
| occ_001 | occ_002 | fraction diet by prey items based on regurgitate content | 0.997 |
| occ_001 | occ_003 | fraction diet by prey items based on regurgitate content | 0.002 |

It is the measurement of the stomach content of the predator (occ_001), so I think the eMoF records should point to occ_001. Without the relatedResourceID, the information of the prey established based on stomach content of the bird is lost unless I use the resourceRelationship extension.

| occurrenceID | measurementType | measurementValue |
| --- | --- | --- |
| occ_001 | fraction diet by prey items based on regurgitate content | 0.997 |
| occ_001 | fraction diet by prey items based on regurgitate content | 0.002 |

That is how I look at it, but please correct me if my understanding is wrong.
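
As a rough illustration of why the extra column matters, the following Python sketch reads the eMoF rows that carry relatedResourceID and recovers which prey each diet fraction refers to. The function name `diet_fractions` and the in-memory layout are assumptions made for this example; the identifiers and values are the ones from the tables above.

```python
# Occurrence names keyed by occurrenceID, taken from the example above.
occurrences = {
    "occ_001": "Pachyptila belcheri",
    "occ_002": "Crustacea",
    "occ_003": "Euphausia vallentini",
}

MEASUREMENT_TYPE = "fraction diet by prey items based on regurgitate content"

# eMoF rows with the proposed relatedResourceID column.
emof = [
    {"occurrenceID": "occ_001", "relatedResourceID": "occ_002",
     "measurementType": MEASUREMENT_TYPE, "measurementValue": "0.997"},
    {"occurrenceID": "occ_001", "relatedResourceID": "occ_003",
     "measurementType": MEASUREMENT_TYPE, "measurementValue": "0.002"},
]


def diet_fractions(emof_rows, names):
    """Group diet fractions as {predator name: {prey name: fraction}}."""
    diet = {}
    for row in emof_rows:
        predator = names[row["occurrenceID"]]
        prey = names[row["relatedResourceID"]]
        diet.setdefault(predator, {})[prey] = float(row["measurementValue"])
    return diet


print(diet_fractions(emof, occurrences))
```

Without relatedResourceID, both rows would have the same occurrenceID and measurementType, and nothing in the eMoF table would say which prey each fraction belongs to.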


Edit: looking at this after thinking a little more based on Guillaume's comment:

occurrence

| occurrenceID | scientificName | basisOfRecord | preparations | associatedOccurrences |
| --- | --- | --- | --- | --- |
| occ_001 | Pachyptila belcheri | HumanObservation | | "predator of" : ["occ_002", "occ_003"] |
| occ_002 | Crustacea | MaterialSample | regurgitate content | |
| occ_003 | Euphausia vallentini | MaterialSample | regurgitate content | |

eMoF

| occurrenceID | relatedResourceID | measurementType | measurementValue |
| --- | --- | --- | --- |
| occ_002 | occ_001 | fraction diet based on regurgitate content | 0.997 |
| occ_003 | occ_001 | fraction diet based on regurgitate content | 0.002 |

pieterprovoost commented 1 year ago

@lhmarsden Replacing occurrenceID with resourceID has considerable impact on our indexing and is not something that can be achieved in the short term. What we could do is add resourceID, keep occurrenceID for now, and keep indexing as we do now taking only into account occurrenceID.
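
The short-term option described here (add resourceID, keep occurrenceID, and index only on occurrenceID) could look roughly like the Python sketch below. This is not OBIS's or GBIF's actual indexing code; `index_by_occurrence`, the `sample_001` identifier, and the measurement values are all made up for illustration.

```python
from collections import defaultdict


def index_by_occurrence(emof_rows):
    """Index eMoF rows by occurrenceID; rows without one are set aside."""
    index = defaultdict(list)
    skipped = []
    for row in emof_rows:
        occurrence_id = row.get("occurrenceID")
        if occurrence_id:
            index[occurrence_id].append(row)
        else:
            # Rows that only carry resourceID are kept in the data
            # but are not picked up by this indexing step.
            skipped.append(row)
    return dict(index), skipped


# Illustrative rows: one linked to an occurrence, one only to another resource.
rows = [
    {"occurrenceID": "occ_001", "measurementType": "length", "measurementValue": "12"},
    {"resourceID": "sample_001", "measurementType": "weight", "measurementValue": "3"},
]

index, skipped = index_by_occurrence(rows)
print(sorted(index), len(skipped))
```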

lhmarsden commented 1 year ago

I appreciate that replacing occurrenceID with resourceID would be a big change. Adding resourceID as an extra would be a suitable short term solution in my opinion.

Luke



guillaumebody commented 1 year ago

Hi,

> occurrence
>
> | occurrenceID | scientificName | associatedOccurrences |
> | --- | --- | --- |
> | occ_001 | Pachyptila belcheri | "predator of" : ["occ_002", "occ_003"] |
> | occ_002 | Crustacea | |
> | occ_003 | Euphausia vallentini | |
>
> eMoF
>
> | occurrenceID | relatedResourceID | measurementType | measurementValue |
> | --- | --- | --- | --- |
> | occ_001 | occ_002 | fraction diet by prey items based on regurgitate content | 0.997 |
> | occ_001 | occ_003 | fraction diet by prey items based on regurgitate content | 0.002 |
>
> It is the measurement of the stomach content of the predator (occ_001), so I think the eMoF records should point to occ_001. Without the relatedResourceID, the information of the prey established based on stomach content of the bird is lost unless I use the resourceRelationship extension.
>
> | occurrenceID | measurementType | measurementValue |
> | --- | --- | --- |
> | occ_001 | fraction diet by prey items based on regurgitate content | 0.997 |
> | occ_001 | fraction diet by prey items based on regurgitate content | 0.002 |
>
> That is how I look at it, but please correct me if my understanding is wrong.

We actually proposed another way to deal with such an example. We had similar issues while identifying pathogens within another species (Applying Darwin core data standard to wildlife disease – advancements toward a new data model). See also #413.

Using this parentOccurenceID term, it would result in:

occurrence

| occurrenceID | parentOccurenceID | scientificName | basisOfRecord | preparation |
| --- | --- | --- | --- | --- |
| occ_001 | | Pachyptila belcheri | human observation | |
| occ_002 | occ_001 | Crustacea | material sample | regurgitate content |
| occ_003 | occ_001 | Euphausia vallentini | material sample | regurgitate content |

eMoF

| measurementID | occurrenceID | measurementType | measurementValue |
| --- | --- | --- | --- |
| mea_001 | occ_002 | fraction diet | 0.997 |
| mea_002 | occ_003 | fraction diet | 0.002 |

ymgan commented 1 year ago

That seems to work!! Thank you very much for taking the time to write this down, @guillaumebody !! I appreciate it!

guillaumebody commented 1 year ago

You're welcome @ymgan, but please indicate in #413 that this is a relevant solution for your situation. The parentOccurenceID is currently not an accepted term of DwC.
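
For readers who want to see how the parentOccurenceID layout above could be consumed, here is a small Python sketch. It assumes the proposed (not yet accepted) parentOccurenceID term exactly as spelled in this thread, and `diet_by_parent` is a hypothetical helper name; the records are the ones from the tables above.

```python
# Occurrence records using the proposed parentOccurenceID term.
occurrences = [
    {"occurrenceID": "occ_001", "parentOccurenceID": None,
     "scientificName": "Pachyptila belcheri"},
    {"occurrenceID": "occ_002", "parentOccurenceID": "occ_001",
     "scientificName": "Crustacea"},
    {"occurrenceID": "occ_003", "parentOccurenceID": "occ_001",
     "scientificName": "Euphausia vallentini"},
]

# eMoF rows attached to the prey occurrences.
emof = [
    {"measurementID": "mea_001", "occurrenceID": "occ_002",
     "measurementType": "fraction diet", "measurementValue": "0.997"},
    {"measurementID": "mea_002", "occurrenceID": "occ_003",
     "measurementType": "fraction diet", "measurementValue": "0.002"},
]


def diet_by_parent(occurrences, emof_rows):
    """Return {parent name: {child name: fraction}} by following parentOccurenceID."""
    by_id = {o["occurrenceID"]: o for o in occurrences}
    result = {}
    for row in emof_rows:
        child = by_id[row["occurrenceID"]]
        parent_id = child["parentOccurenceID"]
        if parent_id is None:
            continue  # measurement on a top-level occurrence, no parent to report
        parent = by_id[parent_id]
        fractions = result.setdefault(parent["scientificName"], {})
        fractions[child["scientificName"]] = float(row["measurementValue"])
    return result


print(diet_by_parent(occurrences, emof))
```

In this layout the eMoF table needs no extra column: the predator is found by following the parent link from each prey occurrence.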

@tucotuco this situation pleads for having a "parentAssertionID" alongside the "relatedAssertionID" concept in the new GBIF model.