gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

added relatedResourceID to eMoF #102

Open lhmarsden opened 1 year ago

lhmarsden commented 1 year ago

The extendedMeasurementOrFact (eMoF) extension is a very useful way to record measurements or facts related to an occurrence or event in a standardised, potentially machine-readable way.

However, one might have measurements or facts related to a range of different things. For example, I work with many biologists who take measurements of materials or samples they are logging in a Material Sample Extension.

In this pull request, following some discussion with @dagendresen, I am suggesting that the relatedResourceID term be added to the extendedMeasurementOrFact extension.

One could use this to record measurements related not only to material samples, but anything else, without the need of a resourceRelationship extension.
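As a rough illustration of the idea, here is a minimal Python sketch of how an eMoF row could point at a material sample through relatedResourceID. Everything in it is an assumption made for the example: the identifiers, the measurement values, the choice of columns, and the helper names `read_rows` and `measurements_by_sample` are invented, not taken from a real dataset or from any GBIF or OBIS tooling.

```python
import csv
import io

# Hypothetical Material Sample extension rows (invented identifiers).
MATERIAL_SAMPLES = """materialSampleID,eventID,materialSampleType
sample-1,event-1,sediment core
sample-2,event-1,water sample
"""

# Hypothetical eMoF rows. relatedResourceID is the term proposed in this
# pull request; the last row leaves it empty and describes the event itself.
EMOF = """eventID,relatedResourceID,measurementType,measurementValue,measurementUnit
event-1,sample-1,core length,35,cm
event-1,sample-2,volume,500,mL
event-1,,water depth,120,m
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def measurements_by_sample(samples, emof):
    """Group eMoF rows under the material sample they reference."""
    grouped = {row["materialSampleID"]: [] for row in samples}
    for row in emof:
        target = row["relatedResourceID"]
        if target in grouped:
            grouped[target].append(
                (row["measurementType"], row["measurementValue"], row["measurementUnit"])
            )
    return grouped


result = measurements_by_sample(read_rows(MATERIAL_SAMPLES), read_rows(EMOF))
for sample_id, measurements in result.items():
    print(sample_id, measurements)
```

Note that the sketch only works because the code already knows the identifier refers to a material sample; nothing in the eMoF row itself says so.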

timrobertson100 commented 1 year ago

Thank you @lhmarsden!

Can you please open an issue describing the rationale for this (i.e. copying from above), so those who are using this extensively have the opportunity to comment (mainly OBIS) before we move to implement it?

pieterprovoost commented 1 year ago

@lhmarsden Would you mind updating this PR to resourceID?

lhmarsden commented 1 year ago

@pieterprovoost Done

timrobertson100 commented 1 year ago

@pieterprovoost - thanks for approving this request.

@MattBlissett @ManonGros @tucotuco - please can you comment if you have concerns, or thumb up that you agree to merge this? It's an extension used by OBIS primarily (exclusively?) and has come from them.

timrobertson100 commented 1 year ago

I've converted this to draft as we will need to implement this as a new edition (filename and issued tags) but will fix that up as we merge this.

I've pinged a couple of people for a final possibility to comment before merging.

ManonGros commented 1 year ago

@lhmarsden I don't think I understand the idea. When would someone use relatedResourceID? Would it be in addition to an occurrenceID or eventID? If there is one eMoF in a dataset with occurrences and a materialSample extension, how would users know where to look for the resource based on the ID? Would relatedResourceID always refer to the extensions and never the core?

[EDIT] This thread (https://github.com/gbif/rs.gbif.org/issues/103) answers some questions: it wouldn't replace the occurrenceID and eventID, and I can see examples of how it can be used for occurrences. I am not sure I understand how this would work, though:

> One could use this to record measurements related not only to material samples, but anything else, without the need of a resourceRelationship extension.

How would users know what the measurement refers to?

tucotuco commented 1 year ago

OBIS is definitely not the only group to be using the extension. I have recommended to others to use it many times.

Regardless of who might be using it, I would like to echo @ManonGros's questions about implementation. I like the added capacity this reflects, but how will anyone know what file to look in for the connection without scanning until they find the matching identifier? And then, having found one, what happens if it is in a one-to-many relationship? What happens if identifiers are not GUIDs, but identifiers unique only within the scope of the classes they belong to?
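To make the concern concrete, here is a small Python sketch of the "scan until you find it" lookup. It is only an illustration under stated assumptions: the file names, ID columns, identifiers, and the `resolve` helper are hypothetical and do not describe how GBIF, OBIS, or the IPT handle archives.

```python
def resolve(identifier, tables):
    """Scan every table for rows whose ID column equals `identifier`.

    `tables` maps a file name to a pair (id_column, rows). All matches are
    returned because nothing guarantees the identifier is unique across files.
    """
    matches = []
    for name, (id_column, rows) in tables.items():
        for row in rows:
            if row.get(id_column) == identifier:
                matches.append((name, row))
    return matches


# Hypothetical archive contents. "1" is unique within each class but not
# across classes, which is the non-GUID case raised above.
TABLES = {
    "occurrence.txt": (
        "occurrenceID",
        [{"occurrenceID": "1"}, {"occurrenceID": "2"}],
    ),
    "materialsample.txt": (
        "materialSampleID",
        [{"materialSampleID": "1"}, {"materialSampleID": "urn:example:sample-9"}],
    ),
}

ambiguous = resolve("1", TABLES)                    # matches in both files
unique = resolve("urn:example:sample-9", TABLES)    # a single match
missing = resolve("not-present", TABLES)            # no match after a full scan

print(len(ambiguous), len(unique), len(missing))
```

With a locally scoped identifier the scan returns more than one candidate, and with an unknown identifier every file has to be read before concluding that the target is external to the dataset.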

I think it would be more robust to add specific identifiers for the classes that there is demand to support, e.g., materialSampleID. Even in that case, the MaterialSample task group is hoping to recommend phasing out the term MaterialSample in favor of MaterialEntity. If both begin to come into use, there is going to be some confusion. My best guess is that MaterialEntity and materialEntityID will be ratified as new DwC terms. You could anticipate that, with the associated risk, by including materialEntityID in the eMoF extension.

These aren't objections; they are suggestions about potential consequences of the proposed way forward.

lhmarsden commented 1 year ago

> If there is one eMoF in a dataset with occurrences and a materialSample extension. How would users know where to look for the resource based on the ID?

> How would users know what the measurement refers to?

> Regardless of who might be using it, I would like to echo @ManonGros's questions about implementation. I like the added capacity this reflects, but how will anyone know what file to look in for the connection, without scanning until finding the matching identifier. And then, having found one, what happens if it is in a one-to-many relationship? What happens if identifiers are not GUIDs, but identifiers unique only within the scope of the classes they belong in?

How does this currently work for the resourceRelationship extension? I guess the same problem is encountered here.

I have no objection to using materialSampleID or materialEntityID either.

But then what about other IDs? taxonID? organismID? geologicalContextID? There are many scenarios where it would be useful to include measurements for which existing terms are not available/suitable. Maybe all the relevant ID terms should be added?

timrobertson100 commented 1 year ago

> How does this currently work for the resourceRelationship extension? I guess the same problem is encountered here.

Yes, it appears to be the same problem to me.

> There are many scenarios where it would be useful to include measurements for which existing terms are not available/suitable. Maybe all the relevant ID terms should be added?

This is precisely one of the challenges posed by the DwC-A star schema and its forced denormalization. That's a reason we are exploring more expressive models using Frictionless Data schemas, such as the partial Material model found here, which we are researching alongside the IPT v3 branch (some months out still).

It's not ideal, but considering the circumstances, I believe it's reasonable to include the requested term in this pull request. I want to acknowledge the limitations highlighted by @tucotuco, the upcoming proper fix with the model, and the fact that neither GBIF.org (nor OBIS?) will attempt to interpret this. However, it's important to remember that extensions were designed to allow sub-communities to add additional elements according to their specific needs.

pieterprovoost commented 1 year ago

@timrobertson100 I can confirm that OBIS will not interpret this either.

ManonGros commented 1 year ago

Thanks Luke! I was under the impression that the Resource Relationship extension was only to express relationships between the records of the core. I see now that I was mistaken:

> Support for relationships between resources in the Core, in an extension, or external to the data set

I guess it does include the same limitations as having a relatedResourceID. I would have expected the Resource Relationship extension to have some additional field specifying where to find the related resource.

I find that relatedResourceID, without any context, is difficult for users to interpret, both in the eMoF and in the Resource Relationship extension. That being said, the new model should address those limitations. In the meantime, maybe it is good to have relatedResourceID in the eMoF.