Related Dataset #9

Status: Open. Opened by juhahakala 3 years ago.

Proposed DCMI Metadata Term: http://purl.org/dc/terms/relatedDataset

Label: Related Dataset

Dataset referenced in the described resource.

SRAP: Dataset referenced in the described scholarly resource.

Recommended practice is to identify the dataset with a URI identifying either the dataset or a landing page through which the dataset is accessed.

Example: https://doi.org/10.17605/OSF.IO/B6KJZ

Discussion: The URI will usually be based on a PID (such as a DOI, as in the example). DataCite DOIs resolve to a landing page, which may contain URI links to 1-n manifestations of the dataset. Work-level citation should not be a problem in this case.
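A minimal sketch of producing the recommended URI form from a DOI, assuming the https://doi.org/ resolver as the canonical prefix. The helper `doi_to_uri` and its loose DOI pattern are hypothetical illustrations, not part of SRAP or of any DataCite tooling.

```python
import re

# Canonical resolver prefix assumed for the URI form of a DOI.
DOI_PROXY = "https://doi.org/"

# Prefixes stripped before validation (matched case-insensitively).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Loose, illustrative DOI shape: "10." + registrant digits + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_uri(value: str) -> str:
    """Return an https://doi.org/ URI for a bare DOI, a doi: string, or a DOI URL."""
    doi = value.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return DOI_PROXY + doi


print(doi_to_uri("doi:10.17605/OSF.IO/B6KJZ"))  # https://doi.org/10.17605/OSF.IO/B6KJZ
```

The resulting URI identifies the DOI-registered object; as noted above, a DataCite DOI of this kind resolves to a landing page rather than to the data files themselves.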

How is this different from the dct:references property? What makes a property for a supporting dataset necessary, as opposed to one for, say, a supporting magazine? How is a supporting dataset defined? That is, what exactly is the bibliographic relationship, in terms of something like Tillett's taxonomy of bibliographic relationships?

Related Dataset and Related Code are both subproperties of dct:references. With these properties it is possible to link to the research datasets and applications that were essential to the creation of the described scholarly resource. dct:references may be used for this purpose as well, but unlike Related Dataset and Related Code it does not reveal the nature of the linked object.
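The sub-property relationship is what lets a generic consumer still see a dct:references link. Below is a minimal sketch of RDFS sub-property entailment (rule rdfs7) applied to the proposed http://purl.org/dc/terms/relatedDataset term, assuming a plain set of (subject, predicate, object) tuples rather than any particular RDF library. The article URI and `entail_superproperties` are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples of URI strings.
DCT = "http://purl.org/dc/terms/"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def entail_superproperties(triples):
    """Apply rule rdfs7 until nothing new is derived: (s p o) + (p subPropertyOf q) gives (s q o)."""
    closed = set(triples)
    while True:
        sub_of = {(s, o) for s, p, o in closed if p == RDFS_SUBPROPERTY_OF}
        derived = {(s, sup, o) for s, p, o in closed for sub, sup in sub_of if p == sub}
        if derived <= closed:
            return closed
        closed |= derived


article = "http://example.org/article1"  # hypothetical described resource
dataset = "https://doi.org/10.17605/OSF.IO/B6KJZ"

graph = {
    # Declaration implied by the proposal; relatedDataset is not a published DCMI term.
    (DCT + "relatedDataset", RDFS_SUBPROPERTY_OF, DCT + "references"),
    (article, DCT + "relatedDataset", dataset),
}

closed = entail_superproperties(graph)
print((article, DCT + "references", dataset) in closed)  # True
```

The entailment only runs one way: a bare dct:references triple says nothing about the object being a dataset, which is the gap Related Dataset was meant to fill.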

@juhahakala,

Several comments:

I understand the desire to create relationships to code and datasets that are essential to the main object of description, but why would a schema, ontology, or application profile specify them overtly? We have dct:source, which can be used in conjunction with dct:references. Presumably the object of description would be adequately described with a dct:source relationship, and the source resource would have a DCMIType indicator. Generic relationships could then be used, and the type of thing that is the source would be identified with the DCMIType vocabulary.
Why would the term Related Code be used instead of Related Software? The term Software is already in use within the DCT namespace; introducing another lexical element seems only to bring ambiguity. That is, is all software code? Is all code software?
The semantics of "related" and "source" differ in natural English. "Related" can have a very broad meaning: Mercurial is software that is related to Git, but there is no source relationship between them.
There are many reasons to reference something; a source relationship is only one of them. CiTO points out others, and secondary research on CiTO points out even more citation types.

I agree with Hugh. We should think about the best way to scale this type of information. There could be many different kinds of related resources, and I don't think we want to define properties for all of them. dct:references recommends non-literal values, which could themselves be given a type from the DCMI Type Vocabulary. That vocabulary has both Dataset and Software. In that way, the type is a characteristic of the referenced entity, not of the predicate. And in fact, depending on what cataloging has been done, the referenced entity may already be described with a type.
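A sketch of moving the type from the predicate to the referenced resource, over plain (subject, predicate, object) tuples. It assumes the object's type is attached with dct:type using DCMI Type Vocabulary values. `move_type_to_object`, `referenced_of_type`, and the relatedCode URI are hypothetical; Related Code was never assigned a URI in this issue.

```python
DCT = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Hypothetical mapping from typed predicates to a type on the object.
# relatedDataset is the URI proposed in this issue; relatedCode is illustrative only.
PREDICATE_TO_TYPE = {
    DCT + "relatedDataset": DCMITYPE + "Dataset",
    DCT + "relatedCode": DCMITYPE + "Software",
}


def move_type_to_object(triples):
    """Replace each typed-predicate triple with dct:references plus a dct:type on the object."""
    out = set()
    for s, p, o in triples:
        if p in PREDICATE_TO_TYPE:
            out.add((s, DCT + "references", o))
            out.add((o, DCT + "type", PREDICATE_TO_TYPE[p]))
        else:
            out.add((s, p, o))
    return out


def referenced_of_type(triples, subject, type_uri):
    """Return the objects of subject's dct:references links whose dct:type is type_uri."""
    referenced = {o for s, p, o in triples if s == subject and p == DCT + "references"}
    return {o for o in referenced if (o, DCT + "type", type_uri) in triples}


article = "http://example.org/article1"  # hypothetical
data = "https://doi.org/10.17605/OSF.IO/B6KJZ"
code = "http://example.org/code1"  # hypothetical

graph = move_type_to_object({
    (article, DCT + "relatedDataset", data),
    (article, DCT + "relatedCode", code),
})
print(referenced_of_type(graph, article, DCMITYPE + "Software"))  # {'http://example.org/code1'}
```

Because the query looks at the object's own dct:type, it also picks up referenced resources whose type came from earlier cataloging rather than from this conversion.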

In the 32nd meeting, we decided to use dct:source for the description of resources that have had an essential role in the production of the described resource. The proposed elements RelatedCode and RelatedDataSet will be dropped. Instead, dct:source will be linked to the type of the source material. This change opens two additional tasks. First, a controlled vocabulary of source materials is required. Software and dataset are obvious choices, but that may not be enough. Be that as it may, adding new terms to the SRAP source type vocabulary will be easier than adding new properties to SRAP itself. Second, it is necessary to specify the syntax for linking the source type to the source specification.
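The syntax linking the source type to the source is still open, so the following is only a sketch under one assumption: each dct:source value is a resource carrying a dct:type drawn from a SRAP source type vocabulary. `SOURCE_TYPE_VOCABULARY` and `check_source_types` are hypothetical; the vocabulary is seeded with the two obvious choices named above, expressed as DCMI Type URIs.

```python
DCT = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Hypothetical SRAP source type vocabulary (assumption: DCMI Type URIs as entries).
SOURCE_TYPE_VOCABULARY = {DCMITYPE + "Dataset", DCMITYPE + "Software"}


def check_source_types(triples, subject):
    """Return (source, problem) pairs for dct:source values whose type is missing or not in the vocabulary."""
    problems = []
    sources = sorted(o for s, p, o in triples if s == subject and p == DCT + "source")
    for src in sources:
        types = {o for s, p, o in triples if s == src and p == DCT + "type"}
        if not types:
            problems.append((src, "no source type"))
        elif not types & SOURCE_TYPE_VOCABULARY:
            problems.append((src, "type not in source type vocabulary"))
    return problems


article = "http://example.org/article1"  # hypothetical
dataset = "https://doi.org/10.17605/OSF.IO/B6KJZ"
figure = "http://example.org/figure1"  # hypothetical
notes = "http://example.org/notes"  # hypothetical

graph = {
    (article, DCT + "source", dataset),
    (dataset, DCT + "type", DCMITYPE + "Dataset"),
    (article, DCT + "source", figure),
    (figure, DCT + "type", DCMITYPE + "StillImage"),
    (article, DCT + "source", notes),
}
print(check_source_types(graph, article))
# [('http://example.org/figure1', 'type not in source type vocabulary'), ('http://example.org/notes', 'no source type')]
```

Under this assumption, supporting a new kind of source means adding a URI to the vocabulary set, not adding a new property to SRAP.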

Can someone help me better understand how making the value of “source” a description is more functional than making the value of “source” an identifier?

I wasn’t at the 32nd meeting, so I don’t know the context. It just seems that the semantics differ significantly from DC, and I want to understand the reasoning.

Kind regards, Hugh

(Reply to juhahakala's comment of Mon, Aug 7, 2023, 9:39 AM.)


@HughP It isn't intended to be a textual description: the related file will be located somewhere at a URL. Ideally, that file will be described with its own metadata, which would constitute the "description". There are a number of existing metadata schemes with types that we could use. What we haven't discussed is whether SRAP would define how such files might be described. I'll try to mock up an example.

I see this as different from CiTO because the intention here is that these are files that are essential parts of the scholarly work itself, and that can be "published" simultaneously with the article in digital form. I now wonder whether this implies a way to package the article and these supporting files together, a kind of directory that would cause them always to be retrieved together. That implies a stronger relationship than dct:source, but presumably it could be implemented in software.

After reviewing the discussion on how to represent related datasets, I suggest using dct:relation flexibly, with a controlled vocabulary to specify the nature of the relationship. This approach allows for identifying various relationships (like related data, associated software, etc.) clearly and efficiently.

Example:

Imagine we have Dataset A used to develop a machine learning model in a study. Dataset B is a related dataset generated as an outcome of the study. This relationship could be represented as:

@prefix dct: <http://purl.org/dc/terms/> .

<http://example.org/datasetA> dct:relation [
    a dct:Type ;
    dct:identifier <http://example.org/datasetB> ;
    dct:description "Dataset B generated as an outcome of the analysis of Dataset A."
] .

In this example, dct:Type would be part of a controlled vocabulary that specifies Dataset B as a direct outcome of Dataset A, providing clarity about the relationship between the two datasets and enhancing interoperability.

The new dct:relation now supports this, if you use it to point to a SRAPResource that is a dataset (and provide a COAR Type to indicate that it's a dataset).

We just need better guidance in the SRAP specification on how to do this.
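One possible shape for that guidance, sketched over plain (subject, predicate, object) tuples. It assumes that a SRAPResource is identified with rdf:type and that its COAR resource type is attached with dct:type. The SRAP class URI is a placeholder, `related_datasets` is hypothetical, and http://purl.org/coar/resource_type/c_ddb1 is assumed to be the COAR Resource Types concept for "dataset"; both URIs should be checked against the published vocabularies.

```python
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

SRAP_RESOURCE = "http://example.org/srap#SRAPResource"  # hypothetical placeholder class URI
COAR_DATASET = "http://purl.org/coar/resource_type/c_ddb1"  # assumed COAR "dataset" concept


def related_datasets(triples, subject):
    """Return dct:relation targets of subject that are SRAPResources typed as COAR datasets."""
    related = {o for s, p, o in triples if s == subject and p == DCT + "relation"}
    return sorted(
        o for o in related
        if (o, RDF_TYPE, SRAP_RESOURCE) in triples
        and (o, DCT + "type", COAR_DATASET) in triples
    )


article = "http://example.org/article1"  # hypothetical
dataset_b = "http://example.org/datasetB"
web_page = "http://example.org/webpage"  # hypothetical, related but not a dataset

graph = {
    (article, DCT + "relation", dataset_b),
    (dataset_b, RDF_TYPE, SRAP_RESOURCE),
    (dataset_b, DCT + "type", COAR_DATASET),
    (article, DCT + "relation", web_page),
    (web_page, RDF_TYPE, SRAP_RESOURCE),
}
print(related_datasets(graph, article))  # ['http://example.org/datasetB']
```

Here, too, the predicate stays generic and the "dataset" nature lives on the related resource, so new kinds of related resources need only new type values.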

dcmi / dc-srap
