geo-mac / Rioxx-development


Rioxx 3 modelling experiments #1

Open geo-mac opened 1 year ago

geo-mac commented 1 year ago

Posted on behalf of @MickEadie

RIOXX SCHEMA V2

An item with an accepted manuscript in the repository and a link to the published version DOI

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v2.0/rioxx/ http://www.rioxx.net/schema/v2.0/rioxx/rioxx.xsd">
<ali:free_to_read></ali:free_to_read>
<ali:license_ref start_date="2019-12" >http://creativecommons.org/licenses/by-nc-nd/4.0</ali:license_ref>
<dc:description>We conducted a field experiment to evaluate the impact of job search assistance on the employment of recently arrived refugees in Germany. The treatment group received job-matching support: an NGO identified suitable vacancies and sent the refugees’ CVs to employers. Six months after the start of the treatment, we find no evidence for positive treatment effects on employment. However, after twelve months, we detect positive treatment effects: marginally significant for the full sample and larger in magnitude and significant for lower educated refugees and those who have not yet received a refugee status. These individuals face higher uncertainty about their residence status, they do not search effectively, lack access to alternative support programmes and may be disregarded by employers due to perceived higher hiring costs. Our results suggest that personalised job search assistance can improve labour market integration of these refugee groups by alleviating labour market frictions.</dc:description>
<dc:format>application/pdf</dc:format>
<dc:identifier>https://eprints.gla.ac.uk/190277/7/190277.pdf</dc:identifier>
<dc:language>en</dc:language>
<dc:publisher>Elsevier</dc:publisher>
<dc:source>0927-5371</dc:source>
<dc:title>Can job search assistance improve the labour market integration of refugees? Evidence from a field experiment</dc:title>
<dcterms:dateAccepted>2019-07-02</dcterms:dateAccepted>
<rioxxterms:apc>not required</rioxxterms:apc>
<rioxxterms:author id="http://orcid.org/0000-0002-0144-8566" >Battisti, Michele</rioxxterms:author>
<rioxxterms:author>Giesing, Yvonne</rioxxterms:author>
<rioxxterms:author>Laurentsyeva, Nadzeya</rioxxterms:author>
<rioxxterms:publication_date>2019-12</rioxxterms:publication_date>
<rioxxterms:type>Journal Article/Review</rioxxterms:type>
<rioxxterms:version>AM</rioxxterms:version>
<rioxxterms:version_of_record>http://doi.org/10.1016/j.labeco.2019.07.001</rioxxterms:version_of_record>
</rioxx>

RIOXX SCHEMA V3

As above, publisher version and accepted version now included in DC:Relation

Following elements removed or shifted to relation:

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v3.0/rioxx/ http://www.rioxx.net/schema/v3.0/rioxx/rioxx.xsd">
<dc:description>We conducted a field experiment to evaluate the impact of job search assistance on the employment of recently arrived refugees in Germany. The treatment group received job-matching support: an NGO identified suitable vacancies and sent the refugees’ CVs to employers. Six months after the start of the treatment, we find no evidence for positive treatment effects on employment. However, after twelve months, we detect positive treatment effects: marginally significant for the full sample and larger in magnitude and significant for lower educated refugees and those who have not yet received a refugee status. These individuals face higher uncertainty about their residence status, they do not search effectively, lack access to alternative support programmes and may be disregarded by employers due to perceived higher hiring costs. Our results suggest that personalised job search assistance can improve labour market integration of these refugee groups by alleviating labour market frictions.</dc:description>
<dc:identifier>https://eprints.gla.ac.uk/190277/</dc:identifier> 

<!-- dc:identifier should now be a PID (the landing page) -->

<dc:language>en</dc:language>
<dc:publisher>Elsevier</dc:publisher> 
<dc:source>0927-5371</dc:source>
<dc:title>Can job search assistance improve the labour market integration of refugees? Evidence from a field experiment</dc:title>
<dc:coverage></dc:coverage>
<dc:subject></dc:subject>
<dcterms:dateAccepted>2019-07-02</dcterms:dateAccepted>
<rioxxterms:author id="http://orcid.org/0000-0002-0144-8566" >Battisti, Michele</rioxxterms:author>
<rioxxterms:author>Giesing, Yvonne</rioxxterms:author>
<rioxxterms:author>Laurentsyeva, Nadzeya</rioxxterms:author>
<rioxxterms:contributor></rioxxterms:contributor>
<rioxxterms:publication_date>2019-12</rioxxterms:publication_date>
<rioxxterms:record_public_release_date>2021-01-06</rioxxterms:record_public_release_date>
<rioxxterms:type>Journal Article/Review</rioxxterms:type> 
<rioxxterms:grant></rioxxterms:grant>
<rioxxterms:project></rioxxterms:project>

<!-- accepted manuscript version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-07-12" 
    resource_exposed_date="2021-01-06" 
    rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by-nc-nd/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/190277/7/190277.pdf
</dc:relation>

<!-- publisher version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    resource_exposed_date="2019-12" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85"
    format="application/pdf">
            http://doi.org/10.1016/j.labeco.2019.07.001
</dc:relation>

</rioxx>

Questions

geo-mac commented 1 year ago

On some of the initial comments. Here are some additions:

ali:free_to_read (no longer required)

Yes, we ditched this in v.3.

ali:license_ref (now belongs with relation)

Yes, if we are not assuming the VoR is the root from which other expressions hang off, then license_ref has to accompany expressions, as mapped in fig. 2 here.

dc:format (no longer required, it's a landing page)

Agreed. VoR is no longer described at 'root' level.

rioxxterms:apc (no longer required)

Yes, we ditched it. A hangover from RCUK days.

rioxxterms:version (now belongs with relation)

Agreed. We no longer describe VoR at root.

rioxxterms:version_of_record (no longer required, it's just another relation)

Yes, agreed -- this is the corollary of ditching the VoR root concept.
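
As a rough illustration of the changes listed above, the following Python sketch moves the v2 root-level elements that describe a file (dc:format, dc:identifier, ali:license_ref, rioxxterms:version) into a relation entry, turns rioxxterms:version_of_record into a second relation, and drops ali:free_to_read and rioxxterms:apc. The dictionary layout, the function name `v2_to_v3` and the `landing_page` parameter are assumptions made for illustration; the attribute names follow the draft examples in this thread, not a finalised schema.

```python
# Hypothetical sketch: reshape a flat RIOXX v2 record (as a dict) into the
# draft v3 layout discussed in this thread. Not an official mapping.

# COAR version URIs as used in the examples in this thread.
COAR_VERSION = {
    "AM": "http://purl.org/coar/version/c_ab4af688f83e57aa",
    "VoR": "http://purl.org/coar/version/c_970fb48d4fbd8a85",
}

# v2 elements dropped in v3, and v2 elements folded into dc:relation.
DROPPED = {"ali:free_to_read", "rioxxterms:apc"}
MOVED = {"dc:format", "dc:identifier", "ali:license_ref",
         "rioxxterms:version", "rioxxterms:version_of_record"}


def v2_to_v3(v2: dict, landing_page: str) -> dict:
    """Return a v3-style dict with root fields and a list of relations."""
    root = {k: v for k, v in v2.items() if k not in (DROPPED | MOVED)}
    root["dc:identifier"] = landing_page  # root now identifies the landing page

    # The deposited file becomes one dc:relation carrying its own attributes.
    relations = [{
        "target": v2["dc:identifier"],
        "rioxx_version": COAR_VERSION[v2["rioxxterms:version"]],
        "license_ref": v2["ali:license_ref"],
        "format": v2["dc:format"],
    }]
    # The version of record is "just another relation".
    if "rioxxterms:version_of_record" in v2:
        relations.append({
            "target": v2["rioxxterms:version_of_record"],
            "rioxx_version": COAR_VERSION["VoR"],
        })
    return {"root": root, "relations": relations}


v2_record = {
    "ali:free_to_read": "",
    "ali:license_ref": "http://creativecommons.org/licenses/by-nc-nd/4.0",
    "dc:format": "application/pdf",
    "dc:identifier": "https://eprints.gla.ac.uk/190277/7/190277.pdf",
    "dc:title": "Can job search assistance improve the labour market "
                "integration of refugees? Evidence from a field experiment",
    "rioxxterms:apc": "not required",
    "rioxxterms:version": "AM",
    "rioxxterms:version_of_record": "http://doi.org/10.1016/j.labeco.2019.07.001",
}
v3_record = v2_to_v3(v2_record, "https://eprints.gla.ac.uk/190277/")
```

The sketch deliberately leaves descriptive fields such as dc:title at root level, which mirrors the 'work-esque' root discussed in this thread rather than settling it.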

geo-mac commented 1 year ago
<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v3.0/rioxx/ http://www.rioxx.net/schema/v3.0/rioxx/rioxx.xsd">
<dc:description>We conducted a field experiment to evaluate the impact of job search assistance on the employment of recently arrived refugees in Germany. The treatment group received job-matching support: an NGO identified suitable vacancies and sent the refugees’ CVs to employers. Six months after the start of the treatment, we find no evidence for positive treatment effects on employment. However, after twelve months, we detect positive treatment effects: marginally significant for the full sample and larger in magnitude and significant for lower educated refugees and those who have not yet received a refugee status. These individuals face higher uncertainty about their residence status, they do not search effectively, lack access to alternative support programmes and may be disregarded by employers due to perceived higher hiring costs. Our results suggest that personalised job search assistance can improve labour market integration of these refugee groups by alleviating labour market frictions.</dc:description>
<dc:identifier>https://eprints.gla.ac.uk/190277/</dc:identifier> <!-- should now be a PID (the landing page) -->
<dc:language>en</dc:language>
<dc:publisher>Elsevier</dc:publisher> 
<dc:source>0927-5371</dc:source>
<dc:title>Can job search assistance improve the labour market integration of refugees? Evidence from a field experiment</dc:title>
<dc:coverage></dc:coverage>
<dc:subject></dc:subject>
<dcterms:dateAccepted>2019-07-02</dcterms:dateAccepted>
<rioxxterms:author id="http://orcid.org/0000-0002-0144-8566" >Battisti, Michele</rioxxterms:author>
<rioxxterms:author>Giesing, Yvonne</rioxxterms:author>
<rioxxterms:author>Laurentsyeva, Nadzeya</rioxxterms:author>
<rioxxterms:contributor></rioxxterms:contributor>
<rioxxterms:publication_date>2019-12</rioxxterms:publication_date>
<rioxxterms:record_public_release_date>2021-01-06</rioxxterms:record_public_release_date>
<rioxxterms:type>Journal Article/Review</rioxxterms:type> 
<rioxxterms:grant></rioxxterms:grant>
<rioxxterms:project></rioxxterms:project>

We have to think carefully about including dc:identifier (for the PID) at the 'root' level. This is -- I think! -- where we left things at the end of the meeting. But it breaks the 'model' (insofar as there is one!). This is what I was getting at here. Does it introduce inconsistency not to declare dc:identifier in dc:relation when describing the AAM? Or is this an example of pushing everything down into dc:relation, as PW was concerned about? I am actually an advocate for the 'work' approach but I accept that it is probably too complicated to implement in practice, and many repositories will not buy into the conceptual approach. Retaining dc:identifier at the 'root' would be an acceptance that we are describing a 'work', of which there are expressions, including an AAM and a VoR. What are your thoughts? Incidentally, this argument holds for dc:publisher, dc:source and dc:type too, all of which might reasonably be expected to be communicated within the VoR expression.

The other thing we should perhaps consider is that this is how the metadata is modelled for OAI-PMH purposes but not necessarily how repository editors or similar will describe the resource within the repository UI.

RE the questions:

geo-mac commented 1 year ago

Posted on behalf of @MickEadie

RIOXX SCHEMA V2

An item with no accepted manuscript in the repository but a copy of the published article is stored and there is also a link to the publisher version

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v2.0/rioxx/ http://www.rioxx.net/schema/v2.0/rioxx/rioxx.xsd" >
<ali:free_to_read></ali:free_to_read>
<ali:license_ref start_date="2019-06-24" >http://creativecommons.org/licenses/by/4.0</ali:license_ref>
<dc:description>Traditionally, chemists have relied on years of training and accumulated experience in order to discover new molecules. But the space of possible molecules so vast, only a limited exploration with the traditional methods can be ever possible. This means that many opportunities for the discovery of interesting phenomena have been missed, and in addition, the inherent variability of these phenomena can make them difficult to control and understand. The current state-of-the-art is moving towards the development of automated and eventually fully autonomous systems coupled with in-line analytics and decision-making algorithms. Yet even these, despite the substantial progress achieved recently, still cannot easily tackle large combinatorial spaces as they are limited by the lack of high-quality data. Herein, we explore the utility of active learning methods for exploring the chemical space by comparing collaboration between human experimenters with an algorithm-based search, against their performance individually to probe the self-assembly and crystallization of the polyoxometalate cluster Na6[Mo120Ce6O366H12(H2O)78]·200H2O (1). We show that the robot-human teams are able to increase the prediction accuracy to 75.6±1.8%, from 71.8±0.3% with the algorithm alone and 66.3±1.8% from only the human experimenters demonstrating that human-robot teams beat robots or humans working alone.</dc:description>
<dc:format>application/pdf</dc:format>
<dc:identifier>https://eprints.gla.ac.uk/185445/1/185445.pdf</dc:identifier>
<dc:language>en</dc:language>
<dc:publisher>American Chemical Society</dc:publisher>
<dc:source>1549-9596</dc:source>
<dc:title>Intuition-enabled machine learning beats the competition when joint human-robot teams perform inorganic chemical experiments</dc:title>
<dcterms:dateAccepted>2019-04-26</dcterms:dateAccepted>
<rioxxterms:author>Duros, Vasilios</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-2211-4389" >Grizou, Jonathan</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-5222-9611" >Sharma, Abhishek</rioxxterms:author>
<rioxxterms:author>Mehr, S. Hessam M.</rioxxterms:author>
<rioxxterms:author>Bubliauskas, Andrius</rioxxterms:author>
<rioxxterms:author>Frei, Przemyslaw</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-0086-5173" >Miras, Haralampos N.</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0001-8035-5757" >Cronin, Leroy</rioxxterms:author>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >Advanced Mass Spectrometry Kit for Controlling Chemical Robots and Exploring Complex Chemical Systems</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >Programmable Molecular Metal Oxides (PMMOs) - From Fundamentals to Application</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >The Multi-Corder: Poly-Sensor Technology</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >Synthetic Biology applications to Water Supply and Remediation</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >A Digital DNA Nano Writer (DNA NanoFab)</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" >Programmable 'Digital' Synthesis for Discovery and Scale-up of Molecules, Clusters and Nanomaterials</rioxxterms:project>
<rioxxterms:project funder_name="European Research Council (ERC)" >SMARTPOM: Artificial-Intelligence Driven Discovery and Synthesis of Polyoxometalate Clusters</rioxxterms:project>
<rioxxterms:publication_date>2019-06-24</rioxxterms:publication_date>
<rioxxterms:type>Journal Article/Review</rioxxterms:type>
<rioxxterms:version>VoR</rioxxterms:version>
<rioxxterms:version_of_record>http://doi.org/10.1021/acs.jcim.9b00304</rioxxterms:version_of_record>
</rioxx>

RIOXX SCHEMA V3

An item with no accepted manuscript in the repository and a related published article

Questions:

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v3.0/rioxx/ http://www.rioxx.net/schema/v3.0/rioxx/rioxx.xsd">
<dc:description>We conducted a field experiment to evaluate the impact of job search assistance on the employment of recently arrived refugees in Germany. The treatment group received job-matching support: an NGO identified suitable vacancies and sent the refugees’ CVs to employers. Six months after the start of the treatment, we find no evidence for positive treatment effects on employment. However, after twelve months, we detect positive treatment effects: marginally significant for the full sample and larger in magnitude and significant for lower educated refugees and those who have not yet received a refugee status. These individuals face higher uncertainty about their residence status, they do not search effectively, lack access to alternative support programmes and may be disregarded by employers due to perceived higher hiring costs. Our results suggest that personalised job search assistance can improve labour market integration of these refugee groups by alleviating labour market frictions.</dc:description>
<dc:identifier>https://eprints.gla.ac.uk/185445/</dc:identifier> <!-- should now be a PID (the landing page) -->
<dc:language>en</dc:language>
<dc:publisher>American Chemical Society</dc:publisher>
<dc:source>1549-9596</dc:source>
<dc:title>Intuition-enabled machine learning beats the competition when joint human-robot teams perform inorganic chemical experiments</dc:title>
<dc:coverage></dc:coverage>
<dc:subject></dc:subject>
<dcterms:dateAccepted>2019-04-26</dcterms:dateAccepted>
<rioxxterms:author id="http://orcid.org/0000-0002-0144-8566" >Battisti, Michele</rioxxterms:author>
<rioxxterms:author>Duros, Vasilios</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-2211-4389" >Grizou, Jonathan</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-5222-9611" >Sharma, Abhishek</rioxxterms:author>
<rioxxterms:author>Mehr, S. Hessam M.</rioxxterms:author>
<rioxxterms:author>Bubliauskas, Andrius</rioxxterms:author>
<rioxxterms:author>Frei, Przemyslaw</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0002-0086-5173" >Miras, Haralampos N.</rioxxterms:author>
<rioxxterms:author id="http://orcid.org/0000-0001-8035-5757" >Cronin, Leroy</rioxxterms:author>
<rioxxterms:contributor></rioxxterms:contributor>
<rioxxterms:publication_date>2019-06-24</rioxxterms:publication_date>
<rioxxterms:record_public_release_date>2019-04-26</rioxxterms:record_public_release_date>
<rioxxterms:type>Journal Article/Review</rioxxterms:type>
<rioxxterms:grant></rioxxterms:grant>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/P00153X/1</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/J015156/1</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/K021966/1</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/K038885/1</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/L015668/1</rioxxterms:project>
<rioxxterms:project funder_name="Engineering and Physical Sciences Research Council (EPSRC)" funder_id="">EP/L023652/1</rioxxterms:project>
<rioxxterms:project funder_name="European Research Council (ERC)">670467</rioxxterms:project>

<!-- publisher version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
            http://doi.org/10.1021/acs.jcim.9b00304
</dc:relation>

</rioxx>
geo-mac commented 1 year ago

How do we model the local copy of the published version? Identifier = https://eprints.gla.ac.uk/185445/1/185445.pdf ?

Yes, it is already getting hairy!

In this particular instance we could seek to insert an additional instance of dc:relation to reflect that there is a VoR deposit locally. This might look like this:

<!-- publisher version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
            http://doi.org/10.1021/acs.jcim.9b00304
</dc:relation>

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-xx-xx" 
    resource_exposed_date="2019-xx-xx" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/185445/1/185445.pdf 
</dc:relation>

My possible concern about this approach is that there is nothing to relate them - ironically! In other words, there is no mechanism to state that they are the same item. On the other hand, both instances of dc:relation carry the same type URI and the same rioxx_version URI, thereby stating that both pertain to a VoR of the described work (described at root level). So, perhaps I am overthinking this?
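
One way to picture the 'same URI' argument: an aggregator could group dc:relation instances by their rioxx_version attribute and treat each group as the known locations of one expression. The sketch below does this with Python's standard xml.etree.ElementTree. The `record` wrapper element and its namespace declaration are assumptions added only so that the fragment parses on its own; the attribute names are those used in the draft above, and the function name is hypothetical.

```python
# Hypothetical sketch: group dc:relation instances by rioxx_version.
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# The wrapper element and xmlns declaration are added here only so that the
# two relations from the example above are parseable as a document.
FRAGMENT = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:relation type="http://purl.org/coar/resource_type/c_6501"
      rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
      http://doi.org/10.1021/acs.jcim.9b00304
  </dc:relation>
  <dc:relation type="http://purl.org/coar/resource_type/c_6501"
      rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85"
      format="application/pdf">
      https://eprints.gla.ac.uk/185445/1/185445.pdf
  </dc:relation>
</record>
"""


def relations_by_version(xml_text: str) -> dict:
    """Map each rioxx_version URI to the list of relation targets using it."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text)
    for rel in root.findall(f"{{{DC}}}relation"):
        version = rel.get("rioxx_version")
        groups[version].append((rel.text or "").strip())
    return dict(groups)


groups = relations_by_version(FRAGMENT)
for version, targets in groups.items():
    print(version, targets)
```

Grouping by version only supports the inference that the two targets are the same item. It would equally merge two different VoR files attached to one record, so it does not replace an explicit link.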

It would be tidier to encapsulate it all within a single instance of dc:relation - which gets us closer to what I proposed here, where the PID is declared as an attribute within dc:relation. For example:

<!-- publisher version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2021-07-28" 
    resource_exposed_date="2021-08-03" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85"
    pid="http://doi.org/10.1021/acs.jcim.9b00304"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license="http://creativecommons.org/licenses/by/4.0/"
    format="application/pdf">
           https://eprints.gla.ac.uk/185445/1/185445.pdf 
</dc:relation>

In this example the PID defines the expression held on the publisher's website (i.e. the VoR) but we also declare the actionable file location for the VoR, stored within Enlighten.

The issue -- which I posed in the documentation but which we never reached in our meeting -- is the fact that this creates redundancy in dc:identifier. dc:identifier has instead been replaced by a pid attribute.

Or has it?

The schema could specify that the repository PID be declared in dc:identifier, at root level. Other PIDs relevant to the work should be specified at dc:relation level.

geo-mac commented 1 year ago

The pid at attribute level approach can easily become clumsy, however, when we attempt to model more complicated combinations of relations. For example, would it be necessary to declare it in a relation? There would need to be strict rules and business logic on when such an attribute is exposed in the metadata to make it useful. I would be interested in a take from @petrknoth on this -- how would this sit from an aggregation perspective?

<!-- 'Work' description at root level -->
<dc:identifier>https://doi.org/a-locally-minted-pid-for-an-AAM</dc:identifier>

<!-- relation to harvestable content, etc. -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-07-12" 
    resource_exposed_date="2021-01-06" 
    rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
    pid="https://doi.org/a-locally-minted-pid-for-an-AAM"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by-nc-nd/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/190277/7/190277.pdf
</dc:relation>

<!-- publisher version -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
            http://doi.org/10.1021/acs.jcim.9b00304
</dc:relation>

<!-- dataset -->
<dc:relation type="https://schema.org/Dataset" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://doi.org/10.17868/dataset_123456
</dc:relation>
MickEadie commented 1 year ago

Does this not work better if the root level dc:identifier is for the landing page and not the AAM?

<!-- 'Work' description at root level -->
<dc:identifier>https://doi.org/a-locally-minted-pid-for-a-LANDING PAGE</dc:identifier>

<!-- accepted version -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-07-12" 
    resource_exposed_date="2021-01-06" 
    rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by-nc-nd/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/190277/7/190277.pdf
</dc:relation>

<!-- publisher version both the local copy and the publisher DOI modelled -->

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2021-07-28" 
    resource_exposed_date="2021-08-03" 
    rioxx_version="https://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/"
    pid="http://doi.org/10.1021/acs.jcim.9b00304"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license="http://creativecommons.org/licenses/by/4.0/"
    format="application/pdf">
           https://eprints.gla.ac.uk/185445/1/185445.pdf 
</dc:relation>

<!-- dataset that also uses pid attribute of relation-->
<dc:relation type="https://schema.org/Dataset" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv"
    pid="https://doi.org/10.17868/dataset_123456">
</dc:relation>
geo-mac commented 1 year ago

Does this not work better if the root level dc:identifier is for the landing page and not the AAM?

Yes -- apologies, semantic drift on my part. When I stated https://doi.org/a-locally-minted-pid-for-an-AAM what I actually meant was https://doi.org/a-locally-minted-pid-for-a-LANDING PAGE. Doh! (Which I suppose echoes the confusion Bev had surrounding UKRI communications on this matter!) :-)

Is it OK for dc:identifier at root level to define a PID for an AAM repository deposit and for the file location of that deposit (i.e. the .pdf file) to form the content of a dc:relation instance, without any formal connection between them? Is it OK to rely on business logic for the metadata to make sense? This is why I felt we would still need to include the pid attribute so that aggregators could verify that the PID in dc:identifier essentially identifies the file content of dc:relation, especially as there could be multiple instances of dc:relation. TBH, this is only a particular issue (I think) with AAMs and instances where institutions are minting PIDs for Plan S/UKRI purposes (accepting too there are many reasons why minting PIDs would be useful in this context). However, this will be the most common use case.

The above example needs @petrknoth to take a look I think to see if this sort of approach is disruptive in aggregation.
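
To make the aggregation question concrete, the check an aggregator would need might look roughly like the Python sketch below: read the root dc:identifier, then find the dc:relation instances whose pid attribute carries the same value. The `record` wrapper element, its namespace declaration and the function name are assumptions for illustration; the placeholder PID and the pid attribute are taken from the draft examples in this thread.

```python
# Hypothetical sketch: link the root dc:identifier to a dc:relation via pid.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# The wrapper element and xmlns declaration are assumptions so this parses.
RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>https://doi.org/a-locally-minted-pid-for-an-AAM</dc:identifier>
  <dc:relation rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
      pid="https://doi.org/a-locally-minted-pid-for-an-AAM"
      format="application/pdf">
      https://eprints.gla.ac.uk/190277/7/190277.pdf
  </dc:relation>
  <dc:relation rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
      http://doi.org/10.1021/acs.jcim.9b00304
  </dc:relation>
</record>
"""


def files_identified_by_root_pid(xml_text: str) -> list:
    """Return targets of relations whose pid equals the root dc:identifier."""
    root = ET.fromstring(xml_text)
    identifier = (root.findtext(f"{{{DC}}}identifier") or "").strip()
    return [
        (rel.text or "").strip()
        for rel in root.findall(f"{{{DC}}}relation")
        if rel.get("pid") == identifier
    ]


matches = files_identified_by_root_pid(RECORD)
print(matches)
```

If the pid attribute is omitted, the list comes back empty and the connection between the PID and the file rests on business logic alone, which is the concern raised above.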

geo-mac commented 1 year ago

Based on the recent posts above, a full Rioxx description might look like the example below. This example uses @MickEadie's data and embellishes it (including with a real preprint expression) in order to give an impression of how it might be modelled.

The tension between root and expression level description still exists within this example, insofar as we are treating the VoR as the centre of the bibliographic universe, rather than as an abstract work. However, perhaps this compromise is acceptable if the relationships to the expressions and related resource are adequately modelled...? And perhaps the community wouldn't swallow anything more abstract -- at least not yet?

Again, the key feature to argue about is the inclusion of the locally created PID at root level within dc:identifier and its simultaneous use within the dc:relation pid attribute. Also, I have conflated relation types between COAR and schema.org vocabularies -- but let's assume they all use COAR (I was drafting this in a hurry!)

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v3.0/rioxx/ http://www.rioxx.net/schema/v3.0/rioxx/rioxx.xsd">
<dc:description>We conducted a field experiment to evaluate the impact of job search assistance on the employment of recently arrived refugees in Germany. The treatment group received job-matching support: an NGO identified suitable vacancies and sent the refugees’ CVs to employers. Six months after the start of the treatment, we find no evidence for positive treatment effects on employment. However, after twelve months, we detect positive treatment effects: marginally significant for the full sample and larger in magnitude and significant for lower educated refugees and those who have not yet received a refugee status. These individuals face higher uncertainty about their residence status, they do not search effectively, lack access to alternative support programmes and may be disregarded by employers due to perceived higher hiring costs. Our results suggest that personalised job search assistance can improve labour market integration of these refugee groups by alleviating labour market frictions.</dc:description>
<dc:language>en</dc:language>
<dc:publisher uri="https://isni.org/isni/0000000109440166">American Chemical Society</dc:publisher>
<dc:source>1549-9596</dc:source>
<dc:title>Intuition-enabled machine learning beats the competition when joint human-robot teams perform inorganic chemical experiments</dc:title>
<dcterms:dateAccepted>2019-04-26</dcterms:dateAccepted>
<rioxxterms:author uri="http://orcid.org/0000-0002-0144-8566">Battisti, Michele</rioxxterms:author>
<rioxxterms:author>Duros, Vasilios</rioxxterms:author>
<rioxxterms:author uri="http://orcid.org/0000-0002-2211-4389">Grizou, Jonathan</rioxxterms:author>
<rioxxterms:author uri="http://orcid.org/0000-0002-5222-9611">Sharma, Abhishek</rioxxterms:author>
<rioxxterms:author>Mehr, S. Hessam M.</rioxxterms:author>
<rioxxterms:author>Bubliauskas, Andrius</rioxxterms:author>
<rioxxterms:author>Frei, Przemyslaw</rioxxterms:author>
<rioxxterms:author uri="http://orcid.org/0000-0002-0086-5173">Miras, Haralampos N.</rioxxterms:author>
<rioxxterms:author uri="http://orcid.org/0000-0001-8035-5757">Cronin, Leroy</rioxxterms:author>
<rioxxterms:publication_date>2019-06-24</rioxxterms:publication_date>
<rioxxterms:record_public_release_date>2019-04-26</rioxxterms:record_public_release_date>
<rioxxterms:type uri="https://vocabularies.coar-repositories.org/resource_types/c_2df8fbb1/">research article</rioxxterms:type>
<rioxxterms:grant
    funder_name="Arts and Humanities Research Council"
    funder_id="https://ror.org/0505m1554">
    AH/W007622/1
</rioxxterms:grant>
<rioxxterms:grant
    funder_name="Wellcome Trust"
    funder_id="https://isni.org/isni/0000000404277672">
    https://doi.org/10.35802/218671
</rioxxterms:grant>
<rioxxterms:project>
    https://handle.net/10378.1/1590366
</rioxxterms:project>

<!-- 'Work-esque' description at root level -->
<dc:identifier>https://doi.org/a-locally-minted-pid-for-an-AAM</dc:identifier>

<!-- relation to 'expression' of harvestable content, etc. -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-07-12" 
    resource_exposed_date="2021-01-06" 
    rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
    pid="https://doi.org/a-locally-minted-pid-for-an-AAM"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by-nc-nd/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/190277/7/190277.pdf
</dc:relation>

<!-- Other expressions - publisher version -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    rioxx_version="http://purl.org/coar/version/c_970fb48d4fbd8a85">
            http://doi.org/10.1021/acs.jcim.9b00304
</dc:relation>

<!-- Other expressions - preprint (author's original - JAV) -->
<dc:relation type="http://purl.org/coar/resource_type/c_816b" 
    rioxx_version="http://purl.org/coar/version/c_b1a7d7d4d402bcce">
            https://doi.org/10.26434/chemrxiv.7712453.v1
</dc:relation>

<!-- related  dataset -->
<dc:relation type="https://schema.org/Dataset" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://doi.org/10.17868/dataset_123456
</dc:relation>
geo-mac commented 1 year ago

Following on from the previous example, here is an instance in which an AAM is not being exposed via the repository and a Gold VoR is exposed instead. This VoR is deposited in the repository and therefore no PID need be minted for a local AAM deposit. This example uses real data from Strathprints and includes two related datasets and a piece of related software on GitHub. I have used the correct COAR URIs this time. ;-)

<rioxx xsi:schemaLocation="http://www.rioxx.net/schema/v3.0/rioxx/ http://www.rioxx.net/schema/v3.0/rioxx/rioxx.xsd">
<dc:description>The phenology, distribution, and size composition of plankton communities are changing rapidly in response to warming. This may lead to shifts in the prey fields of planktivorous fish, which play a key role in transferring energy up marine food chains. Here, we use 60 + years of Continuous Plankton Recorder data to explore temporal trends in key taxa and community traits in the prey field of planktivorous lesser sandeels (Ammodytes marinus) in the North Sea, the Faroes and southern Iceland. We found marked spatial variation in the prey field, with Calanus copepods generally being much more common in the northern part of the study area. In the western North Sea, the estimated amount of available energy in the prey field has decreased by more than 50% since the 1960s. This decrease was accompanied by declining abundances of small copepods, and shifts in the timing of peak annual prey abundances. Further, the estimated average prey community body size has increased in several of the locations considered. Overall, our results point to the importance of regional studies of prey fields, and caution against inferring ecological consequences based only on large-scale trends in key taxa or mean community traits.</dc:description>
<dc:language>en</dc:language>
<dc:publisher uri="https://isni.org/isni/0000000122929185">Oxford University Press</dc:publisher>
<dc:source>1054-3139</dc:source>
<dc:title>Spatio-temporal variation in the zooplankton prey of lesser sandeels : species and community trait patterns from the continuous plankton recorder</dc:title>
<dcterms:dateAccepted>2022-05-12</dcterms:dateAccepted>
<rioxxterms:author uri="https://orcid.org/0000-0002-8508-3911">Olin, Agnes B.</rioxxterms:author>
<rioxxterms:author uri="https://orcid.org/0000-0002-1892-9497">Banas, Neil S.</rioxxterms:author>
<rioxxterms:author>Johns, David G.</rioxxterms:author>
<rioxxterms:author uri="https://orcid.org/0000-0001-6602-3107">Heath, Michael R.</rioxxterms:author>
<rioxxterms:author>Wright, Peter J.</rioxxterms:author>
<rioxxterms:author>Nager, Ruedi G.</rioxxterms:author>
<rioxxterms:publication_date>2022-06-29</rioxxterms:publication_date>
<rioxxterms:record_public_release_date>2022-06-22</rioxxterms:record_public_release_date>
<rioxxterms:type uri="https://vocabularies.coar-repositories.org/resource_types/c_2df8fbb1/">research article</rioxxterms:type>
<rioxxterms:grant
    funder_name="Natural Environment Research Council"
    funder_id="https://ror.org/02b5d8509">
    NE/L003090/1
</rioxxterms:grant>

<!-- 'Work-esque' description at root level - describing Gold VoR in repository so no requirement for PID of AAM. Conventional repo handle communicated in dc:identifier. PID for VoR communicated in dc:relation -->
<dc:identifier>https://strathprints.strath.ac.uk/81232/</dc:identifier>

<!-- relation to 'expression' of harvestable content, etc. -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2022-06-30" 
    resource_exposed_date="2022-06-30" 
    rioxx_version="https://purl.org/coar/version/c_970fb48d4fbd8a85"
    pid="http://doi.org/10.1021/acs.jcim.9b00304"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by/4.0" 
    format="application/pdf">https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf
</dc:relation>

<!-- related  dataset -->
<dc:relation type="http://purl.org/coar/resource_type/FF4C-28RK" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://doi.org/10.17031/1673
</dc:relation>

<!-- related  dataset -->
<dc:relation type="http://purl.org/coar/resource_type/FF4C-28RK" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://doi.org/10.7489/610-1
</dc:relation>

<!-- related  software -->
<dc:relation type="http://purl.org/coar/resource_type/c_c950" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://github.com/agnesolin/CPRsandeel
</dc:relation>
petrknoth commented 1 year ago

Replying to this comment https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1460305555 in relation to dc:identifier at root level.

I feel that the use of dc:identifier at root level is important and that it should contain a PID identifying the metadata record we are talking about, as opposed to a PID of any of the resources that might be included under dc:relation. An acceptable PID in this case might be an OAI identifier, a DOI, or maybe a handle. Currently, we require this identifier to be an HTTP(S) URI.

Example:

<dc:identifier>https://oai.core.ac.uk/oai:researchonline.rca.ac.uk:1035</dc:identifier>

In fact, I would be supportive of relaxing the requirement for this to be an HTTP(S) URI to just a URI so that this would also be acceptable:

<dc:identifier>oai:researchonline.rca.ac.uk:1035</dc:identifier>
geo-mac commented 1 year ago

Thanks @petrknoth -- what are your thoughts on instances where dc:identifier uses a PID at root level to identify, say, an AAM and then uses the same PID within dc:relation alongside the harvestable resource URI? As in the screen snippet below. It seems necessary to link them -- although it breaks the root-expression model to some extent.

[image: screen snippet]

petrknoth commented 1 year ago

Hi @geo-mac , I feel that the above example shows a practice that is substandard and not something we want to encourage. The root level identifier specified in dc:identifier should (in this "works" approach) only identify and resolve to the metadata record that we hold in this repository, i.e. it should resolve to the repository splash page. On the other hand, the identifier that one can use at the dc:relation level should resolve to the object identified under dc:relation, which in this case is the locally stored PDF. In my view, our approach is clean as long as we distinguish the identifier of the metadata record from the identifier of the object itself. What do you think?
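A minimal sketch of the distinction drawn above, reusing the Strathprints values from earlier in this thread. The root-level dc:identifier carries the splash page URL of the metadata record, while dc:relation carries the URL of the locally stored PDF. Namespace declarations and the remaining attributes are omitted for brevity, as in the other fragments here; this is an illustration of the idea, not text from the published profile.

```xml
<!-- root level: identifies the metadata record and resolves to the splash page -->
<dc:identifier>https://strathprints.strath.ac.uk/81232/</dc:identifier>

<!-- relation level: identifies the object itself, here the locally stored PDF -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501"
    format="application/pdf">
    https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf
</dc:relation>
```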

petrknoth commented 1 year ago

Responding to this comment: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1467911590

This is, in my view a nice example. To make this work well for harvesting software, it requires two things:

  1. The schema.org type attribute (I noticed that you are using the COAR vocabulary, but we are using the less granular schema.org here: https://www.rioxx.net/profiles/v3-0-rc-1/#dc:relation ) to distinguish datasets from papers.
  2. Harvesting software would very much benefit from an attribute describing whether a copy of the related resource is actually archived in the repository or if the link is external. For instance, when the URL is a DOI, there is no way for a robot to recognise this prior to resolving that URL. This information is very important in order to be able to respect robots.txt and other limitations. I feel we need an attribute stating location="external" or location="internal" (perhaps with different names and maybe the internal could be implied unless external is specified). I think we discussed this some time ago, but not sure where it went.
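As a rough sketch of point 2, the proposed attribute might be used as follows. The attribute name `location` and its values `internal` and `external` are only the ones floated above; they are hypothetical and not part of the RIOXX v3 profile. The type URIs and resource URLs are copied unchanged from the Strathprints example earlier in the thread, and namespace declarations are omitted.

```xml
<!-- hypothetical attribute: a copy archived in the repository;
     "internal" could be implied whenever location is absent -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501"
    location="internal"
    format="application/pdf">
    https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf
</dc:relation>

<!-- hypothetical attribute: a related dataset held elsewhere, so a robot
     knows not to treat it as repository content before resolving the DOI -->
<dc:relation type="http://purl.org/coar/resource_type/FF4C-28RK"
    location="external"
    format="text/csv">
    https://doi.org/10.17031/1673
</dc:relation>
```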

It would make a lot of sense to consider whether we could make use of the signposting principles: https://signposting.org/ although they are typically operated at the HTTP headers level.

petrknoth commented 1 year ago

Responding to: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1463693622 @MickEadie I agree with the need to distinguish identifiers for the metadata record (which should resolve to the landing page) from identifiers for the objects included in dc:relation.

Note that the identifier for the metadata record does not necessarily need to be a DOI. An OAI identifier or a Handle should be perfectly permissible.

petrknoth commented 1 year ago

Responding to this comment: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1462113222 and also related to my earlier comment: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1468759039

One thing that just occurred to me is that when deposit_date and/or resource_exposed_date are specified, then this probably implies that the resource is local. I am not yet completely convinced, but I wonder whether by making e.g. deposit_date mandatory for local copies of content, we could avoid having to create the "location" attribute I proposed under: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1468759039

petrknoth commented 1 year ago

Responding to this comment: https://github.com/geo-mac/Rioxx-development/issues/1#issuecomment-1462113222

@geo-mac you raise the issue of not being able to describe that the local copy of the article and the external version on the publisher's system are the same item. If they are the same item, I would argue that this can be done by specifying the same pid attribute for both of them, for instance, giving the same DOI.

Having said that, I feel we should listen to what John S. said about the ability to specify multiple PIDs (e.g. one might want to specify a DOI and a handle at the same time). I would really try to address this and we can do that by representing these PIDs as elements rather than attributes. Elements are generally more powerful anyway and will improve our ability to extend RIOXX in the future. What do you think?
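To make the elements-versus-attributes idea concrete, here is one possible shape. Every child element name below (`rioxxterms:pid`, `rioxxterms:url`) is invented for illustration and does not exist in any RIOXX schema. The handle value is a placeholder, not a real handle. The DOI and PDF URL are the ones used in the example earlier in this thread.

```xml
<!-- sketch only: PIDs expressed as repeatable child elements instead of
     a single pid attribute; all child element names are hypothetical -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501"
    rioxx_version="https://purl.org/coar/version/c_970fb48d4fbd8a85"
    format="application/pdf">
    <rioxxterms:pid>http://doi.org/10.1021/acs.jcim.9b00304</rioxxterms:pid>
    <!-- placeholder handle, for illustration only -->
    <rioxxterms:pid>https://hdl.handle.net/00000/placeholder</rioxxterms:pid>
    <rioxxterms:url>https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf</rioxxterms:url>
</dc:relation>
```

One consequence of this shape is that the resource URL can no longer sit as bare text inside dc:relation without producing mixed content, which is why the sketch moves it into its own (equally hypothetical) element.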

MickEadie commented 1 year ago

Just getting back to this after various strikes etc! I have edited George's example from above

'Work-esque' description at root level which has dc:identifier that is just the splash page url

<dc:identifier>https://strathprints.strath.ac.uk/81232/</dc:identifier>

the record goes on to model Gold VoR in repository which has a publisher pid and a local copy of the same resource in dc:relation.

Is this the John S approach @petrknoth ? The VOR expression below now has 2 PID elements added to a single pid attribute separated by whitespace.

I also wonder if 'deposit_date' and 'resource_exposed_date' are correct here - is this not information that can only be provided by the 'publisher' of the resource?

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2022-06-30" 
    resource_exposed_date="2022-06-30" 
    rioxx_version="https://purl.org/coar/version/c_970fb48d4fbd8a85"
    pid="http://doi.org/10.1021/acs.jcim.9b00304 https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by/4.0" 
    format="application/pdf">
</dc:relation>

The accepted version

In this expression we can have the 'deposit_date' and 'resource_exposed_date' as they are under our repository's control

Is a 'deposit_date' enough to infer it is a 'local' resource?

<dc:relation type="http://purl.org/coar/resource_type/c_6501" 
    deposit_date="2019-07-12" 
    resource_exposed_date="2021-01-06" 
    rioxx_version="http://purl.org/coar/version/c_ab4af688f83e57aa"
    accessRightsURI="http://purl.org/coar/access_right/c_abf2"
    license_ref="http://creativecommons.org/licenses/by-nc-nd/4.0" 
    format="application/pdf">
            https://eprints.gla.ac.uk/190277/7/190277.pdf
</dc:relation>
<!-- related  dataset -->
<dc:relation type="http://purl.org/coar/resource_type/FF4C-28RK" 
    accessRightsURI="http://purl.org/coar/access_right/c_abf2" 
    format="text/csv">
            https://doi.org/10.17031/1673
</dc:relation>
geo-mac commented 1 year ago

Your contribution was apposite @MickEadie -- was planning to return to this over the next couple of days.

I think it might be necessary to collate comments from here and present something more coherent in order to present our collective thinking. I'll try to do this soon as confusion is increasingly being introduced. For clarity, when we use words like, "create a PID for an AAM on a repository", I think we are all meaning a PID pointing to a repository splash page, rather than to an actionable file -- which is what I was meaning in previous examples.

@petrknoth said:

you raise the issue of not being able to describe that the local copy of the article and the external version on the publisher's system are the same item. If they are the same item, I would argue that this can be done by specifying the same pid attribute for both of them, for instance, giving the same DOI.

Yes, this is exactly what I was thinking.

@MickEadie said:

I also wonder if 'deposit_date' and 'resource_exposed_date' are correct here - is this not information that can only be provided by the 'publisher' of the resource?

In general, yes. However, in this example we were discussing a repository deposit of a Gold VoR, which would obviously be provided by the publisher but for which we would have local deposit timestamps, etc.

On a more substantive point, I think we are tacitly acknowledging that the VoR, to a certain extent, remains a focus for metadata description, rather than expressing the work at an abstract level at 'root' level, and using dc:relation to express expressions.

MickEadie commented 1 year ago

@petrknoth said:

you raise the issue of not being able to describe that the local copy of the article and the external version on the publisher's system are the same item. If they are the same item, I would argue that this can be done by specifying the same pid attribute for both of them, for instance, giving the same DOI.

Yes, this is exactly what I was thinking.

ah ok - does this mean we will have 2 'VOR' relations then, but both use the same publisher DOI as the pid attribute? And following that logic only one of those VOR relations (the local repository copy) will have a deposit_date and resource_exposed_date.
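If the answer is yes, the pair of relations might look something like the sketch below. The attribute values are reused from the example above. Using the DOI itself as the content of the publisher-hosted relation is an assumption made for illustration, and namespace declarations are omitted.

```xml
<!-- local repository copy of the VoR: carries the repository's own dates -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501"
    deposit_date="2022-06-30"
    resource_exposed_date="2022-06-30"
    rioxx_version="https://purl.org/coar/version/c_970fb48d4fbd8a85"
    pid="http://doi.org/10.1021/acs.jcim.9b00304"
    format="application/pdf">
    https://strathprints.strath.ac.uk/81232/1/Olin_etal_ICES_JMS_2022_Spatio_temporal_variation_in_the_zooplankton_prey.pdf
</dc:relation>

<!-- publisher-hosted VoR: same pid, but no deposit_date or
     resource_exposed_date because the repository does not control it -->
<dc:relation type="http://purl.org/coar/resource_type/c_6501"
    rioxx_version="https://purl.org/coar/version/c_970fb48d4fbd8a85"
    pid="http://doi.org/10.1021/acs.jcim.9b00304">
    http://doi.org/10.1021/acs.jcim.9b00304
</dc:relation>
```

Under this reading, the shared pid value is what tells a harvester the two relations are the same item, and the presence of deposit_date is what marks the local copy.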

MickEadie commented 1 year ago

For clarity, when we use words like, "create a PID for an AAM on a repository", I think we are all meaning a PID pointing to a repository splash page, rather than to an actionable file -- which is what I was meaning in previous examples.

yes for me it is the splash page

MickEadie commented 1 year ago

I think it might be necessary to collate comments from here and present something more coherent in order to present our collective thinking. I'll try to do this soon as confusion is increasingly being introduced.

thanks @geo-mac - these discussions are helpful for me at least - i hope I'm not adding to confusion :smile: also happy to assist in putting together an example record we can use to illustrate if that would help

geo-mac commented 1 year ago

i hope I'm not adding to confusion 😄

Not at all!

I'm increasingly of the view that the schema, when finally published, will need to include some example records, each relating to specific use cases -- just to help readers, repo managers, scholarly communication officers, and developers understand the schema.

petrknoth commented 1 year ago

Hi @MickEadie and @geo-mac . I feel we are quite close now. I suggest that the three of us have a chat at this point and maybe we can then agree how to translate this discussion into proposed amends to the specification. What do you think?

geo-mac commented 1 year ago

Hey @petrknoth -- I have actually just pulled a few working examples together. Let me publish these somewhere on my GitHub repo, and then we can use these for the ensuing discussion...? Sound like a plan?!

petrknoth commented 1 year ago

Yes, I agree.

geo-mac commented 1 year ago

Actually, for technical reasons (too boring to explain) it will be Thursday before I can publish these examples. But it will definitely be Thursday. So, I'll circulate a Doodle poll for a discussion late next week, if that might work?