bodleian / ora_data_model

Documentation and crosswalks relating to the ORA data model

Sample data from the People Data Feed and Open Access Monitor #10

Closed by tomwrobel 5 years ago

tomwrobel commented 5 years ago

Sample PUBS-ATOM files that I have seen so far do not include data from the people data feed or the open access monitor.

As a result, I can't assess whether or not these sources contain metadata that will be hard to map.

Would it be possible to provide this in the sample API XML files (or however that data will be fed to the crosswalks)?

AndrewBennet commented 5 years ago

I'm not quite sure what you mean by data from the open access monitor - could you clarify?

And by people data feed, do you mean the metadata of the users linked to the publication? If so, that is present in the current single sample file: https://github.com/tomwrobel/ora_data_model/blob/master/elements_xwalks/xwalk-out/input-publication-9598.xml#L213

I will generate some more examples later today and upload them 👍

jjpartridge commented 5 years ago

Hi Andrew,

People Data Feed is our term for the HR User Feed (e.g. user information). It is a BizTalk connection with Research Services and IT Services, which populates the user table.

As for the Open Access Monitor (OAM), the question is whether there is any data from the OAM module that is likely to be fed back to the repository (e.g. compliance or REF exception information), or whether the OAM is reading information only from the data in Elements and will not pass anything onto the repository.

Hope that makes sense - Tom, correct me if I'm wrong!

AndrewBennet commented 5 years ago

Thanks - that's good to know.

The HR User information for linked users is available - see https://github.com/tomwrobel/ora_data_model/blob/master/elements_xwalks/xwalk-out/input-publication-9598.xml#L235

There is some OA info (policy exceptions, library statuses) included in the XML that the xwalks work with. Compliance info, however, isn't exposed in this. Trying to crosswalk the compliance status wouldn't be recommended anyway: it depends on the current date, as well as the presence of records in external systems, so the value is likely to be incorrect very quickly.

Below is a sample of the OA data available in the input XML:

<api:oa>
  <api:oa-policy-exception-type>Tech3</api:oa-policy-exception-type>
  <api:oa-policy-exception-comment>Hyrax deposit has not yet been developed!</api:oa-policy-exception-comment>
  <api:oa-policy-exceptions>
    <api:oa-policy-exception>
      <api:type>Tech3</api:type>
      <api:comment>Hyrax deposit has not yet been developed!</api:comment>
    </api:oa-policy-exception>
  </api:oa-policy-exceptions>
  <api:library-status status="full-text-received" status-display-name="Full text received">
    <api:last-updated-when>2019-02-06T17:01:58.59+00:00</api:last-updated-when>
    <api:note>Library status changed to 'Full text received' on 06/02/2019 by Daniel Hook.</api:note>
  </api:library-status>
</api:oa>
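
For anyone who wants to inspect these values outside the xwalks, here is a minimal Python sketch that reads the exception and library-status values from an `api:oa` element. It is illustrative only: the namespace URI bound to `api:` is an assumption (check the root element of the sample files), and `read_oa` and its output keys are hypothetical names, not part of Elements or the crosswalks.

```python
import xml.etree.ElementTree as ET

# Assumption: the api: prefix is bound to this URI in the real input files.
API_NS = "http://www.symplectic.co.uk/publications/api"
NS = {"api": API_NS}

# Trimmed copy of the sample above, with the namespace declared so it parses alone.
SAMPLE = f"""
<api:oa xmlns:api="{API_NS}">
  <api:oa-policy-exception-type>Tech3</api:oa-policy-exception-type>
  <api:oa-policy-exception-comment>Hyrax deposit has not yet been developed!</api:oa-policy-exception-comment>
  <api:library-status status="full-text-received" status-display-name="Full text received">
    <api:last-updated-when>2019-02-06T17:01:58.59+00:00</api:last-updated-when>
  </api:library-status>
</api:oa>
""".strip()


def read_oa(oa_xml: str) -> dict:
    """Return the exception and library-status values from one api:oa element."""
    oa = ET.fromstring(oa_xml)
    status = oa.find("api:library-status", NS)
    return {
        "exception_type": oa.findtext("api:oa-policy-exception-type", namespaces=NS),
        "exception_comment": oa.findtext("api:oa-policy-exception-comment", namespaces=NS),
        # The status code lives in an attribute, not in element text.
        "library_status": status.get("status") if status is not None else None,
        "library_status_updated": (
            status.findtext("api:last-updated-when", namespaces=NS)
            if status is not None
            else None
        ),
    }


oa_values = read_oa(SAMPLE)
```

Missing elements come back as `None`, so a record without any OA block content would still produce a dictionary with the same keys.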
tomwrobel commented 5 years ago

I've added crosswalk documentation for People from 83749. I'll look at open access mapping when I get to 9598.

The new crosswalk documentation represents a substantial update, so is worth looking at!

tomwrobel commented 5 years ago

@jjpartridge Could you look through the sample OA monitor output above and in https://github.com/tomwrobel/ora_data_model/blob/master/elements_xwalks/xwalk-out/input-publication-9598.xml#L235 and let me know if there's anything here we want to crosswalk into the data model. I'm not entirely sure what all of these fields do, and I don't have an existing crosswalk to fall back on for information!

jjpartridge commented 5 years ago
<api:oa>
  <api:oa-policy-exception-type>Tech3</api:oa-policy-exception-type>
  <!-- This is the REF exception - controlled list | ora:field - "ref_exception_required" -->
  <api:oa-policy-exception-comment>Hyrax deposit has not yet been developed!</api:oa-policy-exception-comment>
  <!-- This is the REF exception note - free text | we don't have a note other than "ref_other_exception_note", which should probably be added where it exists -->
  <api:oa-policy-exceptions>
    <api:oa-policy-exception>
      <api:type>Tech3</api:type>
      <api:comment>Hyrax deposit has not yet been developed!</api:comment>
      <!-- Not sure how these link up with the above, but same thing -->
    </api:oa-policy-exception>
  </api:oa-policy-exceptions>
  <api:library-status status="full-text-received" status-display-name="Full text received">
    <api:last-updated-when>2019-02-06T17:01:58.59+00:00</api:last-updated-when>
    <api:note>Library status changed to 'Full text received' on 06/02/2019 by Daniel Hook.</api:note>
  </api:library-status>
  <!-- Library status is not configured in our installation of Elements, but I think it's a useful thing. I don't know what controls are on it, and I don't believe we have anything to map to it. However, if "full text received" is part of a controlled list, it could be good to map it where we have a binary file attached to a record (some are sent directly to us), with the "note" populated with our "deposit date". The question would be whether @eugeniobarrio would want to implement this at any point. -->
</api:oa>
<api:neighbourhood>
  <api:relationships>
    <api:relationship id="9667" type-id="8" type="publication-user-authorship"
    <!-- Question for @AndrewBennet: are there similar relationships available for "publication-publication" and "publication-grant"? -->
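
As a rough illustration of the two mappings proposed in the comments above, a small Python sketch follows. The ORA field names `ref_exception_required` and `ref_other_exception_note` come from the annotations; the function name `oa_to_ora` and its arguments are hypothetical, and whether ORA should hold the Elements code (e.g. "Tech3") or a display text is not settled here, so the code is passed through unchanged.

```python
def oa_to_ora(exception_type, exception_comment):
    """Map Elements OA exception values onto the ORA fields named above.

    Hypothetical helper: only fields that have a value are emitted, so an
    empty exception block produces no ORA fields at all.
    """
    ora = {}
    if exception_type:
        # Controlled-list value; passed through as-is (assumption).
        ora["ref_exception_required"] = exception_type
    if exception_comment:
        # Free-text note.
        ora["ref_other_exception_note"] = exception_comment
    return ora


mapped = oa_to_ora("Tech3", "Hyrax deposit has not yet been developed!")
```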
tomwrobel commented 5 years ago

@jjpartridge shall we discuss what fields these would map to tomorrow?

jjpartridge commented 5 years ago

Thanks for fixing the edit above. Yes, we can add this to the discussion tomorrow. I'm not sure it is coming through very well, but there are two questions in the above: one for @AndrewBennet and one for @eugeniobarrio.

jjpartridge commented 5 years ago

@AndrewBennet actually, I see an example of grant info in https://github.com/tomwrobel/ora_data_model/blob/master/elements_xwalks/xwalk-out/input-publication-83749.xml so that answers part of the question. Do you have an example of publication-to-publication relationships?

AndrewBennet commented 5 years ago

@jjpartridge - yes, there are some publication-publication link types available. The names of these link types between publications and publications/grants are as follows:

| Name | LinkType | InverseLinkType |
| --- | --- | --- |
| publication-publication-derivative | Derivative of | Derives |
| publication-publication-supersedence | Supersedes | Is superseded by |
| publication-publication-supplement | Supplements | Is supplemented by |
| publication-publication-correction | Corrects | Is corrected by |
| publication-grant-funded | Funded by | Funds |

However, the availability of these link types can be controlled. I believe that by default the only publication-publication link available to users is the Derivative of / Derives link type.

The representation of this in the API serialisation would be very similar to how the linked grant object is serialised (just with a publication object instead).
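
If it helps when writing mappings, the link types listed above can be held as a simple lookup. This is only a sketch: the names and labels are copied from the list, while `LINK_TYPES` and `link_label` are hypothetical helpers and not part of the Elements API.

```python
# Link type name -> (LinkType label, InverseLinkType label), as listed above.
LINK_TYPES = {
    "publication-publication-derivative": ("Derivative of", "Derives"),
    "publication-publication-supersedence": ("Supersedes", "Is superseded by"),
    "publication-publication-supplement": ("Supplements", "Is supplemented by"),
    "publication-publication-correction": ("Corrects", "Is corrected by"),
    "publication-grant-funded": ("Funded by", "Funds"),
}


def link_label(type_name: str, inverse: bool = False) -> str:
    """Return the label for a link type, read from either end of the link."""
    forward, backward = LINK_TYPES[type_name]
    return backward if inverse else forward


label = link_label("publication-publication-derivative")
inverse_label = link_label("publication-publication-derivative", inverse=True)
```

An unknown type name raises `KeyError`, which may be preferable to silently mapping a link type that has not been reviewed.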

tomwrobel commented 5 years ago

@AndrewBennet do you have examples of the publication-publication relationship in a record? I can write mappings (to related_items)

AndrewBennet commented 5 years ago

I will prepare some and upload them shortly.

AndrewBennet commented 5 years ago

I've pushed an example of a publication with several publication-publication relationships (https://github.com/tomwrobel/ora_data_model/blob/master/elements_xwalks/xwalk-out/input-publication-86457.xml)

tomwrobel commented 5 years ago

@AndrewBennet I've created a new METS file with that record, and I've updated the mappings XML file to include the new mappings.

We're a bit concerned by what happens when a related publication has no DOI.

In ORA, and in the data model, we deal with this by generating a human-readable citation for the related work. This is placed in related_items__related_item_citation.

Looking at Symplectic Elements, there are a number of places where a citation string is generated for a PUBS item. Would it be possible to include a citation string here for mapping?
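
To make the fallback concrete, here is a rough Python sketch of what we mean. Only `related_items__related_item_citation` is a real data model field; the DOI key, the function names, the citation layout and the sample values are all hypothetical and purely illustrative.

```python
def format_citation(authors, year, title):
    """Build a simple human-readable citation from whatever parts are present.

    Illustrative layout only; this is not the citation style ORA uses.
    """
    parts = []
    if authors:
        parts.append("; ".join(authors))
    if year:
        parts.append(f"({year})")
    if title:
        parts.append(title)
    return " ".join(parts)


def related_item(doi, authors, year, title):
    """Prefer the DOI; fall back to a generated citation when there is none."""
    if doi:
        # Hypothetical key name for the identifier.
        return {"related_item_doi": doi}
    return {
        "related_items__related_item_citation": format_citation(authors, year, title)
    }


# Made-up example values for a related publication with no DOI.
fallback = related_item(None, ["Smith J", "Jones A"], 2019, "An example title")
```

A citation string supplied by Elements itself would replace `format_citation` entirely, which is why we are asking whether one can be included.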

jjpartridge commented 5 years ago

@AndrewBennet @tomwrobel Apologies for adding to a closed ticket, but I'm thinking about testing and pushing values back to SE. With regard to exceptions in this instance, would the values that SE is expecting to see be "Access1", "Deposit1", "Other", "Tech1" etc.?

In ORA, by contrast, these are currently recorded as the text values of the exceptions, e.g. "Publisher embargo too long" or "Publisher does not allow OA".

Do these values need to be mapped so that they can be understood by the crosswalk? I assume for testing we should look to send the SE values in order to see a change within OA Monitor for example?

Thanks, Jason