Create Fixing Crossref failures page

naushinthomson commented 4 years ago

[x] Create page
[x] Add description of what is being covered
[x] Add description of checks required on this content, compiled from current protocol documents
[x] Add examples covering correct and incorrect scenarios, as required to support checks
[x] Review house style document to ensure all requirements are covered by GitBook and schematron
[x] Becky approved page
[x] Melissa approved page
[x] Fred approved page
[x] James approved page
[x] Naushin approved page

Definition of done

[x] Page completed
[x] Team reviewed and approved page?

bcollins14 commented 4 years ago

@naushinthomson @JGilbert-eLife @FAtherden-eLife @Melissa37

I have also put together this page https://app.gitbook.com/@elifesciences/s/productionhowto/pages-in-progress/fixing-crossref-dryad-failures. I would really appreciate help with the finer details of Crossref, so please flag any terms I have used incorrectly.

Melissa37 commented 4 years ago

How to fix Crossref/Dryad failures in the inbox

Suggest changing to:

How to fix Crossref/Dryad failures that result in emails to the inbox

Melissa37 commented 4 years ago

Crossref is most commonly know for providing Digital Object Identifiers (DOIs) for research outputs making them easier to locate, cite, and more. DOIs allow the reader to follow a stable link straight to the content even when a website has changed. Each publisher will have their own ID in the DOI and this is followed by a slash and a string of numbers that are unique to the content. Here is an example of an eLife DOI: 10.7554/eLife.58603

Suggest changing to:

Crossref is most commonly known for providing Digital Object Identifiers (DOIs) for research outputs, making them easier to locate and cite. DOIs allow the reader to follow a stable link straight to the content even when a website has changed. Crossref DOIs begin with '10.', followed by a publisher's unique 4-digit number and a slash (eLife's is 7554/). Following that, each publisher assigns a unique set of characters to each DOI they publish; eLife uses eLife.XXXXX, where XXXXX is the manuscript number assigned to an article in eJP. For example: 10.7554/eLife.58603
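To make the DOI structure described above concrete, here is a minimal Python sketch that splits an eLife DOI into its publisher number and manuscript number. The function name and the regular expression are illustrative assumptions, not part of any eLife tooling, and the pattern only covers the 4-digit prefix and the eLife.XXXXX suffix described in this comment.

```python
import re

# Assumed pattern, based only on the description above:
# "10." + 4-digit publisher number + "/" + "eLife." + manuscript number.
ELIFE_DOI = re.compile(r"^10\.(?P<publisher>\d{4})/eLife\.(?P<manuscript>\d+)$")


def parse_elife_doi(doi):
    """Split an eLife DOI into publisher number and manuscript number."""
    match = ELIFE_DOI.match(doi)
    if match is None:
        raise ValueError(f"Not an eLife-style DOI: {doi!r}")
    return {
        "publisher": match["publisher"],
        "manuscript": match["manuscript"],
    }


if __name__ == "__main__":
    print(parse_elife_doi("10.7554/eLife.58603"))
```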

Melissa37 commented 4 years ago

At eLife, when an article is PoAed, the DOI is submitted automatically to Crossref and once processed, it will be registered with them. This is also triggered when a VOR article is ready for publication and sent to Continuum. If it has a date in the future ,say for press, the DOI will then be registered on the day of publication.

Suggest changing to:

When an article is published in accepted manuscript form (PoA), the DOI is submitted automatically to Crossref and, once processed, it will be registered. This process is also triggered when a VoR article is sent to Continuum. If the article has been PoA'd, the metadata deposited with Crossref is enhanced with the full-text information available from the final VoR publication. Each time an article is published the metadata is sent to Crossref so any mistakes are overwritten and any additional information published is added to the Crossref record. When content is sent to Continuum with a future date, for instance press content, the DOI will be registered at midnight the day before publication.

Melissa37 commented 4 years ago

Production receive Crossref emails for both PoA and VOR articles that are sent for publication on Continuum.

'Production' is singular so receive should be 'receives'

Melissa37 commented 4 years ago

If any failures are counted, you will need to check the email which will outline where the failure has occurred.

Suggest changing to:

If any failures are listed, you will need to check the rest of the content of the email (above), which will outline where the failure has occurred.

Melissa37 commented 4 years ago

In the case above, the DOI for the Dryad dataset was not live which has caused the failure. This is one of our most common failures.

Suggest changing to:

When sending VoR content metadata to Crossref we list all citations within the article. In the case above, the DOI for the Dryad dataset that was referenced in the article was not live yet when we sent the metadata to Crossref. As the Dryad DOIs are minted by Crossref too, their system automatically checks whether the cited DOI exists in their system and if it does not, fails our submission. This is one of our most common failures. It is a Catch-22: Dryad do not release datasets to view and hence register their DOIs until the article which they are about is published. We submit Crossref metadata just before publication. If the article has been PoA'd we have not sent the citation details to Crossref so Dryad can know the article their dataset is linked to is published.

Melissa37 commented 4 years ago

Go to the Crossref admin page: https://doi.crossref.org/servlet/useragent?func=showHome

There is an extra step - log in.

We won't hold the login details on this GitBook, but we need to be clear that there is a login step and that we in Production have the username and password.

Melissa37 commented 4 years ago

Search with no restrictions, which will bring up the list of submissions processed by Crossref. Recent errors will be highlighted with a red ‘E’ symbol:

Suggest changing to:

Search for the DOI number with no restrictions, which will bring up the list of submissions processed by Crossref. Recent errors will be highlighted with a red ‘E’ symbol:

Melissa37 commented 4 years ago

Important note: To prevent further failures, both the doi_batch_id elements and timestamp need to be edited as Crossref will reject any file that has the same batch id and timestamp as a previously submitted file. So to avoid this, you should edit the time on the doi_batch_id you will use as the file name to reflect the updated time you will change in the XML. In this instance, the doi_batch_id was changed to 'elife-crossref-57093-20200709160008.xml'. Now open the XML file using your XML editor, such as Oxygen. As mentioned above, the timestamp and doi_batch_id elements need to be updated so do this first. You will find these at the top of the XML. Change these to a future time, for best practice up to an hour ahead of when you are correcting this failure, so here we changed both to 'elife-crossref-57093-20200709160008'.

Suggest changing to:

Rename the file. The file name is "elife-crossref-XXXXX-XXXXXXXXXXXXXX.xml". The first set of Xs refers to the MS number. Leave this and the prefix as is. The second set of Xs is a timestamp (4-digit year, 2-digit month, 2-digit day, 4-digit 24-hour time and 2-digit seconds). Change this to a future time (for instance, 1 hour in the future), for example: 20200709160008. Crossref registration will fail if the date is in the past. Now open the XML file using your XML editor, such as Oxygen. As per above, the timestamp and doi_batch_id elements need to be updated to reflect the file name.
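As a rough illustration of the renaming rule above, the following Python sketch builds a file name with a timestamp one hour ahead. The function name and its parameters are hypothetical, and the one-hour offset is just the example figure from this comment.

```python
from datetime import datetime, timedelta


def crossref_file_name(ms_number, now=None, ahead=timedelta(hours=1)):
    """Build 'elife-crossref-<MS number>-<timestamp>.xml' with a future timestamp.

    The timestamp is year, month, day, 24-hour time and seconds (14 digits).
    """
    moment = (now or datetime.now()) + ahead
    stamp = moment.strftime("%Y%m%d%H%M%S")
    return f"elife-crossref-{ms_number}-{stamp}.xml"


if __name__ == "__main__":
    # Correcting a failure at 15:00:08 on 9 July 2020 gives a 16:00:08 stamp.
    print(crossref_file_name("57093", now=datetime(2020, 7, 9, 15, 0, 8)))
```

The same 14-digit stamp would then be what the timestamp and doi_batch_id elements in the XML need to reflect.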

Melissa37 commented 4 years ago

Alternatively, if there is more than one <rel:related_item>, only delete the one that contains the problematic DOI (<rel:related_item> . . . </rel:related_item>).

Suggest changing to:

If there is more than one Dryad DOI contained in <rel:program>, only delete the one that contains the problematic DOI (<rel:related_item> . . . </rel:related_item>). Do not delete any others.

Melissa37 commented 4 years ago

Love the pause for breath!

:-)

Melissa37 commented 4 years ago

This has now been resolved. Unfortunately, this does mean that we will lose the Dryad information but at the moment, there is nothing we can do about this.

There is something we could do about this - record the article, go back to it a week later, check the Dryad DOI is registered and do a silent correction to resend the metadata to Crossref. We choose not to do this!

So maybe say

This has now been resolved. Unfortunately, this does mean that the citation to the Dryad dataset will not be registered with Crossref but a manual workaround is not desirable.

fred-atherden commented 4 years ago

Would it be useful to have an automated way of fixing these? I'm thinking, we download the file, open it in oXygen, run an XSLT and then re-upload it to Crossref.

Should be really simple to do.

Once you've set up the scenario in oXygen, it would be a case of just clicking two buttons to get the updated file.

Melissa37 commented 4 years ago

Would it be useful to have an automated way of fixing these? I'm thinking, we download the file, open it in oXygen, run an XSLT and then re-upload it to Crossref.

Nice!!

fred-atherden commented 4 years ago

OK try this:

Set up transformation scenario (this only needs to be done once)

Download the attached zip, unzip it (should be a file called crossref.xsl) and place it somewhere on your local machine.
Open any XML file in oXygen
In oXygen's top toolbar, click configure transformation Scenarios (or press cmd + shift + t)
Click 'New' -> XML transformation with XSLT
Rename the scenario something appropriate, like crossref
Click the folder for XSL URL, and navigate to where your local version of crossref.xsl was placed
Click 'Output'
Click 'Save as', and in the field next to it paste the following: elife-crossref-${xpath_eval(substring-after(doc('${rootMapURL}')//publisher_item/item_number[@item_number_type="article_number"]/text(),'e'))}-${xpath_eval(format-dateTime(current-dateTime() + xs:dayTimeDuration('PT30M'), '[Y0001][M01][D01][H01][m01][s01]'))}.xml
Click 'OK', and then click 'Save and close'

Running the transform

The next time you need to update a Crossref file:

Download it (it can be named anything so don't worry about that).
Open in oXygen. In the top toolbar, click configure transformation Scenarios (or press cmd + shift + t)
Tick the scenario you set up (see above)
Click 'Apply associated (1)'.

The new file should then be output in the same directory with the right name, timestamp etc.

Melissa37 commented 4 years ago

@FAtherden-eLife how does it know to pull out only the Dryad dataset ref associated with this publication, rather than all Dryad dataset refs?

fred-atherden commented 4 years ago

@Melissa37, good point - it didn't previously but I have edited it now so that it does.

In our XML, as you know, previously published datasets have a specific-use="references", and generated ones have specific-use="isSupplementedBy".

These are carried over into the Crossref deposits in the rel:inter_work_relation element. Generated ones will have the attribute relationship-type="isSupplementedBy", whereas previously published ones will have relationship-type="references".

So this transform (now) only picks out the rel:related_item elements with a <rel:inter_work_relation relationship-type="isSupplementedBy"> containing a Dryad DOI.

For example, here for 54474:

<rel:program>
    <rel:related_item>
        <rel:description>Dryad Digital Repository</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="isSupplementedBy"
            >10.5061/dryad.1c59zw3rs</rel:inter_work_relation>
    </rel:related_item>
</rel:program>

The rel:program would be stripped.

Whereas here for 46331:

<rel:program>
    <rel:related_item>
        <rel:description>figshare</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="isSupplementedBy"
            >10.6084/m9.figshare.7268558</rel:inter_work_relation>
    </rel:related_item>
    <rel:related_item>
        <rel:description>Dryad Digital Repository</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="references"
            >10.5061/dryad.tb542</rel:inter_work_relation>
    </rel:related_item>
    <rel:related_item>
        <rel:description>figshare</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="references"
            >10.6084/m9.figshare.4300043</rel:inter_work_relation>
    </rel:related_item>
    <rel:related_item>
        <rel:description>figshare</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="references"
            >10.6084/m9.figshare.4806559</rel:inter_work_relation>
    </rel:related_item>
    <rel:related_item>
        <rel:description>figshare</rel:description>
        <rel:inter_work_relation identifier-type="doi" relationship-type="references"
            >10.6084/m9.figshare.4806562</rel:inter_work_relation>
    </rel:related_item>
</rel:program>

Nothing would be changed (and it wouldn't fail in the first place, I just include it as an example).
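For anyone who wants to see the selection rule spelled out, here is a minimal Python sketch of the same logic. It is not the attached crossref.xsl, only an illustration of the rule described above. The namespace URI bound to the rel: prefix, the function name and the "10.5061/dryad." prefix check are assumptions taken from the examples in this comment.

```python
import xml.etree.ElementTree as ET

# Assumption: the rel: prefix is bound to the Crossref relations namespace.
NS = {"rel": "http://www.crossref.org/relations.xsd"}
# Assumption: Dryad DOIs are recognised by the prefix seen in the examples above.
DRYAD_PREFIX = "10.5061/dryad."


def strip_generated_dryad(root):
    """Remove rel:related_item elements for generated (isSupplementedBy) Dryad DOIs.

    A rel:program left empty by the removal is stripped too.
    Returns the number of rel:related_item elements removed.
    """
    removed = 0
    for parent in list(root.iter()):
        for program in parent.findall("rel:program", NS):
            before = removed
            for item in program.findall("rel:related_item", NS):
                relation = item.find("rel:inter_work_relation", NS)
                if relation is None:
                    continue
                doi = (relation.text or "").strip().lower()
                if (relation.get("relationship-type") == "isSupplementedBy"
                        and doi.startswith(DRYAD_PREFIX)):
                    program.remove(item)
                    removed += 1
            # Only strip a rel:program that this function emptied.
            if removed > before and len(program) == 0:
                parent.remove(program)
    return removed
```

Run against the 54474 example, this would remove the single rel:related_item and its rel:program; against the 46331 example it would remove nothing, because the only Dryad DOI there has relationship-type="references".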

I don't think any more sophisticated logic is required since the failure email would be providing the context which we can base many assumptions on.

Of course, this would only work for Dryad failures, so if there were another DOI causing a failure, then the solution would have to be a manual one. But I imagine that doing this in the first place would resolve the vast majority of cases.

Melissa37 commented 4 years ago

Nice, thanks @FAtherden-eLife !!

Melissa37 commented 4 years ago

So now @bcollins14 needs to re-write the page? ;-)

JGilbert-eLife commented 4 years ago

@FAtherden-eLife Does this work for PoA CrossRef submissions as well?

Melissa37 commented 4 years ago

Good point @JGilbert-eLife I'd mistakenly assumed above that this does not affect PoA, however we include the datasets in the PoA export don't we? If we do we could consider turning them off as they are not strictly necessary.

fred-atherden commented 4 years ago

@FAtherden-eLife Does this work for PoA CrossRef submissions as well?

Good point @JGilbert-eLife I'd mistakenly assumed above that this does not affect PoA, however we include the datasets in the PoA export don't we? If we do we could consider turning them off as they are not strictly necessary.

The Crossref deposits for VoR and PoA are the same with respect to datasets (generated ones are captured using rel:related_item/rel:inter_work_relation[@relationship-type="isSupplementedBy"]), so this will work for both PoA and VoR. Aside from Dryad rel:related_items, the timestamp, doi_batch_id, and filename, everything else remains the same, so even if there are differences between VoR and PoA, these would be retained.

bcollins14 commented 4 years ago

@JGilbert-eLife @Melissa37 @naushinthomson @FAtherden-eLife I have updated this page too! https://app.gitbook.com/@elifesciences/s/productionhowto/toolkit/fixing-crossref-dryad-failures

fred-atherden commented 4 years ago

Thanks Becky - given our conversation earlier, please can we add the following (between point 5 and the current point 6) in the setting up the transformation bit:

Ensure that one of the following is selected for the 'Transformer':
- Saxon-PE
- Saxon-HE
- Saxon-EE

(Saxon-PE is preferable if it is an option)

Melissa37 commented 4 years ago

Crossref is most commonly known for providing Digital Object Identifiers (DOIs) for research outputs

Please change to:

Crossref is most commonly known for registering Digital Object Identifiers (DOIs) for research outputs

Melissa37 commented 4 years ago

DOIs allow the reader to follow a stable link straight to the content even when a website has changed.

Please change to

DOIs allow the reader to follow a stable link straight to the content even when a URL has changed.

Melissa37 commented 4 years ago

When sending VoR content metadata to Crossref we list all citations within the article. In the case above, the DOI for the Dryad dataset that was referenced in the article was not live yet when we sent the metadata to Crossref. As the Dryad DOIs are minted by Crossref too, their system automatically checks whether the cited DOI exists in their system and if it does not, fails our submission. This is one of our most common failures. It is a Catch-22, Dryad do not release datasets to view and hence register their DOIs until the article which they are about is published. We submit Crossref metadata just before publication. If the article has been PoA'd we have not sent the citation details to Crossref so Dryad can know the article their dataset is linked to is published.

Is this a bit wrong now we've established Data references are sent to Crossref on PoA too? @JGilbert-eLife

Melissa37 commented 4 years ago

Download the zip below, unzip it (inside should be a file called crossref.xsl) and place it somewhere on your local machine such as your desktop.

Should it not be in a folder somewhere, where you won't accidentally delete it?

Melissa37 commented 4 years ago

Now this has been set up, you can correct failures in the blink of an eye but first, you must download the XML.

Suggest remove second comma or add a third one as below

Now this has been set up, you can correct failures in the blink of an eye but, first, you must download the XML.

Melissa37 commented 4 years ago

Oxygen VS oXygen

Need to use a consistent term (I don't mind which, but others might have an opinion!)

Melissa37 commented 4 years ago

configure transformation Scenarios

This is weird, on the screenshot it is Configure transformation Scenarios but in the text we say configure transformation Scenarios, but I prefer configure transformation scenarios!

Suggest we make it consistent with the screenshot in the text, so change to: Configure transformation Scenarios in 2 places it is mentioned

Melissa37 commented 4 years ago

Click 'OK', and then Click 'Save and close'

Suggest changing to:

Click 'OK', and then click 'Save and close'

Melissa37 commented 4 years ago

Downloading the XML from Crossref

Should this heading be

Replacing the XML at Crossref

?

fred-atherden commented 4 years ago

I have created an 'Oxygen' tools page and included the setting up of this transformation scenario in it - https://app.gitbook.com/@elifesciences/s/productionhowto/toolkit/oxygen#how-to-set-up-the-crossref-transformation-scenario - If this is acceptable, we can then simply remove this section from the crossref page and link to this new page from there instead.

JGilbert-eLife commented 4 years ago

Sorry for being so late to this!

DOIs are deposited at the point the article is loaded to Continuum if the publication date is today or in the past, or at the start of the day matching the publication date. Publication itself, PoA or VoR, has nothing to do with it.
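The timing rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment: the function name is hypothetical, and "the start of the day" is assumed to mean midnight on the publication date.

```python
from datetime import date, datetime, time


def deposit_time(publication_date, loaded_at):
    """Return when the DOI deposit would be sent to Crossref.

    If the publication date is today or in the past, the deposit happens when
    the article is loaded to Continuum; otherwise at the start of the
    publication date (assumed here to be midnight).
    """
    if publication_date <= loaded_at.date():
        return loaded_at
    return datetime.combine(publication_date, time.min)


if __name__ == "__main__":
    loaded = datetime(2020, 7, 9, 15, 0)
    print(deposit_time(date(2020, 7, 9), loaded))   # deposited at load time
    print(deposit_time(date(2020, 7, 14), loaded))  # deposited on the publication date
```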

JGilbert-eLife commented 4 years ago

The first example screenshot is of a decision/response Crossref submission, not an article submission. There are two types of Crossref email now - those for registering the DOI and reference list for an article, and those for registering the peer review materials.

Hopefully only the article ones will ever fail!

JGilbert-eLife commented 4 years ago

Might be worth covering warnings in the Crossref emails as well?

JGilbert-eLife commented 4 years ago

"If the article has been PoA'd we have not sent the citation details to Crossref so Dryad can know the article their dataset is linked to is published."

Not sure I understand this - is it meant to be a check? Sometimes even if we have sent the notification that the article has been accepted, Dryad might not have released the data by the time of VoR. There's a lag in their system, not sure how long.

JGilbert-eLife commented 4 years ago

Otherwise, no comments and I've set up the XSLT - thanks @bcollins14 and @FAtherden-eLife!

elifesciences / schematron-wiki

Create Fixing Crossref failures page #111
