elifesciences / elife-crossref-feed

code to support uploading info to crossref on PAW articles

Sending Clinical trial metadata to Crossref #146

Closed Melissa37 closed 4 years ago

Melissa37 commented 4 years ago

Problem / Motivation

Production wants to send clinical trial data to Crossref so that this information is available to users of the Crossref API, which also helps with the need to track clinical trials.

Proposed solution

eLife to add Crossmark metadata to our deposits:

Linked clinical trials

Dependency - updated structured abstracts: https://github.com/elifesciences/issues/issues/4622

Crossref documentation: https://www.crossref.org/education/crossmark/linked-clinical-trials/

These fields should be included within the custom metadata section of the Crossmark deposit.

<clinicaltrial_data>
  <doi>10.5555/12345678</doi>
  <ct:program>
    <ct:clinical-trial-number registry="10.18810/isrctn" type="results">ISRCTN1234</ct:clinical-trial-number>
    <ct:clinical-trial-number registry="10.18810/isrctn" type="results">ISRCTN9999</ct:clinical-trial-number>
  </ct:program>
</clinicaltrial_data>

Generally our articles seem to only link to one clinical trial, but multiple can be added. I assume the DOI listed is the DOI of this paper.

From Crossref re the need for a DOI

As we advise for users supplying Crossmark data simply as a workaround to make their free-to-read content visible in our API, you could just use the DOI that's being updated in the tags as well as in the . The latter is more of a 'hack' than the former, but since we're in the midst of changing approaches to this kind of update metadata, it's still a fine option. If you want to register a DOI for your updates policies that's great, but we don't want that extra step to discourage anyone from sending in the updates metadata.

<crossmark>
  <crossmark_policy>10.5555/crossmark_policy</crossmark_policy>
  <crossmark_domains>
    <crossmark_domain>
      <domain>psychoceramics.labs.crossref.org</domain>
    </crossmark_domain>
  </crossmark_domains>
  <crossmark_domain_exclusive>true</crossmark_domain_exclusive>

Clarification needed and assumptions

Crossmark metadata can be deposited as part of your regular Crossref metadata deposit, and can also be deposited as stand-alone data to populate backfiles. For Crossmark-only deposits, see the schema and schema documentation relating to resource-only deposits. https://www.crossref.org/education/metadata-stewardship/maintaining-your-metadata/adding-metadata-to-an-existing-record/

This does not seem to support our use case. Would we be better off just re-depositing everything?

Question: 'true'. What does this refer to? Is eLife True or False?


Tasks

Technical notes

Here are some of my notes and thoughts, for discussion:

I think structured abstracts could possibly be added to the article data structure used by the Crossref generation library without involving integration with other data schemas

Clinical trial data would be added as a new property of an Article object, and then we can include that in Crossref deposits
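As a rough illustration of that idea, the sketch below adds a list of clinical trials to a stand-in Article class. The class and field names (ClinicalTrial, clinical_trials, registry_doi and so on) are hypothetical and are not taken from the real library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClinicalTrial:
    """One linked clinical trial (hypothetical structure, for discussion only)."""
    document_id: str                    # trial number, e.g. "NCT02836002"
    source_id: str                      # registry name as written in the article XML
    content_type: Optional[str] = None  # optional stage, e.g. "results"
    registry_doi: Optional[str] = None  # filled in later from Crossref's registry list


@dataclass
class Article:
    """Stand-in for the real Article object; only the fields needed here."""
    doi: str
    clinical_trials: List[ClinicalTrial] = field(default_factory=list)


# Attach one trial to an article; the DOI is the Crossref sample DOI quoted above
article = Article(doi="10.5555/12345678")
article.clinical_trials.append(
    ClinicalTrial(document_id="NCT02836002", source_id="ClinicalTrials.gov")
)
```

Keeping the trials as a plain list means an article with no trials simply has an empty list, and the deposit code can skip the clinical trials section in that case.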

Crossmark related:

Code in the old, archived Crossref generation library:

For defining the Crossmark policy and domain: https://github.com/elifesciences/elife-poa-xml-generation/blob/develop/generateCrossrefXml.py#L34-L35

Old code that added Crossmark XML to a Crossref deposit, though I think it was never used for real: https://github.com/elifesciences/elife-poa-xml-generation/blob/develop/generateCrossrefXml.py#L219-L236

Perhaps not all articles would need to be deposited with Crossmark data, but my guess is that if we want to register a Correction, for example, the article being corrected would need to be deposited with Crossmark first, and then the correction article afterward.

XML and testing

For clinical trials support, we need to add the XML schema prefix to the Crossref XML deposit, e.g. xmlns:ct="http://www.crossref.org/clinicaltrials.xsd"

Add additional settings for Crossmark to the elifecrossref library .cfg file to turn Crossmark deposits on or off, and to specify the Crossmark domain and Crossmark policy DOI

For Crossmark, test exam
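To make the .cfg settings idea concrete, here is a minimal sketch using Python's configparser. The section name and the option names (crossmark, crossmark_policy, crossmark_domains) are assumptions made up for this example, not existing elifecrossref settings; the policy DOI and domain are the sample values from the Crossref snippet above.

```python
import configparser

# Hypothetical settings; none of these option names exist in the real .cfg file
SAMPLE_CFG = """
[elife]
crossmark = true
crossmark_policy = 10.5555/crossmark_policy
crossmark_domains = psychoceramics.labs.crossref.org
"""


def crossmark_settings(cfg_string, section):
    """Read the Crossmark on/off switch, policy DOI and domain list from a config string."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_string)
    raw_domains = parser.get(section, "crossmark_domains", fallback="")
    return {
        "enabled": parser.getboolean(section, "crossmark", fallback=False),
        "policy": parser.get(section, "crossmark_policy", fallback=None),
        # allow a comma-separated list of domains
        "domains": [d.strip() for d in raw_domains.split(",") if d.strip()],
    }


settings = crossmark_settings(SAMPLE_CFG, "elife")
```

With the fallback values, a config file that never mentions Crossmark would leave Crossmark deposits turned off.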

User interface / Wireframes

Melissa37 commented 4 years ago

@gnott This is the clinical trial ticket

gnott commented 4 years ago

Note to self: I had some comments about clinical trials in comment https://github.com/elifesciences/elife-crossref-feed/issues/145#issuecomment-623795752, about adding data to the Article() object and about adding the Crossref clinical trials DTD to the generated deposit XML.

gnott commented 4 years ago

From https://www.crossref.org/education/crossmark/linked-clinical-trials/,

  1. The relationship of the publication to the clinical trial (optional) This field is optional but encouraged. The three allowed elements are “pre-results”, “results” and “post-results”, indicating which stage of the trial the publication is reporting on.

@Melissa37, would we ever have this value in the XML, or anticipate it would be known whether a clinical trial had a status of these types?

I would also like to think the Crossref deposit library should consider making this an option to specify, even if eLife is not using these values.

gnott commented 4 years ago

I'm reading the JATS4R recommendation, and the @content-type attribute of the <related-object> tag looks to hold this data. When I configure parsing the article XML and populating the clinical trials data of an Article, I will add a sample with that level of detail.
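A minimal sketch of that parsing step might look like the following. The sample XML is an illustrative guess at the tagging (the source-type, source-id and document-id attributes are assumptions, not confirmed eLife XML), and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; attribute names are assumed, not taken from real article XML
SAMPLE_ARTICLE_XML = """
<abstract>
  <sec>
    <title>Clinical trial number</title>
    <p><related-object content-type="pre-results"
        source-type="clinical-trials-registry"
        source-id="ClinicalTrials.gov"
        document-id="NCT02836002">NCT02836002</related-object></p>
  </sec>
</abstract>
"""


def parse_clinical_trials(xml_string):
    """Return one dict per clinical trial <related-object> tag found in the XML."""
    root = ET.fromstring(xml_string)
    trials = []
    for tag in root.iter("related-object"):
        # ignore related objects that are not clinical trials
        if tag.get("source-type") != "clinical-trials-registry":
            continue
        trials.append({
            "document_id": tag.get("document-id"),
            "source_id": tag.get("source-id"),
            "content_type": tag.get("content-type"),  # None when the attribute is absent
        })
    return trials


trials = parse_clinical_trials(SAMPLE_ARTICLE_XML)
```

Because content_type is simply None when the attribute is missing, the same code would cover eLife articles that do not record the stage of the trial.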

gnott commented 4 years ago

I have a valid manually composed deposit, the XML contains this:

<custom_metadata>
...
  <ct:program>
    <ct:clinical-trial-number registry="10.18810/clinical-trials-gov">NCT02836002</ct:clinical-trial-number>
  </ct:program>
</custom_metadata>

I see now there will be a little more scope than I expected, because we need to match up the registry name from the article XML with the list of registries Crossref maintains at http://api.crossref.org/works/10.18810/registries/transform/application/vnd.crossref.unixsd+xml in order to get the DOI of the registry.

To do the matching, I think I'll add in some logic into the Crossref library to parse the registry XML file, use an example file for testing purposes, and when the Crossref library is incorporated into a workflow, we can download a fresh copy of the registry XML prior to populating the clinical trial data for the article, if the article has any clinical trials. I want to avoid saving a copy of the registry XML as it is today into the project, because it will eventually be out-of-date, and we should always rely on the live registry file when generating real Crossref deposits.
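The matching logic could be sketched roughly as below. The sample XML uses a deliberately simplified layout (registries, registry, title, subtitle, doi) that is an assumption for illustration and not the real structure of the Crossref registry file, and the second DOI is a placeholder rather than a real registry DOI.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed layout; the live Crossref registry XML is structured differently
SAMPLE_REGISTRY_XML = """
<registries>
  <registry>
    <title>ClinicalTrials.gov</title>
    <subtitle>ClinicalTrials.gov</subtitle>
    <doi>10.18810/clinical-trials-gov</doi>
  </registry>
  <registry>
    <title>EU Clinical Trials Register</title>
    <subtitle>EU-CTR</subtitle>
    <doi>10.18810/placeholder-not-a-real-doi</doi>
  </registry>
</registries>
"""


def registry_name_map(xml_string):
    """Map every registry title and subtitle to the DOI of that registry."""
    root = ET.fromstring(xml_string)
    name_map = {}
    for registry in root.iter("registry"):
        doi = registry.findtext("doi")
        for tag_name in ("title", "subtitle"):
            name = registry.findtext(tag_name)
            if name and doi:
                name_map[name.strip()] = doi.strip()
    return name_map


registry_dois = registry_name_map(SAMPLE_REGISTRY_XML)
```

In tests the function would be given an example file like this one; in a real workflow it would be given the freshly downloaded registry XML, so nothing stale needs to live in the project.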

gnott commented 4 years ago

Making a note too that if I change registry="10.18810/clinical-trials-gov" in my sample to registry="10.18810/foo", it is not rejected immediately by the Crossref XML validity checker. I don't know what the Crossref ingestion queue would do if the DOI doesn't match the registry they maintain. We'll assume for now that only the registry names we can match to Crossref's registry are the ones we will include in the Crossref deposit.
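That assumption could be sketched as follows, using ElementTree and the ct namespace URI mentioned earlier in this issue. The function name and the shape of the trial dicts are hypothetical; trials whose registry name has no match are simply left out of the deposit.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://www.crossref.org/clinicaltrials.xsd"
ET.register_namespace("ct", CT_NS)


def build_ct_program(parent, trials, registry_dois):
    """Append <ct:program> to parent, skipping trials whose registry is unmatched."""
    matched = [t for t in trials if t["source_id"] in registry_dois]
    if not matched:
        return None  # no tag at all when nothing can be matched
    program = ET.SubElement(parent, "{%s}program" % CT_NS)
    for trial in matched:
        number = ET.SubElement(program, "{%s}clinical-trial-number" % CT_NS)
        number.set("registry", registry_dois[trial["source_id"]])
        number.text = trial["document_id"]
    return program


custom_metadata = ET.Element("custom_metadata")
build_ct_program(
    custom_metadata,
    [
        {"source_id": "ClinicalTrials.gov", "document_id": "NCT02836002"},
        {"source_id": "Unknown Registry", "document_id": "X0001"},  # made-up, unmatched
    ],
    {"ClinicalTrials.gov": "10.18810/clinical-trials-gov"},
)
xml_output = ET.tostring(custom_metadata, encoding="unicode")
```

Here only the ClinicalTrials.gov trial ends up in the generated XML; whether dropped trials should also be logged or reported is left open.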

Melissa37 commented 4 years ago

The relationship of the publication to the clinical trial (optional) This field is optional but encouraged. The three allowed elements are “pre-results”, “results” and “post-results”, indicating which stage of the trial the publication is reporting on. @Melissa37, would we ever have this value in the XML, or anticipate it would be known whether a clinical trial had a status of these types? I would also like to think the Crossref deposit library should consider making this an option to specify, even if eLife is not using these values.

I remember when this was discussed in the Crossref working group implementing this: it was all medical journals.

We've only just started looking into Medicine and the starting point was getting abstracts to match what other medical journals are doing.

@mariajoaoguerreiro might have a view on whether we'll be recording this in the future but for now it's not something eLife can do.

I'm reading the JATS4R recommendation, and the @content-type attribute of the <related-object> tag looks to hold this data. When I configure parsing the article XML and populating the clinical trials data of an Article, I will add a sample with that level of detail.

Cool, it makes sense to future-proof this for eLife while making it work for those already doing this.

I see now there will be a little more scope than I expected, because we need to match up the registry name from the article XML with the list of registries Crossref maintains at http://api.crossref.org/works/10.18810/registries/transform/application/vnd.crossref.unixsd+xml in order to get the DOI of the registry. To do the matching, I think I'll add in some logic into the Crossref library to parse the registry XML file, use an example file for testing purposes, and when the Crossref library is incorporated into a workflow, we can download a fresh copy of the registry XML prior to populating the clinical trial data for the article, if the article has any clinical trials. I want to avoid saving a copy of the registry XML as it is today into the project, because it will eventually be out-of-date, and we should always rely on the live registry file when generating real Crossref deposits.

Ah, good point, I had forgotten about that. @FAtherden-eLife could you correspond with @gnott on this so we get some Schematron validation in place too?

Making a note too that if I change registry="10.18810/clinical-trials-gov" in my sample to registry="10.18810/foo", it is not rejected immediately by the Crossref XML validity checker. I don't know what the Crossref ingestion queue would do if the DOI doesn't match the registry they maintain. We'll assume for now that only the registry names we can match to Crossref's registry are the ones we will include in the Crossref deposit.

Yeah, makes sense, but what if they update that list? Should I check where they notify people of new releases? For instance, the Open Funder Registry gets new releases at irregular intervals, and we update our systems each time.

mariajoaoguerreiro commented 4 years ago

@Melissa37 Yes, I'd agree with you.

gnott commented 4 years ago

... new releases?

The registry XML has this value <crm-item name="last-update" type="date">2020-04-07T11:31:23Z</crm-item> which might be helpful to detect new versions, but as for how or whether Crossref notifies people about a new release I could not say.
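If that value does turn out to be useful, a check along these lines could compare it with the date seen on a previous download. This is only a sketch: the wrapper element in the sample is made up, the function names are hypothetical, and matching on the local tag name is a guess at how to cope with whatever namespace the real file uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Made-up wrapper element around the crm-item quoted above
SAMPLE = (
    '<sample><crm-item name="last-update" type="date">'
    "2020-04-07T11:31:23Z</crm-item></sample>"
)


def registry_last_update(xml_string):
    """Return the last-update value as an aware datetime, or None if absent."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        if local_name == "crm-item" and element.get("name") == "last-update":
            parsed = datetime.strptime(element.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def registry_has_changed(xml_string, previous_update):
    """True when the file reports a later last-update than the one previously seen."""
    latest = registry_last_update(xml_string)
    if latest is None:
        return False
    return previous_update is None or latest > previous_update
```

Whether last-update actually changes when a registry is added to the list is not something confirmed here, so this would need checking against a real update.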

gnott commented 4 years ago

A question, perhaps for @FAtherden-eLife: if you look at the registry XML file, the one eLife example I have uses ClinicalTrials.gov as the registry name, and that value is used as both the <title> and <subtitle> for that registry.

If you were to add a clinical trial for one of the other registries, would you be using the <title> or <subtitle> in the article XML (which is what I'd use to match and find the DOI for that registry)?

For example, in the <related-object> tag, would you have source-id="EU Clinical Trials Register" or source-id="EU-CTR" for that registry?

fred-atherden commented 4 years ago

@gnott, my position would be that we should be using the subtitle for the source-id attribute value, so source-id="EU-CTR" would be correct/expected.

We can control the list of allowed source-id values based on that XML file, via Schematron, so that no others should come through from production.

gnott commented 4 years ago

I got to a point yesterday where I was a little stuck on processing the @content-type attribute, because JATS4R may recommend a value like pre-results but the Crossref schema accepts the value preResults. I've just realised that whichever is chosen for the article XML, potentially validated by Schematron, it won't matter to me as long as the value translation supports both: if the value is preResults, use preResults; if it is pre-results, also use preResults in the Crossref deposit.
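The translation could be as small as a lookup table. In this sketch the function name is made up, and the postResults spelling is assumed by analogy with preResults rather than checked against the schema.

```python
# Keys are lowercased so that both "preResults" and "pre-results" are accepted
CONTENT_TYPE_MAP = {
    "pre-results": "preResults",
    "preresults": "preResults",
    "results": "results",
    "post-results": "postResults",  # assumed spelling, by analogy with preResults
    "postresults": "postResults",
}


def crossref_trial_type(content_type):
    """Translate an article XML @content-type value to the Crossref type value."""
    if not content_type:
        return None
    # unknown values return None so the optional type attribute can be omitted
    return CONTENT_TYPE_MAP.get(content_type.strip().lower())
```

For example, crossref_trial_type("pre-results") and crossref_trial_type("preResults") both give "preResults", and a missing value gives None.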

Melissa37 commented 4 years ago

I got to a point yesterday where I was a little stuck on processing the @content-type attribute, because JATS4R may recommend a value like pre-results but the Crossref schema accepts the value preResults. I've just realised that whichever is chosen for the article XML, potentially validated by Schematron, it won't matter to me as long as the value translation supports both: if the value is preResults, use preResults; if it is pre-results, also use preResults in the Crossref deposit.

Yeah, JATS4R has attribute guidance and it differs from how Crossref works, so some mapping would have to happen.

This is for the benefit of all publishers using our tool though, right? As we don't have this level of detail!

gnott commented 4 years ago

Yes, I want to add the @content-type attribute to a test scenario sample just so it is covered. It is simple to add, even if it is not (yet) used in eLife XML.

gnott commented 4 years ago

New issue https://github.com/elifesciences/issues/issues/5830 created as a reminder to test this out, or check the results, when clinical trials data is available for eLife articles.