Closed Melissa37 closed 4 years ago
@gnott This is the clinical trial ticket
Note to self: I had some comments about clinical trials in comment https://github.com/elifesciences/elife-crossref-feed/issues/145#issuecomment-623795752, about adding data to the Article() object and adding the Crossref clinical trials DTD to the generated deposit XML.
From https://www.crossref.org/education/crossmark/linked-clinical-trials/,
- The relationship of the publication to the clinical trial (optional) This field is optional but encouraged. The three allowed elements are “pre-results”, “results” and “post-results”, indicating which stage of the trial the publication is reporting on.
@Melissa37, would we ever have this value in the XML, or anticipate it would be known whether a clinical trial had a status of these types?
I would also like to think the Crossref deposit library should consider making this an option to specify, even if eLife is not using these values.
I'm reading the JATS4R recommendation, and the @content-type attribute of the <related-object> tag looks to hold this data, so when I configure parsing article XML and populating the clinical trials data of an Article, I will add a sample with that level of detail.
I have a valid manually composed deposit, the XML contains this:
<custom_metadata>
...
<ct:program>
<ct:clinical-trial-number registry="10.18810/clinical-trials-gov">NCT02836002</ct:clinical-trial-number>
</ct:program>
</custom_metadata>
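A minimal sketch of how a fragment like the one above could be generated with Python's standard library, for discussion only. The namespace URI is the one noted in the technical notes on this ticket; the function name `build_ct_program` and the tuple layout are hypothetical, and the optional `type` attribute is an assumption about where the publication stage would go. This is not the code in the elifecrossref library.

```python
import xml.etree.ElementTree as ET

# Namespace URI as noted in this ticket for the clinical trials schema
CT_NS = "http://www.crossref.org/clinicaltrials.xsd"
ET.register_namespace("ct", CT_NS)


def build_ct_program(trials):
    """Build a <ct:program> element.

    trials is a list of (registry_doi, trial_number, trial_type) tuples;
    trial_type may be None, in which case no type attribute is written.
    (Hypothetical helper, for illustration only.)
    """
    program = ET.Element("{%s}program" % CT_NS)
    for registry_doi, trial_number, trial_type in trials:
        tag = ET.SubElement(program, "{%s}clinical-trial-number" % CT_NS)
        tag.set("registry", registry_doi)
        if trial_type:
            # Assumption: the publication stage goes in a type attribute
            tag.set("type", trial_type)
        tag.text = trial_number
    return program


program = build_ct_program(
    [("10.18810/clinical-trials-gov", "NCT02836002", None)]
)
xml_string = ET.tostring(program, encoding="unicode")
print(xml_string)
```

The resulting element would still need to be appended inside the existing <custom_metadata> tag of the deposit.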
I see now there will be a little more scope than I expected, because we need to match up the registry name from the article XML with the list of registries Crossref maintains at http://api.crossref.org/works/10.18810/registries/transform/application/vnd.crossref.unixsd+xml in order to get the DOI of the registry.
To do the matching, I think I'll add in some logic into the Crossref library to parse the registry XML file, use an example file for testing purposes, and when the Crossref library is incorporated into a workflow, we can download a fresh copy of the registry XML prior to populating the clinical trial data for the article, if the article has any clinical trials. I want to avoid saving a copy of the registry XML as it is today into the project, because it will eventually be out-of-date, and we should always rely on the live registry file when generating real Crossref deposits.
Making a note too that if I change registry="10.18810/clinical-trials-gov" to registry="10.18810/foo" in my sample, it is not rejected immediately by the Crossref XML validity checker. I don't know what the Crossref ingestion queue would do if the DOI doesn't match the registry they maintain. We'll assume for now that only the registry names we can match to Crossref's registry are the ones we will include in the Crossref deposit.
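A rough sketch of the matching logic described above, to make the idea concrete. The sample XML here is a hypothetical, simplified stand-in: the real registry file from Crossref is namespaced and structured differently, so the element names `registries`, `registry` and `doi` are assumptions, as is the second DOI, which is a placeholder rather than a real registry DOI. The function names are also made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the registry XML. The real file
# uses a different, namespaced layout. The second DOI is a placeholder.
SAMPLE_REGISTRY_XML = """<registries>
  <registry>
    <title>ClinicalTrials.gov</title>
    <subtitle>ClinicalTrials.gov</subtitle>
    <doi>10.18810/clinical-trials-gov</doi>
  </registry>
  <registry>
    <title>EU Clinical Trials Register</title>
    <subtitle>EU-CTR</subtitle>
    <doi>10.18810/placeholder-eu-ctr</doi>
  </registry>
</registries>"""


def build_registry_map(xml_string):
    """Map each registry title and subtitle (lowercased) to its DOI."""
    root = ET.fromstring(xml_string)
    registry_map = {}
    for registry in root.findall("registry"):
        doi = registry.findtext("doi")
        for tag_name in ("title", "subtitle"):
            name = registry.findtext(tag_name)
            if name and doi:
                registry_map[name.strip().lower()] = doi
    return registry_map


def registry_doi(registry_map, registry_name):
    """Return the DOI for a registry name, or None if it cannot be matched."""
    return registry_map.get(registry_name.strip().lower())


registry_map = build_registry_map(SAMPLE_REGISTRY_XML)
print(registry_doi(registry_map, "ClinicalTrials.gov"))
print(registry_doi(registry_map, "not a registry"))
```

Returning None for an unmatched name fits the assumption that only clinical trials with a matched registry get included in the deposit.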
The relationship of the publication to the clinical trial (optional) This field is optional but encouraged. The three allowed elements are “pre-results”, “results” and “post-results”, indicating which stage of the trial the publication is reporting on. @Melissa37, would we ever have this value in the XML, or anticipate it would be known whether a clinical trial had a status of these types? I would also like to think the Crossref deposit library should consider making this an option to specify, even if eLife is not using these values.
I remember when this was all discussed on the Crossref working group implementing this - it was all medical journals.
We've only just started looking into Medicine and the starting point was getting abstracts to match what other medical journals are doing.
@mariajoaoguerreiro might have a view on whether we'll be recording this in the future but for now it's not something eLife can do.
I'm reading JATS4R recommendation, the @content-type attribute of the <related-object> tag looks to hold this data, so when I configure parsing article XML and populating the clinical trials data of an Article, I will add a sample with that level of detail.
Cool, makes sense to future proof for eLife but make it work for those already doing this
I see now there will be a little more scope than I expected, because we need to match up the registry name from the article XML with the list of registries Crossref maintains at http://api.crossref.org/works/10.18810/registries/transform/application/vnd.crossref.unixsd+xml in order to get the DOI of the registry. To do the matching, I think I'll add in some logic into the Crossref library to parse the registry XML file, use an example file for testing purposes, and when the Crossref library is incorporated into a workflow, we can download a fresh copy of the registry XML prior to populating the clinical trial data for the article, if the article has any clinical trials. I want to avoid saving a copy of the registry XML as it is today into the project, because it will eventually be out-of-date, and we should always rely on the live registry file when generating real Crossref deposits.
Ah, good point, I had forgotten about that. @FAtherden-eLife could you correspond with @gnott on this so we get some Schematron validation in place too?
Making a note too that if I changed in my sample registry="10.18810/clinical-trials-gov" to registry="10.18810/foo", it is not rejected immediately by the Crossref XML validity checker. I don't know what the Crossref ingestion queue would do if the DOI doesn't match the registry they maintain. We'll assume for now that only the registry names we can match to Crossref's registry are the ones we will include in the Crossref deposit.
Yeah, makes sense, but what if they update that list? Should I check where they are notifying people of new releases? For instance the Open Funder Registry gets new irregular releases that we update in our systems.
@Melissa37 Yes, I'd agree with you.
... new releases?
The registry XML has this value, <crm-item name="last-update" type="date">2020-04-07T11:31:23Z</crm-item>, which might be helpful to detect new versions, but as for how or whether Crossref notifies people about a new release I could not say.
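One possible way to use the last-update value, sketched here under assumptions: the sample wraps the crm-item in a made-up `crm-items` root, the function names are hypothetical, and the element is matched on its local name so the sketch does not depend on whichever namespace the real file uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The crm-item value quoted in this thread, wrapped in a made-up root tag
SAMPLE_XML = (
    "<crm-items>"
    '<crm-item name="last-update" type="date">2020-04-07T11:31:23Z</crm-item>'
    "</crm-items>"
)


def registry_last_update(xml_string):
    """Return the last-update value as an aware datetime, or None if absent."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        # Compare the local name only, ignoring any XML namespace
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "crm-item" and element.get("name") == "last-update":
            parsed = datetime.strptime(element.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def registry_is_newer(xml_string, previously_seen):
    """True if the registry XML reports an update after previously_seen."""
    last_update = registry_last_update(xml_string)
    return last_update is not None and last_update > previously_seen


print(registry_last_update(SAMPLE_XML))
```

A workflow that stores the last value it saw could compare against it on each run and flag when the registry list has changed.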
A question, perhaps for @FAtherden-eLife: if you look at the registry XML file, the one eLife example I have uses ClinicalTrials.gov as the registry name, and that value is used as both the <title> and <subtitle> for that registry.
If you were to add a clinical trial for one of the other registries, would you be using the <title> or <subtitle> in the article XML (which is what I'd use to match and find the DOI for that registry)?
For example, in the <related-object> tag, would you have source-id="EU Clinical Trials Register" or source-id="EU-CTR" for that registry?
@gnott, my position would be that we should be using the subtitle for the source-id attribute value, so source-id="EU-CTR" would be correct/expected.
We can control the list of allowed source-id values based on that XML file, via Schematron, so that no others should come through from production.
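As an illustration of the parsing side, here is a sketch of collecting <related-object> tags from article XML and skipping any whose source-id is not in an allowed list. The sample article fragment is invented for this example; the attributes other than source-id and content-type (source-type, source-id-type, document-id, document-id-type) are assumptions based on my reading of the JATS tag set, and the function name and dictionary keys are hypothetical. It is not a replacement for the Schematron check.

```python
import xml.etree.ElementTree as ET

# Invented article fragment, for illustration only
SAMPLE_ARTICLE_XML = """<abstract>
  <sec>
    <p>Clinical trial number: <related-object
      source-type="clinical-trials-registry"
      source-id="ClinicalTrials.gov" source-id-type="registry-name"
      document-id="NCT02836002"
      document-id-type="clinical-trial-number">NCT02836002</related-object>.</p>
  </sec>
</abstract>"""

# In practice this list would be derived from the Crossref registry XML
ALLOWED_SOURCE_IDS = {"ClinicalTrials.gov", "EU-CTR"}


def clinical_trials(xml_string, allowed_source_ids):
    """Return a list of dicts, one per related-object with an allowed source-id."""
    root = ET.fromstring(xml_string)
    trials = []
    for tag in root.iter("related-object"):
        source_id = tag.get("source-id")
        if source_id not in allowed_source_ids:
            continue
        trials.append(
            {
                "source_id": source_id,
                "document_id": tag.get("document-id"),
                "content_type": tag.get("content-type"),
                "text": tag.text,
            }
        )
    return trials


trials = clinical_trials(SAMPLE_ARTICLE_XML, ALLOWED_SOURCE_IDS)
print(trials)
```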
I got to a point yesterday where I was a little stuck on processing the @content-type attribute, because JATS4R may recommend a value like pre-results but the Crossref schema accepts the value preResults. I've just realised that whichever is chosen for the article XML, potentially validated by Schematron, it won't matter to me as long as I make sure the value translation supports both values: if preResults, use preResults; if pre-results, use preResults in the Crossref deposit.
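The value translation could be as small as a lookup table. A sketch, with assumptions: only the pre-results to preResults pair is confirmed in this thread, so the postResults spelling is assumed by analogy, and the function name is hypothetical.

```python
# Accept either spelling from the article XML and emit the Crossref spelling.
# Assumption: postResults follows the same pattern as preResults.
CONTENT_TYPE_TO_CROSSREF = {
    "pre-results": "preResults",
    "preResults": "preResults",
    "results": "results",
    "post-results": "postResults",
    "postResults": "postResults",
}


def crossref_trial_type(content_type):
    """Return the Crossref value for a @content-type, or None if unrecognised."""
    if content_type is None:
        return None
    return CONTENT_TYPE_TO_CROSSREF.get(content_type.strip())


print(crossref_trial_type("pre-results"))
print(crossref_trial_type("preResults"))
```

Returning None for a missing or unrecognised value means the attribute can simply be left out of the deposit, which matches it being optional.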
I got to a point yesterday where I was a little stuck on processing the @content-type attribute, because JATS4R may recommend a value like pre-results but Crossref schema accepts the value preResults. I've just realised, whichever is chosen for the article XML, potentially validated by Schematron, it won't matter to me as long as I make sure the value translation supports both values: if preResults, use preResults, if pre-results use preResults in the Crossref deposit.
Yeah, JATS4R has attribute guidance and it differs from how Crossref works, so some mapping would have to happen.
This is for the benefit of all publishers using our tool though, right? As we don't have this level of detail!
Yes, I want to add the @content-type attribute to a test scenario sample just so it is covered; it is simple to add, even if not used (yet) in eLife XML.
New issue https://github.com/elifesciences/issues/issues/5830 to be a reminder to test this out or check the results when clinical trials data is available for eLife articles.
Problem / Motivation
Production wants to send clinical trial data to Crossref so this information is available to users of their API, and so we are helping with the need to track clinical trials.
Proposed solution
eLife adds Crossmark metadata to our deposits:
- Linked Clinical trials
Dependency - updated structured abstracts: https://github.com/elifesciences/issues/issues/4622
Crossref documentation: https://www.crossref.org/education/crossmark/linked-clinical-trials/
These fields should be included within the custom metadata section of the Crossmark deposit.
Generally our articles seem to only link to one clinical trial, but multiple can be added. I assume the DOI listed is the DOI of this paper.
From Crossref re the need for a DOI
Clarification needed and assumptions
This does not seem to support our use case and we'd be better off just re-depositing everything?
Question: 'true '. What does this refer to? Is eLife True or False?
Tasks
- clinicaltrials.xsd DTD is being added as an XML namespace - Yes, we already do that now
- Article object
- <custom_metadata> tag (I believe) Answer: Correct.
- source-id to doi map
- http://doi.org/10.18810/registries as the URI of the registries XML, is probably the safest
- crossref-doi type, instead of registry-name type
- pre-results getting converted to preResults
- prod environment
Technical notes
Here are some of my notes and thoughts, for discussion:
I think structured abstracts could possibly be added to the article data structure used by the Crossref generation library without involving integration with other data schemas
Clinical trial data would be added as a new property of an Article object, and then we can include that in Crossref deposits
Crossmark related:
- Code in the old, archived, Crossref generation library:
- For defining the Crossmark policy and domain (https://github.com/elifesciences/elife-poa-xml-generation/blob/develop/generateCrossrefXml.py#L34-L35)
- Old code that added Crossmark XML to a Crossref deposit, but it was never used for real I think https://github.com/elifesciences/elife-poa-xml-generation/blob/develop/generateCrossrefXml.py#L219-L236
- Perhaps not all articles would need to be deposited with Crossmark data, but my guess is if we want to register a Correction, for example, the article that is being corrected would need to be deposited with Crossmark, and then the correction article as well afterward
XML and testing
- For clinical trials support, need to add XML schema prefix to the Crossref XML deposit, e.g. xmlns:ct="http://www.crossref.org/clinicaltrials.xsd"
- Add additional settings for Crossmark into the elifecrossref library .cfg file to turn on/off Crossmark deposits, specify the Crossmark domain and Crossmark policy DOI
- For Crossmark, test exam
User interface / Wireframes