elifesciences / elife-pubmed-feed

code to support uploading feeds to pubmed for POA articles and VOR articles

eLife peer review experiment #73

Closed Melissa37 closed 6 years ago

Melissa37 commented 6 years ago

Hi Graham

eLife will be conducting an editorial experiment whereby authors choose whether to publish their article based on feedback from the editorial process.

We will be adding a sentence to the end of the abstract, for example:

<abstract>
                <object-id pub-id-type="doi">10.7554/eLife.31579.001</object-id>
                <p>Malaria has been a major driving force in the evolution of the human genome. In
                    sub-Saharan African populations, two neighbouring polymorphisms in the
                    Complement Receptor One (<italic>CR1</italic>) gene, named <italic>Sl2</italic>
                    and <italic>McC<sup>b</sup></italic>, occur at high frequencies, consistent with
                    selection by malaria. Previous studies have been inconclusive. Using a large
                    case-control study of severe malaria in Kenyan children and statistical models
                    adjusted for confounders, we estimate the relationship between
                        <italic>Sl2</italic> and <italic>McC<sup>b</sup></italic> and malaria
                    phenotypes, and find they have opposing associations. The <italic>Sl2</italic>
                    polymorphism is associated with markedly reduced odds of cerebral malaria and
                    death, while the <italic>McC<sup>b</sup></italic> polymorphism is associated
                    with increased odds of cerebral malaria. We also identify an apparent
                    interaction between <italic>Sl2</italic> and &#x03B1;<sup>+</sup>thalassaemia,
                    with the protective association of <italic>Sl2</italic> greatest in children
                    with normal &#x03B1;-globin. The complex relationship between these three
                    mutations may explain previous conflicting findings, highlighting the importance
                    of considering genetic interactions in disease-association studies.</p>
                <p><bold>Editors note: </bold>This article has been through an editorial process 
                    in which the authors decide how to respond to peer review. The Editor's assessment 
                    is that the author responses are thorough and rigorous (see decision letter).</p>
            </abstract>

This will require converting to PubMed XML, with this final paragraph fitting their model. Their guidance says: Submit abstract section headings in all uppercase letters followed by a colon and space, for example:

<Abstract>

BACKGROUND: Approximately 3,000 new cases of oral cancer...

</Abstract>

Common section headings are: BACKGROUND, METHODS, RESULTS, and CONCLUSIONS.

Alternatively, publishers may use the <AbstractText> element and its Label attribute to supply structured abstract section headings. The section heading should still be submitted in all uppercase letters.

<Abstract>

<AbstractText Label="OBJECTIVE">To assess the effects...</AbstractText>

<AbstractText Label="METHODS">Patients attending lung...</AbstractText>

<AbstractText Label="RESULTS">Twenty-five patients...</AbstractText>

<AbstractText Label="CONCLUSIONS">The findings suggest...</AbstractText>

</Abstract>

I have not created a sample for PubMed yet, and need to discuss with them how the rest of the abstract would be tagged (i.e. could it have no label?).

WDYT?

Melissa37 commented 6 years ago

Also, there is no PoA for these papers, so there is no need to wrangle anything from the EJP CSV. These papers will have a unique article type that distinguishes them from other research papers. The attached screenshot (2018-05-23) shows the above XML in Continuum.

gnott commented 6 years ago

The PubMed output you're describing reminds me of issue #67, Structured Abstracts: https://github.com/elifesciences/elife-pubmed-feed/issues/67

However, in the JATS sample above, the parts of the abstract are not separated by <sec> tags (as I saw on a non-eLife sample leading me to create issue #67).

I did a quick test to generate a PubMed deposit from an abstract that has multiple paragraphs, including the "Editors note:" paragraph you have in the example. Right now, the PubMed generation logic concatenates all the abstract paragraphs into a single string and puts it into the PubMed <Abstract> tag.

It may be possible to only change the elife-pubmed-xml-generation library to process a multi-paragraph abstract differently. That is where the <p> tags are stripped out (https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/elifepubmed/generate.py#L402). We may not need to go back to the article objects or the parser to let the PubMed deposits do this.
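
As a rough illustration of the difference between joining the paragraphs and keeping them separate, here is a minimal sketch. It is not the code in generate.py: the function name abstract_paragraphs and the shortened sample abstract are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def abstract_paragraphs(abstract_xml):
    """Return the plain text of each <p> in a JATS <abstract> (hypothetical helper)."""
    root = ET.fromstring(abstract_xml)
    paragraphs = []
    for p in root.iter("p"):
        # itertext() also yields the text inside inline tags such as <italic> and <bold>
        text = "".join(p.itertext())
        # collapse the line breaks and indentation left by pretty-printed XML
        paragraphs.append(" ".join(text.split()))
    return paragraphs


# Shortened, made-up sample in the shape of the abstract quoted in this issue
sample = """<abstract>
  <p>Malaria has been a major driving force in the evolution
     of the human <italic>genome</italic>.</p>
  <p><bold>Editors note: </bold>This article has been through an editorial process.</p>
</abstract>"""

paragraphs = abstract_paragraphs(sample)

# Current behaviour as described above: everything ends up in one string
joined = " ".join(paragraphs)
```

Keeping the list instead of the joined string is what would let each paragraph be written out separately later.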

Melissa37 commented 6 years ago

Hi Graham

Added to this, the "(see decision letter)" will be a link to the decision letter of the paper. I don't think this is allowed in PubMed deposits? Would you be able to strip the linking in the PubMed code?

Melissa37 commented 6 years ago
<p><bold>Editors note: </bold>This article has been through an editorial process 
 in which the authors decide how to respond to peer review. The Editor's assessment 
is that the author responses are thorough and rigorous (see <ext-link ext-link-type="uri"
xlink:href="https://doi.org/10.7554/eLife.34286#decision-letter">decision letter</ext-link>).</p>

or

(see <xref ref-type="decision-letter" rid="SA1">decision letter</xref>)

The eLife dev team is looking into a bug that prevents this second type of tagging, which would be preferable, I think.

M

Melissa37 commented 6 years ago

See: https://www.ncbi.nlm.nih.gov/pubmed/?term=Sci-Hub+provides+access+to+nearly+all+scholarly+literature

gnott commented 6 years ago

I tested an example having an <ext-link> in the abstract and the tag is removed as the library runs today.

Roughly in late 2017 the ability to remove tags from the abstract was added to the elife-article project. Then, in the elife-pubmed-xml-generation project, the remove_tags configuration value allows the user to specify which tags to remove from the abstract (https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/pubmed.cfg#L11)

Although these features seem to have been finalised around January 2018, the elife-article library that the bot was using wasn't upgraded until around April 30, 2018 (https://github.com/elifesciences/elife-bot/commit/9c000322cef46d75a72803d1c8ec1b655fb7fc18). This means any PubMed deposit prior to April 30th may include unwanted <ext-link> tags.

Going forward, there will be no <ext-link> tags in the PubMed deposit abstracts for eLife articles, due to the configuration file we use.
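
To show the idea behind tag removal, here is a minimal sketch of stripping named tags while keeping the text they wrap. The function strip_tags and its regular expression are assumptions made for this example; they are not the implementation in elife-article, and the list passed in only mimics what a remove_tags setting might contain.

```python
import re


def strip_tags(xml_string, tag_names):
    """Remove the named tags but keep the text between them (hypothetical helper)."""
    for name in tag_names:
        # matches <name>, <name attr="...">, <name/> and </name>,
        # but not longer tag names that merely start with the same letters
        pattern = r"</?{}(\s[^>]*)?/?>".format(re.escape(name))
        xml_string = re.sub(pattern, "", xml_string)
    return xml_string


note = (
    'rigorous (see <ext-link ext-link-type="uri" '
    'xlink:href="https://doi.org/10.7554/eLife.34286#decision-letter">'
    "decision letter</ext-link>)."
)

# Illustrative stand-in for a configured list of tags to remove
cleaned = strip_tags(note, ["ext-link"])
```

The link target is dropped and only the words "decision letter" remain in the text.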

gnott commented 6 years ago

In regard to <AbstractText> and its Label attribute, I see in the PubMed DTD that the Label is required; however, it doesn't indicate the value must be non-blank.

A test file I created is valid according to PubMed's citation checker, having an abstract like this based on your example:

<Abstract>
<AbstractText Label="">Malaria has been a major driving force ...</AbstractText>
<AbstractText Label="EDITORS NOTE">This article has been through ...</AbstractText>
</Abstract>

The PubMed page would look like this:

image
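
For reference, an abstract in that shape could be produced along these lines. This is a minimal sketch using the Python standard library; build_abstract is a made-up name for this example and is not the function used in elife-pubmed-xml-generation.

```python
import xml.etree.ElementTree as ET


def build_abstract(sections):
    """Build a PubMed-style <Abstract> from (label, text) pairs (hypothetical helper)."""
    abstract = ET.Element("Abstract")
    for label, text in sections:
        part = ET.SubElement(abstract, "AbstractText")
        # Label is always written; it is simply blank for unlabelled paragraphs
        part.set("Label", label)
        part.text = text
    return abstract


abstract = build_abstract([
    ("", "Malaria has been a major driving force ..."),
    ("EDITORS NOTE", "This article has been through ..."),
])
abstract_xml = ET.tostring(abstract, encoding="unicode")
```

The serialised result has one AbstractText per paragraph, the first with Label="" and the second with Label="EDITORS NOTE".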

Melissa37 commented 6 years ago

Fantastic, thanks G! So, once we get the final specs from editorial we can code this from the eLife JATS XML?

M

gnott commented 6 years ago

@Melissa37 could you please check with PubMed whether this is an ok format for them?

Melissa37 commented 6 years ago

@gnott the text will be "Editorial note:"

Melissa37 commented 6 years ago

email sent to PubMed to check/confirm.

Melissa37 commented 6 years ago

From PubMed: Hi Melissa, Your approach looks fine. Thanks much for checking with us first! Kind regards, Kathi

gnott commented 6 years ago

A side effect found in testing is that the <AbstractText> tag looks like a good way to express abstracts that have multiple paragraphs too. Previously this library was joining all the paragraphs into a single string and using that in the <Abstract> tag.

Setting the Label="Editorial note" attribute will need slightly different logic; otherwise the attribute will always be Label="".

Melissa37 commented 6 years ago

Cool, Thanks. M

gnott commented 6 years ago

I have a branch with functioning code for the feature. @Melissa37 will you be adding the XML tagging to a kitchen sink as an example?

gnott commented 6 years ago

To avoid parsing <bold> tags in an abstract inappropriately, it is probably safest to add another configuration value, abstract_label_names. (I found an eLife article that would produce poor output with the logic I have right now: https://cdn.elifesciences.org/articles/29365/elife-29365-v2.xml.) For eLife, that list would include Editorial note, which can then be treated as a label value. By default the list would be empty, so bold tag values from other publishers would not be converted into an AbstractText Label attribute.
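
A minimal sketch of how such an allow-list check might work is below. The function split_label and its regular expression are assumptions for illustration only; abstract_label_names is the configuration name proposed above, and the real branch may handle this differently.

```python
import re


def split_label(paragraph, abstract_label_names):
    """Return (label, text) for one abstract paragraph (hypothetical helper).

    A leading <bold>...</bold> run becomes the label only when its text,
    minus a trailing colon, appears in abstract_label_names.
    """
    match = re.match(r"\s*<bold>(.*?)</bold>\s*(.*)", paragraph, re.DOTALL)
    if match:
        candidate = match.group(1).strip().rstrip(":").strip()
        if candidate in abstract_label_names:
            return candidate, match.group(2)
    # no recognised label: leave the paragraph untouched with a blank label
    return "", paragraph


note = "<bold>Editorial note: </bold>This article has been through an editorial process."

with_list = split_label(note, ["Editorial note"])
default = split_label(note, [])
other_bold = split_label("<bold>Plasmodium</bold> is a parasite.", ["Editorial note"])
```

With the list empty, as it would be by default, a paragraph that opens with bold text is left as it is and gets a blank label.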

Melissa37 commented 6 years ago

That sounds good, thanks G. I won't be adding this to the kitchen sink until after the announcement on 25th June if that's OK?

M

gnott commented 6 years ago

Code feature is added in the above PR, and I'm deploying it now to the bot.