Closed Melissa37 closed 6 years ago
Also, there is no PoA for these papers, so there is no need for any wrangling from the EJP CSV. These papers will have a unique article type that distinguishes them from other research papers. This screenshot is of the above XML in Continuum.
The PubMed output you're describing reminds me of this issue: https://github.com/elifesciences/elife-pubmed-feed/issues/67 (Structured Abstracts).
However, in the JATS sample above, the parts of the abstract are not separated by <sec> tags (as I saw on a non-eLife sample, leading me to create issue #67).
I did a quick test to generate a PubMed deposit from an abstract that has multiple paragraphs, including the "Editors notes:" paragraph you have in the example. Right now, the PubMed generation logic concatenates all the abstract paragraphs into one string and puts it into the PubMed <Abstract> tag.
It may be possible to only change the elife-pubmed-xml-generation library to process a multi-paragraph abstract differently. That is where the <p> tags are stripped out (https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/elifepubmed/generate.py#L402). We may not need to go back to the article objects or the parser to let the PubMed deposits do this.
Hi Graham
Added to this, the "(see decision letter)" will be a link to the decision letter of the paper. I don't think this is allowed in PubMed deposits? Would you be able to strip the linking in the PubMed code?
<p><bold>Editors note: </bold>This article has been through an editorial process
in which the authors decide how to respond to peer review. The Editor's assessment
is that the author responses are thorough and rigorous (see <ext-link ext-link-type="uri"
xlink:href="https://doi.org/10.7554/eLife.34286#decision-letter">decision letter</ext-link>).</p>
or
(see <xref ref-type="decision-letter" rid="SA1">decision letter</xref>)
The eLife dev team is looking into a bug that prevents this second type of tagging, which would be preferable, I think.
M
I tested an example having an <ext-link> in the abstract and the tag is removed as the library runs today.
Roughly in late 2017 the ability to remove tags from the abstract was added to the elife-article project. Then, in the elife-pubmed-xml-generation project, the remove_tags configuration value allows the user to specify which tags to remove from the abstract (https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/pubmed.cfg#L11).
Although these features seem to have been finalised around January 2018, the elife-article library that the bot was using wasn't upgraded until around April 30, 2018 (https://github.com/elifesciences/elife-bot/commit/9c000322cef46d75a72803d1c8ec1b655fb7fc18). This indicates that any PubMed deposit prior to April 30th may include unwanted <ext-link> tags.
Going forward, there will be no <ext-link> tags in the PubMed deposit abstracts for eLife articles, due to the configuration file we use.
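To illustrate the kind of tag removal described above, here is a minimal sketch. It is not the code from elife-article or elife-pubmed-xml-generation; the function name strip_tags and the regular-expression approach are assumptions made for illustration only. It drops the opening and closing tags for each listed tag name and keeps the text between them.

```python
import re


def strip_tags(xml_fragment, tag_names):
    """Remove the listed tags from a fragment, keeping their inner text.

    Hypothetical helper for illustration; not the library's implementation.
    """
    for name in tag_names:
        # Matches <name>, <name attr="...">, <name/> and </name>,
        # but not a longer tag name that merely starts with `name`.
        pattern = r"</?" + re.escape(name) + r"(\s[^>]*)?/?>"
        xml_fragment = re.sub(pattern, "", xml_fragment)
    return xml_fragment


sample = (
    '(see <ext-link ext-link-type="uri" '
    'xlink:href="https://doi.org/10.7554/eLife.34286#decision-letter">'
    "decision letter</ext-link>)."
)
print(strip_tags(sample, ["ext-link"]))
```

With a list such as ["ext-link"], other inline tags like <bold> are left in place, which matches the idea of a configurable list of tags to remove.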
In regard to the <AbstractText> tag and its Label attribute, I see in the PubMed DTD that the Label is required; however, it doesn't indicate the value must be non-blank.
A test file I created is valid according to PubMed's citation checker, having an abstract like this based on your example:
<Abstract>
<AbstractText Label="">Malaria has been a major driving force ...</AbstractText>
<AbstractText Label="EDITORS NOTE">This article has been through ...</AbstractText>
</Abstract>
The PubMed page would look like this:
Fantastic, thanks G! So, once we get the final specs from editorial we can code this from the eLife JATS XML?
M
@Melissa37 could you please check with PubMed whether this is an ok format for them?
@gnott the text will be "Editorial note:"
email sent to PubMed to check/confirm.
From PubMed: Hi Melissa, Your approach looks fine. Thanks much for checking with us first! Kind regards, Kathi
A side effect of using the <AbstractText> tag in testing is that it looks like a good way to express abstracts that have multiple paragraphs too. Previously this library was joining all the paragraphs into a single string and using that in the <Abstract> tag.
Setting the Label="Editorial note" attribute will need slightly different logic; otherwise it will always be Label="".
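As a rough sketch of the output shape discussed here, the following builds an <Abstract> with one <AbstractText> per paragraph using Python's standard xml.etree.ElementTree. The function name build_abstract and the (label, text) input format are hypothetical and are not taken from the library; unlabelled paragraphs simply get an empty Label.

```python
import xml.etree.ElementTree as ET


def build_abstract(paragraphs):
    """Build an Abstract element from (label, text) pairs.

    Hypothetical helper for illustration. An empty string is used as the
    label for ordinary paragraphs, giving Label="" in the output.
    """
    abstract = ET.Element("Abstract")
    for label, text in paragraphs:
        abstract_text = ET.SubElement(abstract, "AbstractText")
        abstract_text.set("Label", label)
        abstract_text.text = text
    return abstract


root = build_abstract([
    ("", "Malaria has been a major driving force ..."),
    ("Editorial note", "This article has been through ..."),
])
print(ET.tostring(root, encoding="unicode"))
```

Each paragraph stays in its own element rather than being joined into one string, so a multi-paragraph abstract keeps its structure in the deposit.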
Cool, Thanks. M
I have a branch with functioning code for the feature. @Melissa37 will you be adding the XML tagging to a kitchen sink as an example?
To avoid parsing <bold>
tags in an abstract inappropriately (I found and eLife article that would produce poor input based on the logic I have right now, https://cdn.elifesciences.org/articles/29365/elife-29365-v2.xml) it is probably safest to add another configuration value for abstract_label_names
. That list for eLife would include Editorial note
and then it can be considered as a label value. By default the list would be empty and it would not convert bold tag values of other publishers into an AbstractText
label attribute.
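The allow-list idea can be sketched as follows. This is only an illustration under assumptions: the function name split_label, the regular expression, and the handling of the trailing colon are hypothetical, and the code in the actual branch may differ. The leading <bold> text of a paragraph becomes a label only when it appears in abstract_label_names; otherwise the paragraph is returned unchanged with an empty label.

```python
import re


def split_label(paragraph, abstract_label_names=()):
    """Return (label, text) for one abstract paragraph.

    Hypothetical helper for illustration. Leading bold text is used as
    the label only if it is listed in abstract_label_names.
    """
    match = re.match(r"\s*<bold>(.*?)</bold>\s*(.*)", paragraph, re.DOTALL)
    if match:
        # Drop surrounding whitespace and a trailing colon before comparing.
        candidate = match.group(1).strip().rstrip(":").strip()
        if candidate in abstract_label_names:
            return candidate, match.group(2)
    # Default: no label, paragraph left as it was.
    return "", paragraph


note = "<bold>Editorial note: </bold>This article has been through ..."
print(split_label(note, ["Editorial note"]))
print(split_label("<bold>Plasmodium</bold> is a parasite.", ["Editorial note"]))
```

With the default empty list, no bold text is ever promoted to a label, which keeps the behaviour unchanged for other publishers.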
That sounds good, thanks G. I won't be adding this to the kitchen sink until after the announcement on 25th June if that's OK?
M
Code feature is added in the above PR, and I'm deploying it now to the bot.
Hi Graham
eLife will be conducting an editorial experiment whereby authors choose whether to publish their article based on feedback from the editorial process.
We will be adding a sentence to the end of the article,
This will require parsing to PubMed XML with this end bit fitting their model of: Submit abstract section headings in all uppercase letters followed by a colon and space, for example:
Common section headings are: BACKGROUND, METHODS, RESULTS, and CONCLUSIONS.
Alternatively, publishers may use the <AbstractText> element and its Label attribute to supply structured abstract section headings. The section heading should still be submitted in all uppercase letters.
I have not created a sample for PubMed yet, and need to discuss with them how the rest of the abstract would be tagged (i.e. could it not have a label?).
WDYT?