Closed gnott closed 4 years ago
I merged in the develop branch after merging PR https://github.com/elifesciences/elife-tools/pull/321, which should hopefully fix why Alfred marked his tests as failing.
Thanks for looking it over @lsh-0, providing comments and approval!
I'm thinking of holding off on merging for now because there's a clarification question still outstanding: whether the <related-object> tag will be wrapped in a <p> tag or not. I'm hoping it will be, because then we don't have to treat it as an entirely new paragraph block every time the parser converts one of these tags in a body element. But if the XML as specified in the test fixture is decided to be the final XML (where the <related-object> tag sits directly inside a <sec> tag and is not wrapped in a <p>), then this demonstrates we can support that.
Confirmation has been received that the <related-object> tag in the structured abstracts example will be wrapped with a <p> tag. The most recent commit I just added here makes it so we do not need to consider <related-object> tags as block content elements, which I think will be less risky in the future. Considering this is a small edit, I'll take the earlier approval of this PR (thanks @lsh-0!) to still apply, and I will merge this PR.
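To make the two XML shapes discussed above concrete, here is a minimal sketch that reports any <related-object> tag whose direct parent is not a <p> tag. It uses only the Python standard library and is not the elife-tools parser; the function name, the sample XML and the placeholder trial number are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def unwrapped_related_objects(xml_string):
    """Return related-object elements whose direct parent is not a <p> tag."""
    root = ET.fromstring(xml_string)
    # ElementTree keeps no parent pointers, so build a child -> parent map
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for tag in root.iter("related-object"):
        parent = parent_of.get(tag)
        if parent is None or parent.tag != "p":
            found.append(tag)
    return found


# Hypothetical samples: the trial number is a placeholder, not real data
WRAPPED = (
    "<abstract><sec><title>Clinical trial number:</title>"
    '<p><related-object id="CT1">NCT00000000</related-object></p>'
    "</sec></abstract>"
)
BARE = (
    "<abstract><sec><title>Clinical trial number:</title>"
    '<related-object id="CT1">NCT00000000</related-object>'
    "</sec></abstract>"
)

print(len(unwrapped_related_objects(WRAPPED)))
print(len(unwrapped_related_objects(BARE)))
```

With the wrapped form, the tag can be handled as inline content of its paragraph; only the bare form would need the tag to be treated as a block of its own.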
Re issue https://github.com/elifesciences/issues/issues/4622
There was an existing test fixture XML, based on a BMJ Open article, the tests for which show how the older, more basic abstract() function omits the section title values and only retains the paragraph content. This is still the case for now.
The more recent function abstract_json(), which calls render_abstract_json(), was created to produce abstract content in an eLife JSON format that validates against the RAML schema.
When the abstract's content is rendered using body_block_content_render(), which recursively traverses child tags in the XML, instead of just body_block_content(), the output includes section and paragraph blocks in the structured format.
In the eLife XML example, there is also a <related-object> tag, holding the clinical trial information. For now, it is agreed this can be converted to a paragraph block, and the <related-object> tag itself, when converted to HTML, can be an <a> anchor tag.
I think I left in parsing the @id attribute of the <related-object> tag, and it gets added to the paragraph block as an attribute. I believe there is no @id attribute listed for a paragraph in the RAML schema, but I think the RAML schema validation will not care if it remains there. If we should remove the id attribute from the output, that option is possible.
These code changes do not cause any other existing test cases in this library to fail. If we parse XML which has only <p> tags in the <abstract> tag, then the output should remain the same as it was, and it will continue to be valid against the RAML articlev2 schema.
In order to support abstracts that also include section blocks, we can introduce this parser change with little risk, and then the adaptations to the RAML schema can continue.
I don't know the exact timing of when structured abstract XML for eLife will appear, except to know we need to do all this work in preparation before the first structured abstract can be allowed to pass through the workflows and be displayed on journal.
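As a rough illustration of the recursive rendering described in this PR, the sketch below walks an abstract and emits nested section and paragraph blocks, turning a <related-object> tag into an <a> anchor inside the paragraph text. It is a standalone example built on the standard library, not the code in body_block_content_render(); the function names, the block keys, the use of an xlink:href attribute for the link target, and the sample XML are all assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the link target is carried in an xlink:href attribute
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def inline_html(element):
    """Flatten an element's content to a string, mapping <related-object> to <a>."""
    parts = [escape(element.text or "")]
    for child in element:
        inner = inline_html(child)
        if child.tag == "related-object":
            href = escape(child.get(XLINK_HREF, ""))
            parts.append('<a href="%s">%s</a>' % (href, inner))
        else:
            # Other inline tags are reduced to their text in this sketch
            parts.append(inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def render_blocks(parent):
    """Recursively convert <sec> and <p> children into nested block dicts."""
    blocks = []
    for child in parent:
        if child.tag == "sec":
            title = child.find("title")
            blocks.append({
                "type": "section",
                "title": title.text if title is not None else "",
                "content": render_blocks(child),
            })
        elif child.tag == "p":
            blocks.append({"type": "paragraph", "text": inline_html(child)})
        # Any other tag, including <title>, is skipped here
    return blocks


# Hypothetical sample: URL and trial number are placeholders
SAMPLE = (
    '<abstract xmlns:xlink="http://www.w3.org/1999/xlink">'
    "<sec><title>Background:</title><p>Some background.</p></sec>"
    "<sec><title>Clinical trial number:</title>"
    '<p><related-object id="CT1" xlink:href="https://example.org/trial">'
    "NCT00000000</related-object></p></sec>"
    "</abstract>"
)

blocks = render_blocks(ET.fromstring(SAMPLE))
print(blocks)
```

An abstract containing only <p> tags produces a flat list of paragraph blocks from the same function, which mirrors the point above that the output for unstructured abstracts should not change.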