elifesciences / elife-crossref-feed

code to support uploading info to crossref on PAW articles

remove formatting that is converted to <text> #109

Closed Melissa37 closed 7 years ago

Melissa37 commented 7 years ago

Example:

<ref id="bib45">
    <element-citation publication-type="data">
        <person-group person-group-type="author">
            <collab>The <italic>Shigella</italic> Genome Sequencing Consortium</collab>
        </person-group>
        <year iso-8601-date="2015">2015a</year>
        <data-title>Global Diversity of Shigella Species</data-title>
        <source>NCBI BioProject</source>
        <pub-id pub-id-type="accession" assigning-authority="NCBI"
            xlink:href="http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB2846"
            >PRJEB2846</pub-id>
    </element-citation>
</ref>

Converts to:

<citation key="45">
    <volume_title>NCBI BioProject</volume_title>
    <author>The &lt;italic&gt;Shigella&lt;/italic&gt; Genome Sequencing Consortium</author>
    <cYear>2015a</cYear>
</citation>

Would prefer:

                        <author>The Shigella Genome Sequencing Consortium</author>
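
To illustrate the preferred output, here is a minimal Python sketch that drops inline tags from a <collab> element and keeps only its text. The function name `collab_text` is hypothetical and is not taken from this repository; it only shows one way the conversion could behave.

```python
import xml.etree.ElementTree as ET


def collab_text(collab_xml):
    """Return the text of a <collab> element with inline tags removed."""
    element = ET.fromstring(collab_xml)
    # itertext() walks the element and its children, yielding only text nodes
    return "".join(element.itertext())


example = "<collab>The <italic>Shigella</italic> Genome Sequencing Consortium</collab>"
print(collab_text(example))
```

With the example above, the value written to <author> would be the plain consortium name with no escaped <italic> tags.
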
Melissa37 commented 7 years ago

@gnott can you find an example where you have sent something like this to Crossref? I'll check what's stored at their end to see whether the markup is removed by an internal conversion.

gnott commented 7 years ago

There will not be too many examples I think. If we had an <sc> tag in an abstract of a VoR there could be one, but I cannot find one of those.

An example where tags were retained is in the Crossref <author> tag, as you have above. One example is 10.7554/eLife.19535, which we deposited on Jan 12, 2017; the batch file has

<citation key="47">
<journal_title>Science</journal_title>
<author>International &lt;italic&gt;Glossina&lt;/italic&gt; Genome Initiative</author>
<volume>344</volume>
<first_page>380</first_page>
<cYear>2014</cYear>
<doi>10.1126/science.1249656</doi>
</citation>
gnott commented 7 years ago

There is an <sc> tag in a component caption on 10.7554/eLife.25051, specifically the 10.7554/eLife.25051.032 component. The caption contains this snippet,

... exerted by <b>S<sub>1</sub></b> and <b>&lt;sc&gt;S<sub>3</sub>&lt;/sc&gt;</b> on <b>S<sub>2</sub></b> is comparable to ...

The &lt; entities I'm seeing are usually escaping a less than < character, so I suspect it will show as the angle bracket in Crossref?
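
A small Python sketch of why that seems likely: an XML parser reading the deposited element decodes the entities back into literal angle brackets. This only demonstrates standard XML entity handling; how Crossref processes the deposit internally is not something this sketch can confirm, and the helper name `deposited_text` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def deposited_text(element_xml):
    """Return the text content an XML parser sees for a deposited element."""
    return ET.fromstring(element_xml).text


author_xml = (
    "<author>International &lt;italic&gt;Glossina&lt;/italic&gt; "
    "Genome Initiative</author>"
)
# The parser turns &lt; and &gt; back into < and >
print(deposited_text(author_xml))
# Escaping a string containing tags produces the entities seen in the batch file
print(escape("<italic>Glossina</italic>"))
```
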

Melissa37 commented 7 years ago

Yeah, here's the JSON from their API:

{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2017,7,25]],"date-time":"2017-07-25T14:35:26Z","timestamp":1500993326746},"reference-count":0,"publisher":"eLife Sciences Organisation, Ltd.","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"DOI":"10.7554\/elife.25051.032","type":"component","created":{"date-parts":[[2017,6,10]],"date-time":"2017-06-10T11:00:15Z","timestamp":1497092415000},"source":"Crossref","is-referenced-by-count":0,"title":["Figure 7\u2014figure supplement 1. A multispecies pairwise model can work under special conditions."],"prefix":"10.7554","member":"4374","container-title":[],"original-title":[],"deposited":{"date-parts":[[2017,6,13]],"date-time":"2017-06-13T06:00:33Z","timestamp":1497333633000},"score":1.0,"subtitle":["(A\u2013B) As a control for Figure 7C, if S3 does not remove the mediator of interaction between S1 and S2, a three-species pairwise model accurately matches the mechanistic model. Simulation parameters are provided in Figure 7\u2014source data 4. (C\u2013D) As a control for Figure 7E, we ensured that fitness effects from multiple species are additive. In this case, a three-species pairwise model can represent the mechanistic model. To ensure the linearity and additivity\u00a0of fitness effects, we have used a larger value of half saturation concentration (KS2C1=103 \u03bcM, instead of 10\u22121 \u03bcM in Figure 7E\u2013F). We have adjusted the interaction coefficients accordingly such that the overall interaction strength exerted by S1 and S3<\/sc> on S2 is comparable to that in Figure 7E\u2013F (as evident by comparable population compositions). Since the interaction influences under these conditions remain in the linear range, the three-species pairwise model accurately predicts the reference dynamics. 
Simulation parameters are provided in Figure 7\u2014source data 5."],"short-title":[],"issued":{"date-parts":[[null]]},"references-count":0,"URL":"http:\/\/dx.doi.org\/10.7554\/elife.25051.032","relation":{}}}

https://api.crossref.org/v1/works/http://dx.doi.org/10.7554/elife.25051.032

Melissa37 commented 7 years ago

And:

{"key":"47","author":"International <italic>Glossina<\/italic> Genome Initiative","volume":"344","first-page":"380","year":"2014","journal-title":"Science","DOI":"10.1126\/science.1249656","doi-asserted-by":"publisher"},{"key":"48","author":"Khosravi","volume":"15","first-page":"374","year":"2014","journal-title":"Cell Host & Microbe","DOI":"10.1016\/j.chom.2014.02.006","doi-asserted-by":"publisher"},

https://api.crossref.org/v1/works/http://dx.doi.org/10.7554/elife.19535
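
For checking other deposits, a rough Python sketch that scans a saved API response for reference authors still carrying markup. It assumes the references sit in a `reference` list under `message`, each with `key` and `author` fields as in the snippet above; the function name `references_with_markup` and the sample data are made up for illustration.

```python
import json
import re

# Matches an opening or closing tag such as <italic> or </italic>
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def references_with_markup(response_text):
    """Return (key, author) pairs whose author value contains a tag."""
    data = json.loads(response_text)
    references = data.get("message", {}).get("reference", [])
    return [
        (ref.get("key"), ref["author"])
        for ref in references
        if TAG_PATTERN.search(ref.get("author", ""))
    ]


# Sample built from the two references quoted above
sample = json.dumps({"message": {"reference": [
    {"key": "47", "author": "International <italic>Glossina</italic> Genome Initiative"},
    {"key": "48", "author": "Khosravi"},
]}})
print(references_with_markup(sample))
```
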

Melissa37 commented 7 years ago

So if we can remove those tags, that would be best!

gnott commented 7 years ago

Looks like the open tag is removed and the close tag is retained in Crossref, interesting.

Going forward

gnott commented 7 years ago

I've added a new test for component caption values to check on face markup tag conversion. It can be modified or expanded later depending on the tags supported.

Do you know what to do with MathML inside component captions?

gnott commented 7 years ago

I think rather than removing the code that tags JATS abstracts and adds face markup tags, I am going to add two configuration options. Both will be set to false by default, so all tags will be stripped from abstracts, titles and captions. This way the code can be kept but turned off for now.
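
A minimal sketch of how two such options could behave, in Python. The option names `jats_abstract` and `face_markup`, the `prepare_value` helper and the regular expression are assumptions for illustration only, not the actual implementation.

```python
import re

# Matches an opening or closing tag such as <b>, <sub> or </sc>
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")

# Hypothetical option names; both off by default
DEFAULT_CONFIG = {"jats_abstract": False, "face_markup": False}


def prepare_value(value, config=None, option="face_markup"):
    """Keep tags only when the given option is enabled, otherwise strip them."""
    settings = dict(DEFAULT_CONFIG)
    settings.update(config or {})
    if settings.get(option):
        return value
    return TAG_PATTERN.sub("", value)


caption = "exerted by <b>S<sub>1</sub></b> on <b>S<sub>2</sub></b>"
print(prepare_value(caption))
print(prepare_value(caption, {"face_markup": True}))
```

With the defaults, the caption comes out as plain text; turning an option on leaves the value untouched so the existing tagging code can take over.
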

gnott commented 7 years ago

For reference

NLM / JATS abstracts https://support.crossref.org/hc/en-us/articles/213126186-NLM-JATS-abstracts

Face markup https://support.crossref.org/hc/en-us/articles/214532023-Face-markup