elifesciences / elife-pubmed-feed

Code to support uploading feeds to PubMed for POA and VOR articles

Converting MathML #64

Melissa37 closed this issue 6 years ago

Melissa37 commented 6 years ago

Convert MathML tags to the text "[Formula: see text]".

For example, this abstract:

<p>This is the abstract. This article will describe the eLife article and the process.
                    An abstract can contain any formatting, such as <italic>italics</italic>,
                        <bold>bold</bold>, <sup>superscript</sup>, <sub>subscript</sub> or <sc>small caps</sc>. MathML is
                    also allowed: 
                    <inline-formula><mml:math>
                        <mml:mrow>
                            <mml:munder>
                                <mml:mo/>
                                <mml:mi>m</mml:mi>
                            </mml:munder>
                            <mml:mrow>
                                <mml:msub>
                                    <mml:mover accent="true">
                                        <mml:mi>p</mml:mi>
                                        <mml:mo/>
                                    </mml:mover>
                                    <mml:mi>m</mml:mi>
                                </mml:msub>
                                <mml:mo>=</mml:mo>
                                <mml:mn>0</mml:mn>
                            </mml:mrow>
                        </mml:mrow>
                    </mml:math>
                        </inline-formula>
                .</p>
                <p>eLife does not structure abstracts into sub headings expect in a clinical trial article, but the abstract can have
                    multiple paragarahs. The sub DOI is always .001 as it is the first asset in any
                    article. I have added an unmatched > bracket as this has been an issue for PubMed deposits in the past.</p>

Should be converted to:

<Abstract>This is the abstract. This article will describe the eLife article and the
            process. An abstract can contain any formatting, such as <i>italics</i>, <b>bold</b>,
                <sup>superscript</sup>, <sub>subscript</sub> or small caps. MathML is also allowed:
           [Formula: see text].eLife does not structure abstracts into sub headings expect in a clinical
            trial article, but the abstract can have multiple paragarahs. The sub DOI is always .001
            as it is the first asset in any article. I have added an unmatched &gt; bracket as this
            has been an issue for PubMed deposits in the past.</Abstract>
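
A minimal Python sketch of the conversion described above, for illustration only. It is not this repository's implementation: `convert_abstract`, `FORMULA_TEXT` and `KEPT_TAG` are hypothetical names. It uses regular expressions instead of an XML parser because the sample fragment has an undeclared `mml:` prefix. It assumes the only tags present are the ones shown in the example, and it escapes only stray `>` characters.

```python
import re

# Placeholder text requested in this issue.
FORMULA_TEXT = "[Formula: see text]"

# Tags that remain in the output; everything between them is plain text.
KEPT_TAG = re.compile(r"(</?(?:i|b|sup|sub)>)")


def convert_abstract(jats: str) -> str:
    """Convert a JATS abstract fragment to a PubMed-style <Abstract> string."""
    # Replace each inline formula, including the MathML inside it.
    text = re.sub(
        r"<inline-formula\b.*?</inline-formula>",
        FORMULA_TEXT,
        jats,
        flags=re.DOTALL,
    )
    # Map JATS emphasis tags to their short forms.
    text = re.sub(r"<(/?)italic>", r"<\1i>", text)
    text = re.sub(r"<(/?)bold>", r"<\1b>", text)
    # Drop small-caps and paragraph tags but keep their content.
    text = re.sub(r"</?(?:sc|p)>", "", text)
    # Split on kept tags: odd indices are tags, even indices are text.
    parts = KEPT_TAG.split(text)
    text = "".join(
        part if index % 2 else part.replace(">", "&gt;")
        for index, part in enumerate(parts)
    )
    # Collapse runs of whitespace left over from the source indentation.
    text = re.sub(r"\s+", " ", text).strip()
    return "<Abstract>" + text + "</Abstract>"
```

Unlike the hand-written expected output above, this sketch leaves a single space where paragraphs are joined and wherever the source had whitespace between the formula and the following punctuation.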