elifesciences / elife-pubmed-feed

code to support uploading feeds to pubmed for POA articles and VOR articles

PubMed DTD now supports MathML 3.0 tagging #75

Closed Melissa37 closed 3 years ago

Melissa37 commented 5 years ago

Dear PubMed Data Provider,

We've recently updated our DTD to support MathML 3.0 tagging for complex formulas in PubMed citations. This means that the formulas that were previously replaced with the phrase [Formula: see text] can now be expressed using MathML 3.0.

Please use the following header for all XML files, particularly if you are including MathML 3.0: <!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">.

If your journal does not use MathML tagging, no changes are necessary. You can continue to submit simple formulas (for example, greater than and less than symbols) with Unicode character encoding. Please refer to the PubMed Special Character Set for preferred encoding for common special characters. XML submissions may also include <sup>, <sub>, <inf>, <b>, <i>, and <u>.

The phrase [Formula: see text] is also still accepted in place of complex formulas.

RESOURCES:

If you have any questions, please contact us at publisher@ncbi.nlm.nih.gov.
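For reference, the header PubMed asks for above could be prepended at serialization time roughly like this. This is only a sketch, assuming the feed is built with Python's standard `xml.etree.ElementTree`; `serialize_article_set` and `PUBMED_DOCTYPE` are hypothetical names for illustration, not anything taken from this repository.

```python
import xml.etree.ElementTree as ET

# DOCTYPE string copied verbatim from the PubMed notice above.
PUBMED_DOCTYPE = (
    '<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" '
    '"https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">'
)


def serialize_article_set(root):
    """Serialize an ArticleSet element with an XML declaration and the PubMed DOCTYPE.

    Hypothetical helper: ElementTree does not emit a DOCTYPE itself,
    so the header lines are added by hand in front of the serialized body.
    """
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + PUBMED_DOCTYPE + "\n" + body


if __name__ == "__main__":
    article_set = ET.Element("ArticleSet")
    ET.SubElement(article_set, "Article")
    print(serialize_article_set(article_set))
```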

Melissa37 commented 3 years ago

We use MathML in abstracts, but I don't know how much work this would be for you to do. In perhaps 12-18 months we plan to allow TexMath/LaTeX to pass through our XML without converting it to MathML, which makes me think this could be a waste of time and that it would be better to focus effort on how to convert LaTeX to plain text for PubMed submissions.

WDYT @FAtherden-eLife ?

fred-atherden commented 3 years ago

I agree.

Looking through the few abstracts with equations that we've had in the past (25), the vast majority are needlessly marked up using MathML and could instead be represented using simple Unicode.

I think that we should be avoiding using equations in abstracts where possible, so I will implement a warning for this in the Schematron. Obviously that will only account for VoRs, but I assume that equations can't be included at PoA anyway.

In some cases we might still have to include maths (where it can't be represented in normal text/Unicode, or perhaps in (sort-of vanity) cases where authors want the fonts in their PDF to be consistent throughout), but those should be a rarity and, in my opinion, wouldn't justify expending any time supporting them here.

fred-atherden commented 3 years ago

Note to self - confirm what we are delivering to PubMed if there is MathML in an abstract.

gnott commented 3 years ago

I think when the PubMed generation library encounters a formula in an abstract, it replaces it with [Formula: see text]. See this line from the eLife 00666 kitchen sink article in the test scenarios: https://github.com/elifesciences/elife-pubmed-xml-generation/blob/develop/tests/test_data/elife-pubmed-00666-20170717071707.xml#L126
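To illustrate the behaviour described here, a rough sketch of that kind of replacement is below. It is not the code from elife-pubmed-xml-generation; `abstract_to_text` is a hypothetical function, and it assumes the abstract is JATS-style XML with MathML `<math>` elements (namespaced or not), flattened to plain text.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "[Formula: see text]"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MATH_TAGS = ("math", "{%s}math" % MATHML_NS)


def abstract_to_text(abstract_xml):
    """Flatten an abstract to plain text, swapping each <math> element for the placeholder.

    Hypothetical illustration only; the real library may work differently.
    """
    root = ET.fromstring(abstract_xml)

    def text_of(elem):
        # A MathML <math> element collapses to the placeholder phrase.
        if elem.tag in MATH_TAGS:
            return PLACEHOLDER
        parts = [elem.text or ""]
        for child in elem:
            parts.append(text_of(child))
            # Text that follows the child inside the same parent.
            parts.append(child.tail or "")
        return "".join(parts)

    return text_of(root)


if __name__ == "__main__":
    sample = (
        '<abstract xmlns:mml="http://www.w3.org/1998/Math/MathML"><p>We find '
        "<inline-formula><mml:math><mml:mi>x</mml:mi></mml:math></inline-formula>"
        " is large.</p></abstract>"
    )
    print(abstract_to_text(sample))
```

With the sample input in the `__main__` block, the formula is dropped and the surrounding sentence is kept, which matches what the test data linked above appears to show.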

fred-atherden commented 3 years ago

Thanks G! Yup - here's an article from 2020 - https://pubmed.ncbi.nlm.nih.gov/32717179/.

I'm going to close this given our discussion earlier.