brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Bibliography: Problems with A&A adsurl #710

Closed: goska closed this issue 8 years ago

goska commented 8 years ago

I am having problems converting documents and bibliographies that contain adsurl fields for Astronomy and Astrophysics articles. These URLs contain the string %26 (the URL-encoded & in A&A). For such bib records LaTeXML generates malformed bibentry elements: the closing bibentry tag is missing, and so is any bib-data that should follow the bib-data element with the adsurl role. I am using LaTeXML 0.8.1 on Mac OS X 10 with TeX Live, the aa bibliography style and the standard LaTeX report document class. For example:

@ARTICLE{2009A&A...505..385A,
   author = {{Andrei}, A.~H. and {Souchay}, J. and {Zacharias}, N. and {Smart}, R.~L. and 
    {Vieira Martins}, R. and {da Silva Neto}, D.~N. and {Camargo}, J.~I.~B. and 
    {Assafin}, M. and {Barache}, C. and {Bouquillon}, S. and {Penna}, J.~L. and 
    {Taris}, F.},
    title = "{The large quasar reference frame (LQRF). An optical representation of the ICRS}",
  journal = {\aap},
archivePrefix = "arXiv",
   eprint = {0907.2403},
 keywords = {catalogs, reference systems, quasars: general, methods: data analysis, astrometry},
     year = 2009,
    month = oct,
   volume = 505,
    pages = {385-404},
      doi = {10.1051/0004-6361/200912041},
   adsurl = {http://adsabs.harvard.edu/abs/2009A%26A...505..385A},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
@ARTICLE{2008AcA....58...89U,

yields

        <bibentry key="2009A&amp;A...505..385A" type="article" xml:id="bib.bib12">
          <bib-name role="author">
            <surname>Andrei</surname>
            <givenname>A. H.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Souchay</surname>
            <givenname>J.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Zacharias</surname>
            <givenname>N.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Smart</surname>
            <givenname>R. L.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Vieira Martins</surname>
            <givenname>R.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>da Silva Neto</surname>
            <givenname>D. N.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Camargo</surname>
            <givenname>J. I. B.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Assafin</surname>
            <givenname>M.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Barache</surname>
            <givenname>C.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Bouquillon</surname>
            <givenname>S.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Penna</surname>
            <givenname>J. L.</givenname>
          </bib-name>
          <bib-name role="author">
            <surname>Taris</surname>
            <givenname>F.</givenname>
          </bib-name>
          <bib-title>The large quasar reference frame (LQRF). An optical representation of the ICRS</bib-title>
          <bib-related role="host" type="journal">
            <bib-title>A&amp;A</bib-title>
          </bib-related>
          <bib-data role="archiveprefix">arXiv</bib-data>
          <bib-links>0907.2403</bib-links>
          <bib-extract role="keywords">catalogs, reference systems, quasars: general, methods: data analysis, astrometry</bib-extract>
          <bib-date role="publication">2009-10</bib-date>
          <bib-data role="month">October</bib-data>
          <bib-part role="volume">505</bib-part>
          <bib-part role="pages">385–404</bib-part>
          <bib-identifier href="http://dx.doi.org/10.1051/0004-6361/200912041" id="10.1051/0004-6361/200912041" scheme="doi">Document</bib-identifier>
          <bib-data role="adsurl">http://adsabs.harvard.edu/abs/2009A%26A…505..385A</bib-data>
          <bibentry key="2008AcA....58...89U" type="article" xml:id="bib.bib13">

Thanks

brucemiller commented 8 years ago

Yeah, this is tricky because LaTeXML's processing order isn't quite the same as with LaTeX+BibTeX: LaTeXML converts the whole .bib file to XML for later use, without knowing how, or whether, each field will be used. For some fields a % is treated as a comment character; for others it isn't.
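To make the comment behaviour concrete, here is a small stand-alone Perl sketch. It is not LaTeXML code, and the function strip_tex_comment is hypothetical. It applies TeX's rule that an unescaped % starts a comment running to the end of the line. On the adsurl line, that rule throws away the rest of the URL together with the field's closing brace, which is the kind of unbalanced input that can leave an entry unclosed. It only illustrates the rule: the XML output above shows the URL text itself surviving, so this is not a reconstruction of where LaTeXML actually loses the closing tag.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: apply TeX's comment rule to one line.
# An unescaped % (not preceded by a backslash) starts a comment that runs to the
# end of the line; \% is a literal percent sign. Simplified: the \\% case is ignored.
sub strip_tex_comment {
    my ($line) = @_;
    $line =~ s/(?<!\\)%.*$//;
    return $line;
}

my $field = 'adsurl = {http://adsabs.harvard.edu/abs/2009A%26A...505..385A},';
my $seen  = strip_tex_comment($field);
print "$seen\n";

# Count braces in what survives: the field's closing } fell inside the "comment".
my $open  = () = $seen =~ /\{/g;
my $close = () = $seen =~ /\}/g;
printf "open braces: %d, close braces: %d\n", $open, $close;
```

Writing the percent sign as \% would protect it from TeX, but that would also corrupt the URL. This is why the field needs to be read verbatim instead.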

In principle, a simple extension would get adsurl treated correctly. You'd create a binding containing (something like) the following:

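package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;
# Render the adsurl field of a bib entry as a link, using the field value (#1) as the href.
DefConstructor('\bib@field@default@adsurl Semiverbatim',
  "<ltx:bib-url href='#1'>Link or whatever</ltx:bib-url>");
1;   # Perl convention: the file must end by returning a true value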

If you named it adsurl.sty.ltxml, you could load it by adding --preload=adsurl.sty to the command line (or you could add the DefConstructor to some other binding, if you already have one).

brucemiller commented 8 years ago

Or, maybe that's not the real problem...

brucemiller commented 8 years ago

Actually, you should change "Semiverbatim" above to "Verbatim"; then it should work.

dginev commented 8 years ago

Well that was anticlimactic.

brucemiller commented 8 years ago

Yes, perhaps so; it'd be nice if we could automatically know which non-standard fields hold URLs, but searching field names for the string "url" doesn't seem sensible.

It's probably worth doing a git pull, since there have been some subtle fixes to the handling of verbatim, %, and so on. I'll go ahead and close this. If you need help figuring out how and where to put that code snippet, either reopen the issue or ask us on the mailing list.

Thanks for the report.

goska commented 8 years ago

Thanks. I'll see how far I can get with the information you have given. As I have not developed anything LaTeXML-related before (apart from simple CSS and XSLT stylesheet modifications), I may need more hints.

goska commented 8 years ago

Thanks! I have implemented a binding for adsurl using your code (with "Semiverbatim"). It works and produces nice hyperlinks to the ADS pages in the HTML bibliography. I called the binding ads_support.ltxml and used it to convert the bib file to XML with the following command:

$ latexml --dest=mybib.bib.xml --preload=ads_support.ltxml --preload=aa.cls.ltxml mybib.bib

I am not sure which package it should belong to.