BibTeX ABS export: trailing <P />

golnazads commented 4 years ago

Alberto replied to You @Carolyn @Golnaz sorry, I neglected to let you know about this possible markup. Please translate <P /> to blank lines and <BR /> to a newline when outputting in a non-XML format. I think this means that for BibTeX it would be:  => \\

golnazads commented 4 years ago

@aaccomazzi this is implemented for BibTeX ABS. Do I also need to remove these tags for other formats, for example the custom format with Unicode encoding? I am guessing the answer is yes for LaTeX encoding. If it is also yes for Unicode, then I guess I need to fix that for the XML and fielded formats too, right? Thank you.

aaccomazzi commented 4 years ago

This is the situation with respect to encoding in our JSON fields (see e.g. 2016ApJ...818L..26F):

Abstract and title text have the basic HTML entities encoded (these are &lt;, &gt; and &amp;)
They may also contain some markup in the form of HTML tags, such as <P />

When creating custom output, we recognize and support three basic encodings:

HTML: the entities and markup are kept as they are, so &lt; remains &lt;
LaTeX: the entities and markup are translated according to HTML -> LaTeX syntax
Unicode: the entities are turned into their Unicode equivalents; here that is just the three entities above, which become <, > and &. The handling of markup for Unicode encoding has never been formally defined in our documentation, and I had to check the code of classic to figure out what we do. It turns out classic simply strips the markup: every tag -> (empty string)
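A minimal Python sketch of how the entity part of these three encodings could be applied. The function name `encode_entities` and the LaTeX replacements (`\&`, `$<$`, `$>$`) are illustrative assumptions, not the export service's actual code, and markup translation is left out entirely. A single regex pass is used so that a sequence such as `&amp;lt;` decodes to `&lt;` instead of being decoded twice.

```python
import re

# The three basic HTML entities encoded in the abstract and title fields.
_ENTITY_RE = re.compile(r"&(?:amp|lt|gt);")

# Hypothetical replacement tables; the LaTeX choices are a common convention.
_ENTITY_TABLES = {
    "unicode": {"&amp;": "&", "&lt;": "<", "&gt;": ">"},
    "latex": {"&amp;": r"\&", "&lt;": "$<$", "&gt;": "$>$"},
}


def encode_entities(text: str, encoding: str) -> str:
    """Translate the encoded entities in `text` for the given output encoding."""
    if encoding == "html":
        # HTML output: entities are passed through unchanged.
        return text
    try:
        table = _ENTITY_TABLES[encoding]
    except KeyError:
        raise ValueError(f"unsupported encoding: {encoding!r}") from None
    # One left-to-right pass, so replaced text is never rescanned.
    return _ENTITY_RE.sub(lambda m: table[m.group(0)], text)


print(encode_entities("a &lt; b &amp; c", "unicode"))  # a < b & c
print(encode_entities("a &lt; b &amp; c", "latex"))    # a $<$ b \& c
```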

I feel that classic's Unicode handling of markup is wrong, because we provide a separate formatting option to control the treatment of markup (%ZMarkup:{keep|strip}, documented at http://adsabs.github.io/help/actions/export). So I'm in favor of passing markup through as it is and letting users customize the output via formatting options.
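The `%ZMarkup:{keep|strip}` option could be modelled roughly as below. This is a hypothetical sketch: the name `apply_markup_option` and the tag pattern are assumptions, not the documented implementation. `strip` removes every tag, which is what classic does unconditionally for Unicode output. It runs on the stored text, where a literal `<` in the prose is still encoded as `&lt;`, so only real tags match.

```python
import re

# Matches HTML-style tags such as <SUP>, </SUP>, <P /> or <BR/>.
_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def apply_markup_option(text: str, markup: str = "keep") -> str:
    """Hypothetical handling of %ZMarkup:{keep|strip} on stored field text."""
    if markup == "keep":
        return text
    if markup == "strip":
        # Remove every tag, keeping the text between tags.
        return _TAG_RE.sub("", text)
    raise ValueError(f"unsupported markup option: {markup!r}")


print(apply_markup_option("mc<SUP>2</SUP> &lt; x", "strip"))  # mc2 &lt; x
```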

golnazads commented 4 years ago

Just for your information, export has the option markup keep|strip: https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/customFormat.py#L702. I can remove it if you want, @aaccomazzi.

aaccomazzi commented 4 years ago

We should keep the markup option, so that users can control what they do or do not get. So I think the adjustments to make for Unicode encoding are:

<P /> => \n\n (new paragraph)
<BR /> => \n (newline)
&amp;, &gt;, &lt; => &, >, <
All other markup: controlled by %ZMarkup settings
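Putting these Unicode rules together, a sketch might look like the following. It assumes the tags are spelled `<P />` and `<BR />` (matched case-insensitively, with the slash optional for the break); `to_unicode` and the regex names are hypothetical. Markup is handled before the entities are decoded, so a decoded `&lt;a&gt;` is never mistaken for a tag and stripped.

```python
import re

_PARAGRAPH_RE = re.compile(r"<P\s*/>", re.IGNORECASE)
_BREAK_RE = re.compile(r"<BR\s*/?>", re.IGNORECASE)
# Any other HTML-style tag, e.g. <SUP> or </SUP>.
_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")
_ENTITY_RE = re.compile(r"&(?:amp|lt|gt);")
_ENTITIES = {"&amp;": "&", "&lt;": "<", "&gt;": ">"}


def to_unicode(text: str, markup: str = "keep") -> str:
    """Hypothetical Unicode encoding of a stored abstract or title."""
    # 1. Paragraph and line-break tags always become whitespace.
    text = _PARAGRAPH_RE.sub("\n\n", text)
    text = _BREAK_RE.sub("\n", text)
    # 2. All other markup is governed by the %ZMarkup setting.
    if markup == "strip":
        text = _TAG_RE.sub("", text)
    elif markup != "keep":
        raise ValueError(f"unsupported markup option: {markup!r}")
    # 3. Decode the three entities last, in a single pass.
    return _ENTITY_RE.sub(lambda m: _ENTITIES[m.group(0)], text)


print(repr(to_unicode("First.<P />Second &amp; third<BR />x &lt; y")))
```

A trailing <P />, the case reported in this issue, still yields a trailing blank line in this sketch; trimming it would be a separate decision.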

adsabs / export_service

BibTeX ABS export: trailing <P /> #172