Open golnazads opened 4 years ago
@aaccomazzi this is implemented for BibTeX ABS. Do I need to remove these tags for, say, the custom format with unicode encoding? I am guessing it is a yes for LaTeX encoding. If it is a yes for unicode, then I guess I need to fix that for the XML and fielded formats too, right? Thank you.
This is the situation with respect to encoding in our JSON fields (see e.g. 2016ApJ...818L..26F): <SUB> etc.
When creating custom output, we recognize and support three basic encodings:
< remains <; <, >, &
The issue of markup for unicode encoding has never been formally defined in our documentation, and I had to check the classic code to figure out what we are doing here. It turns out classic simply strips the markup: <SUB> -> (empty string).
I feel that the unicode handling of markup done by classic is wrong, because we provide a separate formatting option to control the treatment of markup (%ZMarkup:{keep|strip}), as documented here: http://adsabs.github.io/help/actions/export
So I'm in favor of passing markup through as it is, and letting users customize the output via formatting options.
Just for your information, export already has the markup keep|strip option: https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/customFormat.py#L702. I can remove it if you want, @aaccomazzi.
We should keep the markup option, so that users can control what they do or do not get. So I think the adjustments to make for unicode encoding are:
<P /> => \n\n (new paragraph)
<BR /> => \n (newline)
&amp;, &gt;, &lt; => &, >, <
%ZMarkup settings
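As a rough illustration of the unicode adjustments listed above, here is a minimal Python sketch. It is not the export_service implementation: the function name `to_unicode`, its `markup` parameter (standing in for the %ZMarkup keep|strip setting), and the tag-matching regular expressions are all assumptions made for this example.

```python
import re


def to_unicode(text, markup="keep"):
    """Hypothetical helper: apply the proposed unicode adjustments to a field."""
    # <P /> becomes a blank line (new paragraph), <BR /> becomes a newline.
    text = re.sub(r"<P\s*/>", "\n\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<BR\s*/>", "\n", text, flags=re.IGNORECASE)

    # Remaining markup (e.g. <SUB>...</SUB>) is passed through unless the
    # caller asks for it to be stripped, mirroring %ZMarkup:{keep|strip}.
    if markup == "strip":
        text = re.sub(r"</?[A-Za-z][A-Za-z0-9]*\s*/?>", "", text)

    # Decode the three entities last, so that a decoded "<" is never
    # mistaken for the start of a tag. "&amp;" goes last to avoid
    # decoding the same text twice.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text


sample = "H<SUB>2</SUB>O &amp; CO<P />a &lt; b<BR />end"
print(repr(to_unicode(sample)))
print(repr(to_unicode(sample, markup="strip")))
```

Decoding entities after the tag handling is a deliberate choice in this sketch: it keeps an encoded `&lt;` in the text from being read as the start of a tag when markup is stripped.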
Alberto replied to You @Carolyn @Golnaz sorry, I neglected to let you know of this possible markup. Please translate <P /> to blank lines and <BR /> to a newline when outputting in a non-XML format. I think this means that for bibtex it would be: <P /> => \\