uschindler closed 8 years ago
Here is our code to auto-generate a useful BibTeX identifier from the author name, year, and title, plus a method that escapes characters so that the output is correct LaTeX:
public static final String generateBibIdentifier(String author, String year, String title) {
  StringBuilder sb=new StringBuilder();
  // keep only the ASCII letters of the author name, lowercased
  for (int i=0,c=author.length(); i<c; i++) {
    char ch=Character.toLowerCase(author.charAt(i));
    if (ch>='a' && ch<='z') sb.append(ch);
  }
  sb.append(year.trim());
  // append the first ASCII letter of up to four title words
  int j=0; boolean start=true;
  for (int i=0,c=title.length(); i<c; i++) {
    char ch=Character.toLowerCase(title.charAt(i));
    start|=(ch==' ');
    if (start && ch>='a' && ch<='z') {
      sb.append(ch);
      j++; start=false;
    }
    if (j>=4) break;
  }
  return sb.toString();
}
public static final String escapeLatex(String text) {
  StringBuilder sb=new StringBuilder(text.length());
  boolean nl=false;
  for (int i=0,c=text.length(); i<c; i++) {
    char ch=text.charAt(i);
    // a run of CR/LF becomes one LaTeX line break, emitted before the next character
    if (ch!=13 && ch!=10 && nl) {
      sb.append("\\\\\n");
      nl=false;
    }
    switch (ch) {
      // diaeresis / umlaut
      case '\u00E4': sb.append("{\\\"a}"); break;
      case '\u00F6': sb.append("{\\\"o}"); break;
      case '\u00FC': sb.append("{\\\"u}"); break;
      case '\u00EB': sb.append("{\\\"e}"); break;
      case '\u00EF': sb.append("{\\\"i}"); break;
      case 196: sb.append("{\\\"A}"); break;
      case 214: sb.append("{\\\"O}"); break;
      case 220: sb.append("{\\\"U}"); break;
      case 203: sb.append("{\\\"E}"); break;
      case 207: sb.append("{\\\"I}"); break;
      // acute accent
      case 225: sb.append("{\\'a}"); break;
      case 243: sb.append("{\\'o}"); break;
      case 250: sb.append("{\\'u}"); break;
      case 233: sb.append("{\\'e}"); break;
      case 237: sb.append("{\\'i}"); break;
      // grave accent
      case 224: sb.append("{\\`a}"); break;
      case 242: sb.append("{\\`o}"); break;
      case 249: sb.append("{\\`u}"); break;
      case 232: sb.append("{\\`e}"); break;
      case 236: sb.append("{\\`i}"); break;
      // circumflex
      case 226: sb.append("{\\^a}"); break;
      case 244: sb.append("{\\^o}"); break;
      case 251: sb.append("{\\^u}"); break;
      case 234: sb.append("{\\^e}"); break;
      case 238: sb.append("{\\^i}"); break;
      case 194: sb.append("{\\^A}"); break;
      case 212: sb.append("{\\^O}"); break;
      case 219: sb.append("{\\^U}"); break;
      case 202: sb.append("{\\^E}"); break;
      case 206: sb.append("{\\^I}"); break;
      // tilde
      case 227: sb.append("{\\~a}"); break;
      case 241: sb.append("{\\~n}"); break;
      case 245: sb.append("{\\~o}"); break;
      case 195: sb.append("{\\~A}"); break;
      case 209: sb.append("{\\~N}"); break;
      case 213: sb.append("{\\~O}"); break;
      case '\u00DF': sb.append("{\\ss}"); break;
      case '\u00A0': sb.append('~'); break; // non-breaking space
      case '\u00BA': sb.append("{\\textdegree}"); break;
      case '"': sb.append("{\"}"); break;
      case 13:
      case 10:
        nl=true;
        break;
      // all apostrophe-like characters become a plain apostrophe
      case '\'':
      case '\u00B4':
      case '`':
        sb.append("{\'}"); break;
      // simple escapes:
      case '\\':
      case '~':
      case '$':
      case '%':
      case '^':
      case '&':
      case '{':
      case '}':
      case '_':
        sb.append('\\');
        sb.append(ch);
        break;
      default:
        // any other non-ASCII character is replaced by a question mark
        sb.append( (ch<0x80)?ch:'?' );
    }
  }
  return sb.toString();
}
Here is an example BIBTEX output from PANGAEA (full featured PANGAEA supplement), generated by this code: http://doi.pangaea.de/10.1594/PANGAEA.94417?format=citation_bibtex
The dataset is: http://doi.pangaea.de/10.1594/PANGAEA.94417
Is the Pangaea code still useful, or does the Pangaea implementation look very different now?
No changes here!
Thanks! Added latex escaping to bibtex output. Also added an abstract field.
As the citation key I prefer to use the DOI expressed as a URL. That is better than the UUID, which changes on every request.
Thanks. When reviewing our changes I found a small fix that was added later. Here it is:
Index: XSLTFunctions.java
===================================================================
--- XSLTFunctions.java (revision 4295)
+++ XSLTFunctions.java (revision 4296)
@@ -5,6 +5,7 @@
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URL;
+import java.util.Locale;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
@@ -123,7 +124,8 @@
case '\u00DF': sb.append("{\\ss}"); break;
case '\u00A0': sb.append('~'); break; //
- case '\u00BA': sb.append("{\\textdegree}"); break;
+ case '\u00BA':
+ case '\u00B0': sb.append("{\\textdegree}"); break;
case '"': sb.append("{\"}"); break;
case 13:
@@ -146,11 +148,14 @@
case '{':
case '}':
case '_':
- sb.append('\\');
- sb.append(ch);
+ sb.append('\\').append(ch);
break;
default:
- sb.append( (ch<0x80)?ch:'?' );
+ if (ch<0x80) {
+ sb.append(ch);
+ } else {
+ sb.append(String.format(Locale.ROOT, "{\\char\"%04X}", Integer.valueOf(ch)));
+ }
}
}
return sb.toString();
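To make the effect of the changed default branch concrete, here is a minimal, self-contained sketch of just that fallback. The class name CharFallbackSketch and the method name escapeNonAscii are hypothetical and only used for illustration; the real method also handles the accent and special-character cases shown in the diff context. Whether the resulting \char code renders as the intended glyph depends on the LaTeX engine and font encoding, which this sketch does not address.

```java
import java.util.Locale;

public class CharFallbackSketch {

    // Hypothetical helper: copies ASCII unchanged and writes every other
    // character as {\char"XXXX}, like the new default branch in the diff.
    public static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 0x80) {
                sb.append(ch);
            } else {
                // Locale.ROOT keeps the hex digits independent of the default locale
                sb.append(String.format(Locale.ROOT, "{\\char\"%04X}", (int) ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00C5 is the capital A with ring above
        System.out.println(escapeNonAscii("Ny-\u00C5lesund")); // prints Ny-{\char"00C5}lesund
    }
}
```

Before the fix, the same input would have come out as "Ny-?lesund".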
I am not sure whether BibTeX allows a URI to be used as the identifier for citations. Our code creates a typical BibTeX-style reference identifier from the author name, the year, and the first letters of the title words, e.g., "maturilli2016baom" for this one:
@misc{maturilli2016baom,
author={Marion {Maturilli} and Christoph {Ritter}},
title={{Basic and other measurements of radiation at station Ny-{\char"00C5}lesund (2015-03)}},
year={2016},
doi={10.1594/PANGAEA.854326},
url={https://doi.pangaea.de/10.1594/PANGAEA.854326},
note={Supplement to: Maturilli, Marion; Ritter, Christoph (2016): Surface radiation during the total solar eclipse over Ny-{\char"00C5}lesund, Svalbard, on 20 March 2015. Earth System Science Data, 8(1), 159-164, doi:10.5194/essd-8-159-2016},
abstract={On 20 March 2015, a total solar eclipse occurred over Ny-{\char"00C5}lesund (78.9{\textdegree} N, 11.9{\textdegree} E), Svalbard, in the high Arctic. It was the first time that the surface radiation components during the totality of a solar eclipse were measured by a Baseline Surface Radiation Network (BSRN) station. With the Ny-{\char"00C5}lesund long-term radiation data set as background (available at doi:10.1594/PANGAEA.150000), we present here the peculiarities of the radiation components and basic meteorology observed during the eclipse event. The supplementary data set contains the basic BSRN radiation and surface meteorological data in 1 min resolution for March 2015, and is available at doi:10.1594/PANGAEA.854326. The eclipse radiation data will be a useful auxiliary data set for further studies on micrometeorological surface-atmosphere exchange processes in the Svalbard environment, and may serve as a test case for radiative transfer studies.},
type={data set},
publisher={PANGAEA}
}
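The key of the entry above can be reproduced with a small sketch of the key generation. It assumes that only the first author's family name is passed as the author argument, which the thread does not state explicitly; the class name BibKeySketch is hypothetical.

```java
public class BibKeySketch {

    // Builds <author letters><year><first letters of up to four title words>.
    public static String generateBibIdentifier(String author, String year, String title) {
        StringBuilder sb = new StringBuilder();
        // keep only the ASCII letters of the author name, lowercased
        for (int i = 0; i < author.length(); i++) {
            char ch = Character.toLowerCase(author.charAt(i));
            if (ch >= 'a' && ch <= 'z') sb.append(ch);
        }
        sb.append(year.trim());
        // take the first ASCII letter after each space, at most four letters
        int taken = 0;
        boolean atWordStart = true;
        for (int i = 0; i < title.length() && taken < 4; i++) {
            char ch = Character.toLowerCase(title.charAt(i));
            atWordStart |= (ch == ' ');
            if (atWordStart && ch >= 'a' && ch <= 'z') {
                sb.append(ch);
                taken++;
                atWordStart = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the author argument is the first author's family name only.
        String key = generateBibIdentifier("Maturilli", "2016",
                "Basic and other measurements of radiation at station Ny-\u00C5lesund (2015-03)");
        System.out.println(key); // prints maturilli2016baom
    }
}
```

Keys built this way are readable but not guaranteed to be unique: two data sets by the same author in the same year whose titles start with the same four initials would get the same key.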
OK, will look into URI as bibtex key. I have used them for a while now in internal projects, e.g. for the DataCite blog (where in-text citations generated by Pandoc are made actionable using the key).
No problem, I was just not sure whether it works with BibTeX/LaTeX at all. I was under the assumption that it only allows "identifier-like" keys. Our keys are simply formed according to the "convention" in the LaTeX world:
I have no reference for where this is defined; it is just how publishers and software tools produce their files! :-)
LaTeX treats many common characters as special, so importing the BibTeX files from the content-resolver leads to LaTeX errors in most cases. The rules for escaping characters are very complicated.
At PANGAEA we have an escaper class for BibTeX that correctly escapes most Western characters when text is exported as LaTeX (which BibTeX uses). We can provide this Java code here; it should be available to everyone. It's mainly a POJO with a static method that takes a String and returns it as escaped LaTeX code.
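As a rough illustration of the kind of problem described here, the sketch below escapes only a handful of characters that LaTeX treats as special ($, %, &, _, { and }). It is not the PANGAEA class; the names LatexEscapeSketch and escapeBasic are hypothetical, and accents, line breaks, and non-ASCII characters are deliberately left out.

```java
public class LatexEscapeSketch {

    // Hypothetical helper: prefixes a few LaTeX special characters with a backslash.
    public static String escapeBasic(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '$':
                case '%':
                case '&':
                case '_':
                case '{':
                case '}':
                    sb.append('\\').append(ch);
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeBasic("Salinity & temperature: 50% of sample_A"));
        // prints Salinity \& temperature: 50\% of sample\_A
    }
}
```

Without such escaping, an unescaped % in a title or abstract would, for example, turn the rest of the line into a LaTeX comment.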