Problem:
There is no cleanup action that converts (old) bibliographic data that is (still) written with LaTeX commands for non-ASCII characters into Unicode aware LaTeX formatting. Current LaTeX releases (e.g. LaTeX2e, which has assumed UTF-8 input by default since 2018) can read most Unicode characters directly.
The current workaround is to convert from LaTeX to Unicode and then back to LaTeX, while manually checking whether any characters were converted wrongly. This is inefficient and takes a long time.
The workaround is also bothersome because some symbols do not get converted by the LaTeXToUnicode and UnicodeToLaTeX cleanup actions (e.g. #3644), and there are other special symbols that SHOULD NOT be converted automatically, because multiple conversions are possible and users would need to take care of them manually (e.g. #8712).
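To illustrate the manual checking step of the workaround, here is a minimal Python sketch that reports which LaTeX commands changed between a field value before and after a round trip. It is only an illustration, not JabRef code: the function name `changed_commands` and both sample strings are made up, and the "after" value is a hypothetical round-trip result, not actual JabRef output.

```python
import re
from collections import Counter

# A control word (\textcopyright) or a control symbol (\%, \&, ...).
COMMAND = re.compile(r"\\(?:[A-Za-z]+|.)")


def changed_commands(before, after):
    """Map each LaTeX command to (count before, count after) where the counts differ."""
    old = Counter(COMMAND.findall(before))
    new = Counter(COMMAND.findall(after))
    return {
        cmd: (old[cmd], new[cmd])
        for cmd in sorted(old.keys() | new.keys())
        if old[cmd] != new[cmd]
    }


# Hypothetical field value and a hypothetical round-trip result that lost an escape.
before = r"50\% of \textcopyright{} holders"
after = r"50% of \textcopyright{} holders"
print(changed_commands(before, after))
```

For these two strings the report contains only `\%`, which is exactly the kind of silent change that currently has to be spotted by eye.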
Desired Solution:
Create a cleanup action "LaTeX to Unicode aware LaTeX".
Example workflow:
Have the following entry (BEFORE using the cleanup action):
@Article{Testkey,
author = {Testauthor},
title = {Bibliographic data that can be read by LaTeX engines},
a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
b = {Here is a \textcopyright{} and it should be converted to Unicode},
}
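The behaviour requested for the two fields `a` and `b` of the entry above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the table `COMMAND_TO_UNICODE` is hypothetical and far from complete, and only commands written with an empty group (such as `\textcopyright{}`) are handled, so that no spacing is changed.

```python
import re

# Hypothetical, deliberately tiny table; a real cleanup action needs a complete one.
COMMAND_TO_UNICODE = {"textcopyright": "©", "textregistered": "®"}

# Either a control word followed by an empty group (\textcopyright{}),
# or any other backslash pair (\%, \\, ...), which is consumed and left untouched.
TOKEN = re.compile(r"\\(?:([A-Za-z]+)\{\}|.)")


def to_unicode_aware_latex(value):
    """Replace known commands by their Unicode character, keep everything else."""
    def replace(match):
        name = match.group(1)  # None for control symbols such as \%
        return COMMAND_TO_UNICODE.get(name, match.group(0))
    return TOKEN.sub(replace, value)


fields = {
    "a": r"Here is a backslashed percentage sign \% and it should be excluded from conversion",
    "b": r"Here is a \textcopyright{} and it should be converted to Unicode",
}
cleaned = {key: to_unicode_aware_latex(value) for key, value in fields.items()}
print(cleaned["b"])
```

Field `a` comes back unchanged because `\%` is never looked up in the table, while `\textcopyright{}` in field `b` is replaced by ©.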
"Special Symbols" that would need to be excluded from conversion:
The list should be similar to the symbols mentioned in #8712.
At the very least, it should cover Table 1 on page 15 of The Comprehensive LaTeX Symbol List, which lists the escapable special characters in LaTeX.
Maybe also Table 2 (page 15) and Table 3 (page 16).
There might be a lot more, but I am not knowledgeable enough to list them here. If you know of any, please post them in this thread.
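As a starting point for such an exclusion list, the escapable special characters could be kept in a simple set and looked up while scanning a field. This is a hedged Python sketch: the set `KEEP_ESCAPED` reflects my reading of Table 1 and would need to be extended with whatever this thread decides, and the names `KEEP_ESCAPED`, `TOKEN` and `protected_escapes` are invented for the example.

```python
import re

# Escapable special characters, written as LaTeX control symbols.
# Assumed to correspond to Table 1 of the symbol list; extend as the thread decides.
KEEP_ESCAPED = frozenset({r"\$", r"\%", r"\_", r"\{", r"\}", r"\&", r"\#"})

# A control word (\textcopyright) or a control symbol (\%, \$, ...).
TOKEN = re.compile(r"\\(?:[A-Za-z]+|.)")


def protected_escapes(value):
    """Return, in order of appearance, the escapes in value that must stay as they are."""
    return [token for token in TOKEN.findall(value) if token in KEEP_ESCAPED]


sample = r"5\% of \$10 \textcopyright{} A\&B"
print(protected_escapes(sample))
```

Keeping the list as data rather than hard-coding it in the conversion logic would make it easy to add the symbols from Table 2 and Table 3 later.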
Additional Information
When working on this, The Comprehensive LaTeX Symbol List will be of help, especially the parts about "Unicode" (page 272) and "Special Characters" (pages 15-16).
JabRef currently uses https://github.com/tomtung/latex2unicode. Maybe it can be adapted internally in JabRef (e.g. with some pre-processing). Another solution would be to fork it, or to ask tomtung about creating a LaTeX2UnicodeAwareLaTeX converter.
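The pre-processing idea could be sketched as "mask, convert, unmask": hide the protected escapes behind placeholders, run an existing LaTeX-to-Unicode converter, then restore them. The Python below is only a sketch under stated assumptions. `toy_converter` is a stand-in and not the latex2unicode API, the placeholder code points U+E000 and U+E001 are assumed not to occur in the data and to pass through the real converter unchanged, and all names are hypothetical.

```python
import re
from typing import Callable

KEEP_ESCAPED = frozenset({r"\$", r"\%", r"\_", r"\{", r"\}", r"\&", r"\#"})
TOKEN = re.compile(r"\\(?:[A-Za-z]+|.)")
# Private-use code points as placeholder delimiters (assumption, see above).
OPEN, CLOSE = "\uE000", "\uE001"
PLACEHOLDER = re.compile(OPEN + r"(\d+)" + CLOSE)


def convert_with_protection(value: str, convert: Callable[[str], str]) -> str:
    """Run convert on value while keeping the protected escapes untouched."""
    saved = []

    def mask(match):
        token = match.group(0)
        if token not in KEEP_ESCAPED:
            return token  # leave it for the converter
        saved.append(token)
        return f"{OPEN}{len(saved) - 1}{CLOSE}"

    converted = convert(TOKEN.sub(mask, value))
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], converted)


def toy_converter(text: str) -> str:
    """Stand-in for a real converter; it unescapes the percent sign on purpose."""
    return text.replace(r"\textcopyright{}", "©").replace(r"\%", "%")


result = convert_with_protection(r"100\% \textcopyright{} 2022", toy_converter)
print(result)
```

With this wrapper the converter itself would not need to know about the exclusion list, which might make it possible to keep using the existing library without forking it.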
(Comment: \textcopyright{} can be converted to © by the inputenc package. When using the LaTeX to Unicode aware LaTeX cleanup action, the result of the conversion should also be ©.)
Use cleanup action "LaTeX to Unicode aware LaTeX".
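AFTER using the cleanup action, the following result should emerge:
@Article{Testkey,
author = {Testauthor},
title = {Bibliographic data that can be read by LaTeX engines},
a = {Here is a backslashed percentage sign \% and it should be excluded from conversion},
b = {Here is a © and it should be converted to Unicode},
}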
Originally posted by @ThiloteE in https://github.com/JabRef/jabref/issues/8490#issuecomment-1107461770