thammegowda opened this issue 2 years ago
So the proceedings template contains these lines, which are specific to pdfLaTeX and shouldn't be used with the newer Unicode engines (XeLaTeX and LuaLaTeX):
\usepackage{times}
\usepackage{latexsym}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
If I compile that on Overleaf, download the PDF, and check the embedded fonts with pdffonts, I get this:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
WFBODD+NimbusMonL-Regu               Type 1            Custom           yes yes yes     60  0
JKDAPA+NimbusRomNo9L-ReguItal        Type 1            Custom           yes yes yes     73  0
ARYBWX+NimbusRomNo9L-Medi            Type 1            Custom           yes yes yes     56  0
HPBZPC+NimbusRomNo9L-Regu            Type 1            Custom           yes yes yes     58  0
ALVSVR+NimbusSanL-Bold               Type 1            Custom           yes yes yes     59  0
So, at least on Overleaf, the "times" package ends up using the URW "Nimbus" fonts, which are clones of Times, Helvetica, and Courier. That makes me think that with XeLaTeX or LuaLaTeX, the above lines in the template should be replaced with:
\usepackage{fontspec}
\setmainfont{Nimbus Roman}
\setsansfont{Nimbus Sans}
\setmonofont{Nimbus Mono}
(EDIT: TeX Gyre Termes is probably better, since it is based on Nimbus Roman but has more features, such as extended glyph coverage.)
I think it would make sense to check in the .sty file which TeX engine is being used and adjust the font-related commands accordingly. @davidweichiang Should I try to prepare a pull request for something like this?
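A minimal sketch of that engine check, for illustration only. It assumes the iftex package (which provides \ifPDFTeX) and the TeX Gyre fonts are installed, and the placement inside the .sty file is hypothetical, not the actual ACL style. The non-pdfLaTeX branch uses the TeX Gyre clones of the three families that "times" sets up (Termes for Times, Heros for Helvetica, Cursor for Courier); whether the result matches the pdfLaTeX output closely enough would still need to be checked.

```latex
% Hypothetical excerpt for the ACL .sty file: choose the font setup by engine.
\RequirePackage{iftex}   % provides \ifPDFTeX, \ifXeTeX, \ifLuaTeX
\ifPDFTeX
  % pdfLaTeX: keep the current template's Type 1 font setup
  \RequirePackage{times}
  \RequirePackage{latexsym}
  \RequirePackage[T1]{fontenc}
  \RequirePackage[utf8]{inputenc}
\else
  % XeLaTeX / LuaLaTeX: OpenType clones of Times, Helvetica and Courier
  \RequirePackage{fontspec}
  \setmainfont{TeX Gyre Termes}
  \setsansfont{TeX Gyre Heros}
  \setmonofont{TeX Gyre Cursor}
  \RequirePackage{latexsym}
\fi
```

With something like this in the .sty, the four font lines would presumably be dropped from the template's .tex file, so authors would not need to edit the preamble when switching engines.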
This all sounds great. But we need to set it up so that the output looks the same with either engine.
Also related, if any publication chairs are still using it: https://github.com/yz-joey/ACLPUB/issues/7
XeLaTeX has a major disadvantage, which is that arXiv does not support it. So I don't think it can be made the default (yet). But I definitely agree with making it an option.
@thammegowda In your example, the one on the right is set in Computer Modern, not Times Roman. So something is wrong with the font setup.
Here are the modifications I made to add some Unicode text. First, enable babel:
\usepackage[english]{babel} % English as the main language
\babelprovide[import]{hindi}
\babelprovide[import]{arabic}
\babelprovide[import]{kannada}
\babelfont[*devanagari]{rm}{Lohit Devanagari}
\babelfont[*arabic]{rm}{Noto Sans Arabic}
Then, paste some Hindi and Arabic text:
% Both strings are the title "Universal Declaration of Human Rights"
Hindi: \foreignlanguage{hindi}{मानव अधिकारों की सार्वभौम घोषणा} Arabic: \foreignlanguage{arabic}{الإعلان العالمي لحقوق الإنسان}
Finally, switch the compiler to XeLaTeX, since pdfLaTeX could not compile it.
Also, I had to comment out \pdfoutput=1 for XeLaTeX.
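Rather than commenting the line out by hand, one option is to guard it so the same file compiles under both engines. This is a sketch using the e-TeX \ifdefined test: XeTeX does not define \pdfoutput, so there the assignment is simply skipped instead of raising an error. We have not checked whether arXiv's detection of \pdfoutput=1 near the top of the file still recognizes this guarded form.

```latex
% Set \pdfoutput only if the engine defines it (pdfLaTeX does, XeLaTeX does not).
\ifdefined\pdfoutput \pdfoutput=1 \fi
\documentclass[11pt]{article}
```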
I didn't explicitly modify the fonts for English/Latin. Is the babel import messing up the default fonts for English? Sorry, I am not a *TeX pro. Here is my Overleaf project for reference: https://www.overleaf.com/project/61d4c64cbc3e72789d2de4bc
Well, I would say arXiv has a major disadvantage in that it doesn't support XeLaTeX/LuaLaTeX, but I can see how we should make sure to support it ;)
@thammegowda The default font is Computer Modern. To get the correct font for the current *ACL template, both \usepackage{times} and \usepackage[T1]{fontenc} are important.
@mbollmann I agree, and I hope arXiv realizes this shortcoming and makes an update.
Also, I have these two lines
\usepackage{times}
\usepackage[T1]{fontenc}
I didn't remove these two, but is XeLaTeX using Computer Modern anyway? That's surprising!
@thammegowda Ah, maybe it is being overwritten by something else in your preamble, then. I can't access your Overleaf project; it's restricted. Maybe try moving the "times" import further down?
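To check which font is actually in effect, instead of judging by eye, one can print it from inside the document. A small diagnostic sketch using the TeX primitive \fontname and LaTeX's internal \f@family and \f@series macros; the exact strings depend on the engine and font setup, so no expected output is given here.

```latex
% Diagnostic: place in the document body where the text looks wrong.
% \f@family / \f@series are LaTeX's current NFSS family and series;
% \fontname\font expands to the name of the font actually loaded.
\makeatletter
Family: \f@family, series: \f@series, font: \fontname\font
\makeatother
```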
@mbollmann I think the babel package is causing the issue. If I move times, fontenc, and microtype below babel, the Latin fonts look as intended, but the Arabic and Hindi text stops working (it doesn't even appear):
\usepackage[english]{babel} % English as the main language
\babelprovide[import]{hindi}
\babelprovide[import]{arabic}
\babelprovide[import]{kannada}
\babelfont[*devanagari]{rm}{Lohit Devanagari}
\babelfont[*arabic]{rm}{Noto Sans Arabic}
\usepackage{times}
\usepackage[T1]{fontenc}
\usepackage{microtype}
Here is an Overleaf link: https://www.overleaf.com/read/vbyhzmssdkkb (it worked for me in a private/incognito window). If we could share a working example with this text, it would be very useful.
Hindi: मानव अधिकारों की सार्वभौम घोषणा
Arabic: الإعلان العالمي لحقوق الإنسان
@thammegowda I'm not an expert with babel, but I think as soon as you use \babelfont, you need to define an explicit Latin font as well. I haven't found a way to get exactly the same font as LaTeX's ptm family (which is what "times" uses), but if you add
\babelfont{rm}{TeX Gyre Termes}
before you load the other, language-specific fonts, you get something virtually indistinguishable from it.
That works! Thanks.
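Putting the advice in this exchange together, a complete test document might look like the sketch below. It assumes compilation with XeLaTeX and that TeX Gyre Termes, Lohit Devanagari, and Noto Sans Arabic are installed (the last two are the fonts used earlier in this thread). The bidi=default option is an addition not taken from the thread: babel's documentation uses it for right-to-left scripts, so that multi-word Arabic text comes out in the right order, but it may need adjusting for a given setup.

```latex
% Sketch of a minimal test document; compile with xelatex.
\documentclass[11pt]{article}
% English as the main language; bidi=default enables right-to-left handling
\usepackage[english, bidi=default]{babel}
\babelprovide[import]{hindi}
\babelprovide[import]{arabic}
% Latin font first, so English text does not stay in the default Latin Modern
\babelfont{rm}{TeX Gyre Termes}
% Script-specific fonts for Devanagari and Arabic text
\babelfont[*devanagari]{rm}{Lohit Devanagari}
\babelfont[*arabic]{rm}{Noto Sans Arabic}
\begin{document}
Hindi: \foreignlanguage{hindi}{मानव अधिकारों की सार्वभौम घोषणा}

Arabic: \foreignlanguage{arabic}{الإعلان العالمي لحقوق الإنسان}
\end{document}
```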
I was just looking into whether there were efforts to move away from pdfLaTeX to make the ACL style files more Unicode-friendly, and I'm glad I found this issue thread. I have two suggestions, and I can help with the migration in these respects:
Further decisions probably need to be made about sans-serif and monospaced fonts, but none that can't be solved with some research.
We have been using pdfLaTeX as the default compiler/engine, but as we know, it isn't friendly to Unicode (non-Latin) text. Although the instructions suggest using XeLaTeX, the generated PDF looks different from pdfLaTeX's in many ways. For example (left: pdfLaTeX, right: XeLaTeX), look at the nuances in the fonts: the section headings on the right aren't as bold as pdfLaTeX's on the left. I believe the font weight isn't exactly the same.
My request/suggestion: move towards a Unicode-capable template as a way of encouraging NLP for languages written in non-Latin scripts. Researchers working on those languages should also be able to paste qualitative examples as text (rather than as non-vector images), right? So, how about making the Unicode-capable template (i.e., XeLaTeX) the default?
If anyone is interested in testing Unicode support in the LaTeX templates, here is a file with the UDHR title in hundreds of languages: udhr-title.txt
Thanks,