ARM-software / acle

Arm C Language Extensions (ACLE)

[PDF] Fix searching and copy-pasting underscore characters #287

Closed atrosinenko closed 7 months ago

atrosinenko commented 8 months ago

In the existing setup, the *.md source files are converted to PDF by pandoc, which invokes pdflatex internally. With the default font encoding, underscore characters inside paragraphs of text behave like whitespace (or are missing entirely) when text is copy-pasted from a PDF viewer or searched. This may confuse users, because strings such as __ARM_FEATURE_name and long_function_name become invisible to the viewer's "Search ..." function unless they appear inside a standalone block of code.

One solution is to use the T1 font encoding and ensure that Type 1 fonts are available (so that pdflatex does not have to fall back to rasterized Type 3 fonts).

Fixes #282.



atrosinenko commented 8 months ago

It turns out there is a known issue with how underscores are represented in PDFs produced by pdflatex (see, for example, this question).

Regarding possible regressions, this unofficial documentation for the fontenc package mentions that switching the font encoding may force pdflatex to use raster fonts (are they always Type 3 fonts?). Additionally, in this PR fontenc is loaded as early as possible, but some font-related packages may need to be loaded before it. I am not sure whether this is the case for the inconsolata package: I visually compared a few pages of the "old" and "new" versions, and the monospaced font appears to have changed (at least the underscore characters), but moving \usepackage{inconsolata} to just before the \usepackage[T1]{fontenc} line (in addition to this patch) seems to change nothing.

The PDF documents generated with this patch applied look visually correct and can be searched for identifier names containing underscore characters (although the layout changed slightly). No new Type 3 fonts are listed in the output of pdffonts, but the "new" version lists fewer fonts overall.
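For anyone who wants to repeat the searchability check without clicking through a PDF viewer, here is a minimal sketch. It assumes the `pdftotext` command-line tool (from the same poppler utilities as pdffonts) is installed; the function names `extract_text` and `missing_identifiers` are made up for illustration and are not part of the repository.

```python
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF via the pdftotext CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the extracted text to stdout
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def missing_identifiers(text, identifiers):
    """Return the identifiers that cannot be found verbatim in the extracted text."""
    return [name for name in identifiers if name not in text]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_underscores.py acle.pdf
    wanted = ["__ARM_FEATURE_name", "long_function_name"]
    print(missing_identifiers(extract_text(sys.argv[1]), wanted))
```

An empty list means every identifier survived extraction with its underscores intact. Identifiers that only occur inside code blocks would be found either way, so the check is only meaningful for names that appear in running text.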

atrosinenko commented 8 months ago

Just in case, here is the output of the pdffonts utility for the generated documents:

Without this patch

```
=== acle.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
HCPDYJ+Lato-Regular                  Type 1            Custom           yes yes no    1782  0
NQPXVC+Lato-Bold                     Type 1            Custom           yes yes no    1784  0
KIYOWP+Lato-Italic                   Type 1            Custom           yes yes no    1786  0
SYFPBV+CMMI10                        Type 1            Builtin          yes yes no    1842  0
YPHFQB+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    1843  0
HCPDYJ+Lato-Regular                  Type 1            Custom           yes yes no    2320  0
UABGXL+CMSY10                        Type 1            Builtin          yes yes no    2539  0
ADRRSK+Inconsolatazi4-Bold           Type 1            Custom           yes yes no    2540  0
KIYOWP+Lato-Italic                   Type 1            Custom           yes yes no    2550  0
ZGGNQH+CMMI12                        Type 1            Builtin          yes yes no    2625  0
YPHFQB+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    2733  0
F69                                  Type 3            Custom           yes no  no    3076  0
=== advsimd.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LXZGMB+Lato-Regular                  Type 1            Custom           yes yes no    1280  0
KQVPHX+Lato-Bold                     Type 1            Custom           yes yes no    1282  0
DMJYBT+Lato-Italic                   Type 1            Custom           yes yes no    1284  0
SYFPBV+CMMI10                        Type 1            Builtin          yes yes no    1490  0
LXZGMB+Lato-Regular                  Type 1            Custom           yes yes no    1683  0
LHGFCN+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    1695  0
ZGGNQH+CMMI12                        Type 1            Builtin          yes yes no    5233  0
=== cmse.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
BLIBLO+Lato-Regular                  Type 1            Custom           yes yes no     271  0
BLIBLO+Lato-Regular                  Type 1            Custom           yes yes no     272  0
OBTHMW+Lato-Bold                     Type 1            Custom           yes yes no     274  0
OBTHMW+Lato-Bold                     Type 1            Custom           yes yes no     275  0
VEHMTU+Lato-Italic                   Type 1            Custom           yes yes no     277  0
SNQNWU+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     401  0
LOWLTO+CMSY10                        Type 1            Builtin          yes yes no     489  0
YSBLIL+Inconsolatazi4-Bold           Type 1            Custom           yes yes no     490  0
SYFPBV+CMMI10                        Type 1            Builtin          yes yes no     571  0
EURBLZ+DejaVuSans                    TrueType          WinAnsi          yes yes yes    695  0
SPRGBG+DejaVuSans                    TrueType          WinAnsi          yes yes yes    760  0
=== morello.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
QKJZDY+Lato-Regular                  Type 1            Custom           yes yes no     118  0
AIUTIA+Lato-Bold                     Type 1            Custom           yes yes no     120  0
IKKTZB+Lato-Italic                   Type 1            Custom           yes yes no     122  0
UVILRV+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     157  0
QKJZDY+Lato-Regular                  Type 1            Custom           yes yes no     205  0
CKAXET+Inconsolatazi4-Bold           Type 1            Custom           yes yes no     230  0
=== mve.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TMPYZL+Lato-Regular                  Type 1            Custom           yes yes no     407  0
RKHJOK+Lato-Bold                     Type 1            Custom           yes yes no     409  0
FTKNFF+Lato-Italic                   Type 1            Custom           yes yes no     411  0
TMPYZL+Lato-Regular                  Type 1            Custom           yes yes no     568  0
URWTTH+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     569  0
```

With this patch applied

```
=== acle.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
HGEBLM+Lato-Regular                  Type 1            Custom           yes yes no    1782  0
WKINJQ+Lato-Bold                     Type 1            Custom           yes yes no    1784  0
KIYOWP+Lato-Italic                   Type 1            Custom           yes yes no    1786  0
UPVBQN+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    1842  0
HGEBLM+Lato-Regular                  Type 1            Custom           yes yes no    2319  0
BMTUMN+Inconsolatazi4-Bold           Type 1            Custom           yes yes no    2538  0
KIYOWP+Lato-Italic                   Type 1            Custom           yes yes no    2548  0
UPVBQN+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    2729  0
F69                                  Type 3            Custom           yes no  no    3072  0
=== advsimd.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CBTKCX+Lato-Regular                  Type 1            Custom           yes yes no    1280  0
GKMNFC+Lato-Bold                     Type 1            Custom           yes yes no    1282  0
DMJYBT+Lato-Italic                   Type 1            Custom           yes yes no    1284  0
CBTKCX+Lato-Regular                  Type 1            Custom           yes yes no    1682  0
LHGFCN+Inconsolatazi4-Regular        Type 1            Custom           yes yes no    1694  0
=== cmse.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
IOXFSS+Lato-Regular                  Type 1            Custom           yes yes no     271  0
IOXFSS+Lato-Regular                  Type 1            Custom           yes yes no     272  0
OBTHMW+Lato-Bold                     Type 1            Custom           yes yes no     274  0
OBTHMW+Lato-Bold                     Type 1            Custom           yes yes no     275  0
VEHMTU+Lato-Italic                   Type 1            Custom           yes yes no     277  0
QVEWWD+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     401  0
YSBLIL+Inconsolatazi4-Bold           Type 1            Custom           yes yes no     489  0
EURBLZ+DejaVuSans                    TrueType          WinAnsi          yes yes yes    693  0
SPRGBG+DejaVuSans                    TrueType          WinAnsi          yes yes yes    758  0
=== morello.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
QKJZDY+Lato-Regular                  Type 1            Custom           yes yes no     118  0
AIUTIA+Lato-Bold                     Type 1            Custom           yes yes no     120  0
IKKTZB+Lato-Italic                   Type 1            Custom           yes yes no     122  0
UVILRV+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     157  0
QKJZDY+Lato-Regular                  Type 1            Custom           yes yes no     205  0
GQBMYK+Inconsolatazi4-Bold           Type 1            Custom           yes yes no     230  0
=== mve.pdf ===
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
MPFLDN+Lato-Regular                  Type 1            Custom           yes yes no     407  0
RKHJOK+Lato-Bold                     Type 1            Custom           yes yes no     409  0
FTKNFF+Lato-Italic                   Type 1            Custom           yes yes no     411  0
MPFLDN+Lato-Regular                  Type 1            Custom           yes yes no     568  0
URWTTH+Inconsolatazi4-Regular        Type 1            Custom           yes yes no     569  0
```
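Comparing two such listings by eye is error-prone, so here is a rough sketch of how the comparison could be scripted. It parses text in the column layout shown above; the helper names (`parse_pdffonts`, `type3_fonts`, `dropped_fonts`) are hypothetical, and the parser assumes that font names contain no spaces and that the listing includes the encoding column.

```python
def parse_pdffonts(output):
    """Parse pdffonts-style text into a list of (font name, font type) tuples."""
    fonts = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, the dashed rule and "=== file ===" banners.
        if len(tokens) < 8 or tokens[0] in ("name", "===") or tokens[0].startswith("---"):
            continue
        # The last six tokens are: encoding, emb, sub, uni, object number, generation.
        # Everything between the name and those six is the type (e.g. "Type 1").
        fonts.append((tokens[0], " ".join(tokens[1:-6])))
    return fonts


def base_name(name):
    """Drop the random subset prefix, e.g. 'HCPDYJ+Lato-Regular' -> 'Lato-Regular'."""
    return name.split("+", 1)[-1]


def type3_fonts(output):
    """Return the sorted names of all Type 3 (possibly rasterized) fonts."""
    return sorted({name for name, kind in parse_pdffonts(output) if kind == "Type 3"})


def dropped_fonts(before, after):
    """Return base font names present in `before` but absent from `after`."""
    old = {base_name(name) for name, _ in parse_pdffonts(before)}
    new = {base_name(name) for name, _ in parse_pdffonts(after)}
    return sorted(old - new)
```

Applied to the listings above, a comparison of this kind shows that the only Type 3 entry (F69 in acle.pdf) is present both before and after the patch, and that the entries which disappear are the Computer Modern fonts CMMI10, CMMI12 and CMSY10.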

Another relevant link: https://tex.stackexchange.com/questions/345866/when-should-package-fontenc-be-used-with-pdflatex

vhscampos commented 7 months ago

Hi, thanks for your Pull Request. We aim to review it as soon as possible.

vhscampos commented 7 months ago

LGTM. Thanks for the analysis and the fix!

vhscampos commented 7 months ago

@all-contributors please add @atrosinenko for code.

allcontributors[bot] commented 7 months ago

@vhscampos

I've put up a pull request to add @atrosinenko! :tada: