RunasSudo / gfx2gfx-pdftext

A fork of SWFTools' gfx2gfx which preserves text, rather than converting to shapes.
GNU General Public License v2.0

Text does not display in some viewers #2

Closed RunasSudo closed 7 years ago

RunasSudo commented 7 years ago

This seems to be a compatibility issue between viewers. In some viewers, including Adobe Reader, MuPDF, xpdf and Firefox's pdf.js, text does not display. qpdf, evince, okular and Google Drive view the PDFs correctly.

A temporary fix appears to be to post-process the PDF with Ghostscript or Poppler:

gs -o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf

or

pdftocairo -pdf input.pdf output.pdf

This produces a PDF which appears to be readable in all the above applications.
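
For several files, the two commands above could be wrapped in a small helper. This is only a sketch: the function names `run` and `fix_pdf`, the `DRY_RUN` variable and the `fixed_` output prefix are made up for illustration; the `gs` and `pdftocairo` invocations are exactly the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: post-process one PDF with the chosen tool.
#   $1 = gs | pdftocairo, $2 = input PDF, $3 = output PDF
fix_pdf() {
    tool=$1
    src=$2
    dst=$3
    case "$tool" in
        gs)         run gs -o "$dst" -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress "$src" ;;
        pdftocairo) run pdftocairo -pdf "$src" "$dst" ;;
        *)          echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}

# Usage (file names are placeholders):
#   for f in *.pdf; do fix_pdf gs "$f" "fixed_$f"; done
```

Setting `DRY_RUN=1` first prints the commands without running them, which makes it easy to check the arguments before touching any files.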

RunasSudo commented 7 years ago

This appears to be limited to SWF fonts. Using a TTF font does not cause this issue.

RunasSudo commented 7 years ago

The problem appears to be related to the use of the custom encoding. Using "host" encoding allows the PDF to display correctly in Adobe Reader (though notably, not in xpdf), while using the custom encoding fails irrespective of embedding.
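
One way to check which fonts in a PDF use a custom encoding is Poppler's `pdffonts`. The sketch below is not part of gfx2gfx. It assumes the usual `pdffonts` table layout (two header lines, then one row per font ending in `encoding emb sub uni` and a two-number object ID) and assumes that the affected fonts are reported with the encoding `Custom`, which this thread does not confirm. The function name `custom_encoded_fonts` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: read pdffonts output on stdin and print the names of
# fonts whose encoding column is "Custom".
# Fields are counted from the right because the "type" column may contain
# a space (for example "Type 3"), which shifts the left-hand field numbers.
custom_encoded_fonts() {
    awk 'NR > 2 && NF >= 8 && $(NF-5) == "Custom" { print $1 }'
}

# Usage (input.pdf is a placeholder):
#   pdffonts input.pdf | custom_encoded_fonts
```

Font names containing spaces would be truncated to their first word by this filter.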

RunasSudo commented 7 years ago

Looks like it was a simple matter of Adobe Reader not liking the glyph names, combined with our misuse of the unicode field of glyphs.

sonst-was commented 7 years ago

Hi there,

I think I have to ask you to reopen this issue. I just tried gfx2gfx and I think I have the same issues that are described here (although I don't know if this repo is still maintained).

The attached PDF is displayed in the following ways:

OneDrive: http://imgur.com/a/JRMuW (looks a bit like modern art)
Adobe Acrobat: http://imgur.com/a/2tPEx (same with Microsoft Edge)

SumatraPDF, Chrome, xpdf and the standard document viewer on Linux Mint display the PDF correctly. Post-processing the PDFs with Ghostscript or Poppler didn't change anything. The only "solution" that has worked for me so far was printing the PDF from SumatraPDF using the CutePDF Writer. But afterwards the PDF contains only an image (which is then displayed correctly by Acrobat).

Acrobat displays multiple error messages (rough translations): "a number is out of the valid range" (twice) and "the embedded font 'HTAVNI+font6' couldn't be extracted, some characters might not be displayed correctly".

Greetings

page_3.pdf

RunasSudo commented 7 years ago

Hi @sonst-was, could you post the output of gfx2gfx -V, and the SWF file that causes the issue? I am unable to reproduce this issue with my own SWFs to that extent.

sonst-was commented 7 years ago

Hi @RunasSudo, thanks for the fast response.

The gfx2gfx -V output is:

gfx2gfx-pdf2text - part of swftools 0.9.2 (build )

I've uploaded the corresponding SWF file here, because SWF files can't be uploaded to GitHub: https://app.box.com/s/v6pagokjt8punc9ly9k75y5c09i65hcp

RunasSudo commented 7 years ago

Thanks, I can reproduce the issue with the file you linked. I've opened a new issue for this: #6