coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

Segmentation fault when trying to use --auto-hint 1 #626

Open rsimmonsnim opened 8 years ago

rsimmonsnim commented 8 years ago

Great tool! I could use help with this issue, though. I have a PDF file that converts fine on Mac, but on Ubuntu 12.04.5 LTS (GNU/Linux 3.2.0-101-generic x86_64) I get a segmentation fault, and only when using the --auto-hint 1 option. Any ideas? It also runs OK on Ubuntu if I don't try to process all 110 pages, so it's a real head-scratcher. The results are better with --auto-hint 1, so I'd like to keep it if I can.

Here's my command line:

pdf2htmlEX --tounicode 0 --optimize-text 1 --space-as-offset 0 --embed cfijo --process-type3=1 --correct-text-visibility 1 --process-nontext 1 --external-hint-tool=ttfautohint --auto-hint 1 --split-pages=1 --auto-hint 1 --hdpi 225 --vdpi 225 --dest-dir $outDir $inFile

Mac (OK):
pdf2htmlEX version 0.13.6
Copyright 2012-2014 Lu Wang coolwanglu@gmail.com and other contributors
Libraries:
  poppler 0.36.0
  libfontforge 20150924
  cairo 1.14.2
Default data-dir: /usr/local/Cellar/pdf2htmlex/0.13.6_3/share/pdf2htmlEX
Supported image format: png jpg svg
ttfautohint v1.4

Ubuntu (segfault with the flag):
pdf2htmlEX version 0.14.6
Copyright 2012-2015 Lu Wang coolwanglu@gmail.com and other contributors
Libraries:
  poppler 0.33.0
  libfontforge 20160331
  cairo 1.14.3
Default data-dir: /usr/local/share/pdf2htmlEX
Supported image format: png jpg svg
ttfautohint v1.4
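Since the conversion reportedly succeeds when not all 110 pages are processed, one way to narrow the problem down is to run the same conversion in fixed-size page chunks and record which chunks die with SIGSEGV. If every chunk passes, the same approach may also serve as a stopgap. This is a minimal sketch under stated assumptions: it relies on pdf2htmlEX's --first-page/--last-page options (check pdf2htmlEX --help on your build), the chunk size of 10 and the per-chunk output directories are arbitrary choices made here, and it only localizes the crash if specific pages trigger it rather than the total amount of work done in one run.

```python
import os
import signal
import subprocess
from typing import List, Tuple

# Flags taken from the command line in the report; the repeated
# "--auto-hint 1" is passed only once here.
BASE_FLAGS = [
    "--tounicode", "0", "--optimize-text", "1", "--space-as-offset", "0",
    "--embed", "cfijo", "--process-type3=1", "--correct-text-visibility", "1",
    "--process-nontext", "1", "--external-hint-tool=ttfautohint",
    "--auto-hint", "1", "--split-pages=1", "--hdpi", "225", "--vdpi", "225",
]


def page_chunks(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def is_segfault(returncode: int) -> bool:
    """subprocess reports a child killed by signal N as returncode -N (POSIX)."""
    return returncode == -signal.SIGSEGV


def run_chunks(in_file: str, out_dir: str, total_pages: int,
               chunk_size: int = 10) -> List[Tuple[int, int]]:
    """Convert in_file chunk by chunk; return the page ranges that segfaulted."""
    crashed = []
    for first, last in page_chunks(total_pages, chunk_size):
        # Hypothetical layout: one output directory per chunk, e.g. out/p1-10.
        chunk_dir = os.path.join(out_dir, f"p{first}-{last}")
        os.makedirs(chunk_dir, exist_ok=True)
        cmd = ["pdf2htmlEX", *BASE_FLAGS,
               "--first-page", str(first), "--last-page", str(last),
               "--dest-dir", chunk_dir, in_file]
        result = subprocess.run(cmd)
        if is_segfault(result.returncode):
            crashed.append((first, last))
    return crashed
```

For example, run_chunks("input.pdf", "out", 110) returns the (first, last) ranges that crashed; re-running one of those ranges with a smaller chunk_size narrows it further. Because each chunk writes to its own directory, the result is not a single merged document. Note that a shell reports the same crash as exit status 139, while subprocess without a shell reports -11.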

rsimmonsnim commented 8 years ago

This happens with many PDFs. Here's the last screen of lines from the debug output:

You will get better instructions if you fill in the Private dictionary, Element->Font Info->Private, for the font
Install font 25b: (3243 0) GKWQNN+MuseoSlab-300-Identity-H
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25b.pfa
Embed font: /tmp/pdf2htmlEX-ztb8S6/f25b.pfa 603
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/raw_font_25b.pfa
em size: 1000
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25b.map
Missing space width in font 25b: set to 0.001
space width: 0.001
No glyph for a standard character to derive standard width and height.
Please check the documentation for a list of script-specific standard characters, or use option `--symbol'.
You will get better instructions if you fill in the Private dictionary, Element->Font Info->Private, for the font
Install font 25c: (3244 0) HKWQNN+FranklinGothic-DemiCond
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25c.pfa
Embed font: /tmp/pdf2htmlEX-ztb8S6/f25c.pfa 604
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/raw_font_25c.pfa
em size: 1000
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25c.map
space width: 0.187
Install font 25d: (3245 0) IKWQNN+TimesNewRomanPSMT
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25d.pfa
Embed font: /tmp/pdf2htmlEX-ztb8S6/f25d.pfa 605
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/raw_font_25d.pfa
em size: 1000
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25d.map
space width: 0.25
Install font 25e: (3246 0) JKWQNN+TimesNewRomanPS-BoldMT
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25e.pfa
Embed font: /tmp/pdf2htmlEX-ztb8S6/f25e.pfa 606
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/raw_font_25e.pfa
em size: 1000
Add new temporary file: /tmp/pdf2htmlEX-ztb8S6/f25e.map
space width: 0.25

coolwanglu commented 8 years ago

Although I'm not 100% sure, it looks like a crash in FontForge. Can you try this?