coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

Huge chunk of javascript code in HTML output #593

Open StijnVanLoo opened 8 years ago

StijnVanLoo commented 8 years ago

I used the following command:

pdf2htmlEX manual_old.pdf --split-pages 1 --dest-dir new --page-filename m%d.html --fit-width 800

When converting to HTML I get several warnings that say something like "Mark Positioning has an offset bigger than 65535 bytes. This means FontForge must use an extension ..."

As a result (I think?) the HTML output shows a big chunk of JavaScript code, as seen here: https://www.e-capture.net/docs/temp/manual_old.html (You need to scroll all the way down using the outermost scrollbar.)

Would appreciate some guidance. Thanks in advance

StijnVanLoo commented 8 years ago

Any ideas what might be wrong here?

StijnVanLoo commented 8 years ago

Bump

coolwanglu commented 8 years ago

It seems that you want to embed jQuery code into the HTML, which might be too long for the parser? Maybe try using external JS files instead?

StijnVanLoo commented 8 years ago

Thanks for the reply, Lu. I'm not embedding jQuery at all; I'm simply trying to convert a PDF (generated from a Word file) into HTML. This used to work fine in the past, but on Windows 10 I get this issue. Any other thoughts?

coolwanglu commented 8 years ago

I saw /*! jQuery v@1.8.1 jquery.com | jquery.org/license */ in the HTML, and I don't know where it comes from. Can you attach the manifest file you are using? By default it is in the data-dir, whose path is shown in the output of pdf2htmlEX -v.
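
To narrow down where the script text comes from, it can help to list what is actually embedded in a generated page. The following is only a rough sketch and not part of pdf2htmlEX: the names InlineScriptScanner and scan_html are made up for illustration, and the banner pattern is taken from the jQuery comment quoted above. It reports each inline script block, its size, and the jQuery version if a banner is found, plus any scripts referenced by URL.

```python
import re
from html.parser import HTMLParser

# Matches a banner such as: /*! jQuery v@1.8.1 jquery.com | jquery.org/license */
JQUERY_BANNER = re.compile(r"/\*!\s*jQuery v@?([\d.]+)")


class InlineScriptScanner(HTMLParser):
    """Collects inline <script> bodies and external script URLs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self._buf = []
        self.inline_scripts = []    # text of each non-empty inline script
        self.external_scripts = []  # values of src attributes

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external_scripts.append(src)
            self._in_script = True
            self._buf = []

    def handle_data(self, data):
        # Script content arrives here as raw text.
        if self._in_script:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buf)
            if body.strip():
                self.inline_scripts.append(body)
            self._in_script = False


def scan_html(html_text):
    """Return sizes and detected jQuery versions for the scripts in html_text."""
    scanner = InlineScriptScanner()
    scanner.feed(html_text)
    scanner.close()
    inline = []
    for body in scanner.inline_scripts:
        match = JQUERY_BANNER.search(body)
        inline.append({
            "chars": len(body),
            "jquery": match.group(1) if match else None,
        })
    return {"inline": inline, "external": scanner.external_scripts}
```

Calling scan_html on the contents of one of the generated pages (for example the text read from new/m1.html, if that is where the output landed) shows whether jQuery is embedded inline and how large each block is. If the same banner also appears in a file listed in the manifest, that would suggest the script comes from the data-dir rather than from the PDF itself.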