coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/

Inaccurate Output #267

Open sharemu opened 10 years ago

sharemu commented 10 years ago

I converted a PDF to HTML on another computer, and this happened: "Internal Error: Attempt to output 2147483647 into a 16-bit field. it will be truncated and the file may not be useful." The system then showed an error window: "pdf2htmlEX.exe has stopped working".

System info: Windows Enterprise Edition, Service Pack 1, 32-bit operating system.

The version info, including Fontforge, is:

D:\test\pdf2htmlEX-v1.0-win32-static>pdf2htmlEX.exe -v
pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang coolwanglu@gmail.com and other contributers
Libraries: poppler 0.24.1 libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg

Style error sample: http://yunpan.cn/QDYsy63aNz2Ns

coolwanglu commented 10 years ago

could you provide a small sample here?

sharemu commented 10 years ago

Hi Lu, thank you. But the file I sent you is already the smallest one I have; the other PDF documents I have are bigger than 300 MB. Did you receive the sample PDF I sent you before?

--best for you!

Share.Mu

coolwanglu commented 10 years ago

As I mentioned earlier, please do not send files to me by email. It makes it difficult for me (and others) to track the issue. If the files are too big, can you please try to provide a minimized sample? For example, you can extract only the 1-2 necessary pages from it.

regards,

sharemu commented 10 years ago

jggclx html: this is my conversion result. Please remove the .PNG suffix.

coolwanglu commented 10 years ago

It seems that the file is corrupt; I cannot open it on my machine. And actually the PDF file is needed for further diagnosis.

sharemu commented 10 years ago

Can you recommend a free file-hosting site where I can upload the attachment?

coolwanglu commented 10 years ago

Dropbox? Or maybe Baidu Pan and similar services.

sharemu commented 10 years ago

The Baidu Pan URLs are as follows. Source PDF: http://yunpan.cn/QDYAZNa3x6Hab Converted HTML: http://yunpan.cn/QDYsy63aNz2Ns

coolwanglu commented 10 years ago

Could you please just provide a smaller sample?

sharemu commented 10 years ago

Can you open the PDF file?

sharemu commented 10 years ago

This is the smallest one I have...

coolwanglu commented 10 years ago

Not yet, the browser says I need 2 hours to download the file.

coolwanglu commented 10 years ago

How many pages are there in the PDF?

sharemu commented 10 years ago

255

coolwanglu commented 10 years ago

I mean could you extract only 1-2 pages that cause the error? In that way the PDF could be much smaller I guess.

sharemu commented 10 years ago

This error does not occur on my computer, only on another one, so the operating system may be causing it. On my computer, when I try to extract only 1-2 pages, this PDF gives a style error like this: http://yunpan.cn/QDYsy63aNz2Ns

sharemu commented 10 years ago

You can download it from http://yunpan.cn/QDYfRRJgsxNDi

coolwanglu commented 10 years ago

Thank you very much!

Actually this error comes from Fontforge, so please have a check of the versions of Fontforge installed.
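
As a rough illustration of what the message is warning about: 2147483647 is the largest signed 32-bit integer, and it cannot fit in a 16-bit field. The sketch below assumes that truncation keeps only the low 16 bits; how Fontforge actually writes the field is not stated in this thread, so treat this as arithmetic only. The function name is hypothetical.

```python
from typing import Tuple


def truncate_to_16_bits(value: int) -> Tuple[int, int]:
    """Keep only the low 16 bits of value.

    Returns the result read as an unsigned and as a signed 16-bit number.
    Assumption: truncation simply drops the high bits.
    """
    low = value & 0xFFFF
    signed = low - 0x10000 if low >= 0x8000 else low
    return low, signed


unsigned_16, signed_16 = truncate_to_16_bits(2147483647)
print(hex(2147483647), unsigned_16, signed_16)  # 0x7fffffff 65535 -1
```

Either way the stored number no longer resembles the original value, which is why the message says the resulting file may not be useful.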

sharemu commented 10 years ago

The version info, including Fontforge, is:

D:\test\pdf2htmlEX-v1.0-win32-static>pdf2htmlEX.exe -v
pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang coolwanglu@gmail.com and other contributers
Libraries: poppler 0.24.1 libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg

sharemu commented 10 years ago

This PDF document was created with an embedded Fangzheng font, not a common system font.

sharemu commented 10 years ago

I have the font's TTF file, but how can I configure libfontforge to use it?

coolwanglu commented 10 years ago

I'm not sure what you meant by 'config to libfontforge'. Ideally, Fontforge should work automatically.

sharemu commented 10 years ago

"so please have a check of the versions of Fontforge installed." Do you mean the system font is not installed?

sharemu commented 10 years ago

But why can I open it normally in a PDF reader?

coolwanglu commented 10 years ago

It is most likely a bug in pdf2htmlEX or Fontforge.

sharemu commented 10 years ago

Oh... can you track this bug? Can you resolve it?

coolwanglu commented 10 years ago

Yes, all the bugs in the issues tracker are meant to be tracked. Although I don't have a quick solution for now, I'll take a look at it when I have time.

sharemu commented 10 years ago

Here are the font files: http://yunpan.cn/QDYxLVadfBgbB Maybe they can be of some help.

coolwanglu commented 10 years ago

I downloaded the .zip file, but it seems the archive is corrupted. Anyway, I don't need the HTML or the fonts.

Instead, can you provide a small PDF with only 1-2 pages?

sharemu commented 10 years ago

I have no small 1-2 page PDF document; http://yunpan.cn/QDYfRRJgsxNDi is the smallest document I have. Maybe the corruption was caused by your network transfer: I downloaded the file I uploaded myself, and it unzipped normally.

sharemu commented 10 years ago

You can download it again: http://yunpan.cn/QDYfRRJgsxNDi

coolwanglu commented 10 years ago

I meant, why don't you extract 1-2 pages from that PDF?

sharemu commented 10 years ago

Oh, I have tried to extract 1-2 pages. The result is: http://yunpan.cn/QDYsy63aNz2Ns

coolwanglu commented 10 years ago

@sharemu It seems that you didn't understand. I want you to extract 1-2 pages and not convert them to HTML; I need the original PDF pages. All I need is a minimal PDF file for this bug. You can first determine which pages are causing this issue by performing bisection with the -f and -l options, then extract those PDF pages.
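
The bisection suggested above could be sketched roughly as follows. This is only a sketch under stated assumptions: that a single page triggers the problem on its own, and that the failure shows up as a non-zero exit code or as the text "Internal Error" in the captured output. Both function names are hypothetical; only the -f (first page) and -l (last page) options come from pdf2htmlEX itself.

```python
import subprocess
from typing import Callable, Optional


def page_range_fails(pdf_path: str, first: int, last: int) -> bool:
    """Convert pages first..last (inclusive) and report whether it failed.

    Assumption: a crash gives a non-zero exit code, or the message
    "Internal Error" appears in the output. Adjust to what you observe.
    """
    result = subprocess.run(
        ["pdf2htmlEX", "-f", str(first), "-l", str(last), pdf_path],
        capture_output=True,
        text=True,
        errors="replace",
    )
    output = result.stdout + result.stderr
    return result.returncode != 0 or "Internal Error" in output


def find_failing_page(
    first: int, last: int, fails: Callable[[int, int], bool]
) -> Optional[int]:
    """Binary-search for one page that triggers the failure.

    Returns None if the whole range converts cleanly. Assumes exactly one
    page in the range is responsible and fails independently of the others.
    """
    if not fails(first, last):
        return None
    while first < last:
        mid = (first + last) // 2
        if fails(first, mid):
            last = mid        # the culprit is in the lower half
        else:
            first = mid + 1   # otherwise it must be in the upper half
    return first


# Example call for a 255-page document (the file name is hypothetical):
# page = find_failing_page(1, 255, lambda f, l: page_range_fails("book.pdf", f, l))
```

For a 255-page file this needs about nine conversions instead of 255. If the problem only appears when several pages are converted together, the single-page assumption breaks and the ranges would have to be narrowed by hand.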

sharemu commented 10 years ago

There is no minimal PDF file. I can't change the PDF's structure, because this is not an ordinary PDF; I got it from a publishing company. I need to convert this type of PDF file.

coolwanglu commented 10 years ago

Why not? Have you tried any PDF editor, like Acrobat, pdftk, etc.?
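
For the pdftk route, a minimal sketch of extracting a page range into a new, smaller PDF. It relies on pdftk's `cat` operation (`pdftk in.pdf cat 12-13 output out.pdf`); the helper names, file names, and page numbers are hypothetical, and pdftk must be installed and on the PATH for the second function to work.

```python
import subprocess
from typing import List


def pdftk_extract_command(src: str, first: int, last: int, dst: str) -> List[str]:
    """Build the pdftk command that copies pages first..last of src into dst."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    page_range = str(first) if first == last else f"{first}-{last}"
    return ["pdftk", src, "cat", page_range, "output", dst]


def extract_pages(src: str, first: int, last: int, dst: str) -> None:
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(pdftk_extract_command(src, first, last, dst), check=True)


# Example (hypothetical file names and pages):
# extract_pages("book.pdf", 12, 13, "sample.pdf")
```

The resulting file contains the original PDF pages, not converted HTML, which is what a bug report needs.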

sharemu commented 10 years ago

It can be opened by Acrobat. I haven't tried pdftk.

coolwanglu commented 10 years ago

I remember that you can extract pages with Acrobat. If not, I'd suggest you try a few others.

On the other hand, if nothing works, please just wait a few more days; I need to find a better network.

sharemu commented 10 years ago

Thank you. It is very late now, so I am going to bed. Let's keep in touch.

coolwanglu commented 10 years ago

I cannot reproduce it.

You mentioned that the error occurs on one of your computers, but not on the other one. What are the versions of Fontforge installed?
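
One way to compare the two machines is to save the output of `pdf2htmlEX.exe -v` on each and diff the component versions. The sketch below assumes the output looks like the text quoted earlier in this thread; the function names and the second machine's version numbers are hypothetical.

```python
import re
from typing import Dict, List

# Version output as quoted in this thread.
SAMPLE = r"""pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang coolwanglu@gmail.com and other contributers
Libraries: poppler 0.24.1 libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg"""


def parse_version_output(text: str) -> Dict[str, str]:
    """Pull the component versions out of the `-v` output."""
    patterns = {
        "pdf2htmlEX": r"pdf2htmlEX version (\S+)",
        "poppler": r"poppler (\S+)",
        "libfontforge": r"libfontforge (\S+)",
    }
    versions = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            versions[name] = match.group(1)
    return versions


def differing_components(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Names of components whose versions differ between two machines."""
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))


machine_a = parse_version_output(SAMPLE)
# Hypothetical second machine with a different libfontforge build.
machine_b = dict(machine_a, libfontforge="20120731")
print(differing_components(machine_a, machine_b))  # ['libfontforge']
```

An empty list means the reported versions match, in which case the difference between the machines has to lie elsewhere.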

sharemu commented 10 years ago

There are two problems:

(1) "Internal Error: Attempt to output 2147483647 into a 16-bit field. it will be truncated and the file may not be useful." This problem happens when I call pdf2htmlEX from Java on another computer, but it runs normally when called from a command-line window. The system information: Windows Enterprise Edition, Service Pack 1, 32-bit operating system. The version info, including Fontforge, is:

D:\test\pdf2htmlEX-v1.0-win32-static>pdf2htmlEX.exe -v
pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang coolwanglu@gmail.com and other contributers
Libraries: poppler 0.24.1 libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg

(2) HTML style error. The style error happens on all computers; it may be caused by the original PDF itself. The source PDF document can be downloaded from http://yunpan.cn/QDYfRRJgsxNDi and the converted HTML with the style error from http://yunpan.cn/QDYsy63aNz2Ns . This problem is serious.

I don't know if I have explained this clearly.

sharemu commented 10 years ago

procedure screenshot: this message appeared: "ToUnicode CMap is not valid and got dropped for font: 13"

coolwanglu commented 10 years ago

You said the error occurred on one computer but not on another. What is the version info on both?

The message about ToUnicode is harmless, just notifying that the information stored in PDF is useless.

sharemu commented 10 years ago

You mean the pdf2htmlEX version?

D:\test\pdf2htmlEX-v1.0-win32-static>pdf2htmlEX.exe -v
pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang coolwanglu@gmail.com and other contributers
Libraries: poppler 0.24.1 libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg

All computers use the same pdf2htmlEX package: http://yunpan.cn/QDMZagwiz5SWx

coolwanglu commented 10 years ago

If the versions are the same, it's weird that you see the error on one but not on the other. I've no idea about it right now.

On the other hand, I can reproduce the style issue.

sharemu commented 10 years ago

Yes. So I think it is caused by the operating system. "On the other hand, I can reproduce the style issue." You can reproduce the style error? Or will you start a new issue for the style error?

coolwanglu commented 10 years ago

Yes, why ask me again?

sharemu commented 10 years ago

If you can reproduce the style error, very good. I believe it will be easy for you to resolve this problem.

coolwanglu commented 10 years ago

I think it'll take me quite some time. I will take a look.

sharemu commented 10 years ago

Oh, the sooner the better. I'm looking forward to it. Please!