Open JensHH opened 9 years ago
Could you try the latest git commit? It might have been fixed since 0.12.
I would really like to try it. My problem is that I am on Windows and don't know how to compile the source or which libraries I need. Sorry, the last time I did this was 20 years ago (yes, I read the chapter about building). I also have some servers, but I am not sure whether my providers would let me do this or what kind of *nix they run, and I have no knowledge of Linux itself. I have SSH access, I can install GNU C on Windows, and I know PHP and MySQL, so I'm not a beginner. I would really like to support this project because it is amazing, and it gives us the chance to work without PDF, which would be huge progress in website programming, which is my business. I just need a bit of help to get started.
@JensHH I see. I'll try to reproduce it on my machine.
When you try it, could you please test whether you can convert all 132 pages of my PDF at once? The Windows version crashes without any message. I can convert the first 131 pages, and I can convert the last 3, but not all 132 or the last 4.
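To narrow down which pages are involved in the crash, the range can be shrunk automatically. The following is only a sketch: it assumes pdf2htmlEX is on the PATH and exits with a non-zero status when it crashes, and that any range containing a failing range also fails. The function names are made up for illustration; only the -f and -l options come from the command quoted in this thread.

```python
import subprocess
from typing import Callable, List, Optional, Tuple


def build_command(pdf: str, first: int, last: int, exe: str = "pdf2htmlEX") -> List[str]:
    # -f / -l select the first and last page to convert.
    return [exe, "-f", str(first), "-l", str(last), pdf]


def converts_ok(pdf: str, first: int, last: int) -> bool:
    # Assumption: a crash shows up as a non-zero exit status.
    result = subprocess.run(build_command(pdf, first, last), capture_output=True)
    return result.returncode == 0


def shrink_failing_range(
    first: int, last: int, ok: Callable[[int, int], bool]
) -> Optional[Tuple[int, int]]:
    """Return the smallest (first, last) range that still fails, or None.

    Assumes that every range containing a failing range fails as well.
    """
    if ok(first, last):
        return None

    # Largest start page for which start..last still fails.
    lo, hi = first, last
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ok(mid, last):
            hi = mid - 1
        else:
            lo = mid
    start = lo

    # Smallest end page for which start..end still fails.
    lo, hi = start, last
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(start, mid):
            lo = mid + 1
        else:
            hi = mid
    return (start, lo)


# Example call against the real converter (not executed here):
# shrink_failing_range(1, 132, lambda f, l: converts_ok("AM.pdf", f, l))
```

The check is passed in as a function so the search can be tried with a fake checker before running the real converter, which may take a while on 132 pages.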
A friend compiled the program for a Raspberry Pi and it worked without crashing. Instead of the blocks there is now text (sometimes only a few letters), sometimes in the background. Is it possible to get only the background image, without text?
Try --correct-text-visibility 1, and read the man page for more info.
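As a rough illustration, this is how the suggested option could be combined with the command line posted in the original report. It is a sketch only: it assumes a build that supports --correct-text-visibility, and the helper name is made up. The page range, width, background format and file names are the ones from the report.

```python
from typing import List


def conversion_command(pdf: str, out: str, first: int, last: int) -> List[str]:
    # Options are placed before the input and output file names.
    return [
        "pdf2htmlEX",
        "-f", str(first),                   # first page to convert
        "-l", str(last),                    # last page to convert
        "--fit-width", "600",
        "--bg-format", "jpg",
        "--correct-text-visibility", "1",   # option suggested in this thread
        pdf,
        out,
    ]


# Prints the command as a single line that can be pasted into a shell.
print(" ".join(conversion_command("AM.pdf", "AM3.htm", 130, 132)))
```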
When I convert my PDF, sometimes I cannot see the text because there is a box of the same color in the background image: http://liedtke.it/pdf2htmlEX/out2/AM.html. When I convert it with mediafire.com (http://www4.mediafire.com/conversion_server.php?9745&quickkey=2uge979qtidxawm&output=html&doc_type=d&metadata=0&page=131&initial=0&timestamp=1432986621&version=113354&domain=mediafire.com), the text is in the background image, sometimes with "real" text, sometimes without. Does this depend on which version I use or on which parameters? Is there a way to get a clean background image? In my case the text itself is OK.
There is a second problem with the last page. The PDF has 132 pages. If I convert just the last 2 pages, the last page is OK. If I convert the last 3 pages, its font does not fit correctly. 2 pages: http://liedtke.it/pdf2htmlEX/out2/AM.html 3 pages: http://liedtke.it/pdf2htmlEX/out3/AM.html
You can find the original PDF here: http://liedtke.it/pdf2htmlEX/AM.pdf
I convert it with:
pdf2htmlEX.exe -f 130 -l 132 --fit-width 600 AM.pdf AM3.htm --bg-format jpg
I am using a Windows build from http://soft.rubypdf.com/software/pdf2htmlex-windows-verion
pdf2htmlEX version 0.12
Copyright 2012-2014 Lu Wang
Libraries:
poppler 0.26.3
libfontforge 20140516
cairo 1.12.14