coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.
http://coolwanglu.github.com/pdf2htmlEX/
Other
10.37k stars 1.84k forks source link

Some improvements for consideration from the Debian ITP #385

Closed josch closed 10 years ago

josch commented 10 years ago

Hi,

I'm currently packaging pdf2htmlEX for Debian. You can see the current status here: http://mentors.debian.net/package/pdf2htmlex or download the package directly with

dget http://mentors.debian.net/debian/pool/main/p/pdf2htmlex/pdf2htmlex_0.11+ds-1.dsc

Some of the patches I applied to version 0.11 might be worth including in pdf2htmlEX so that they can be dropped from the Debian packaging. You can find the patches in the debian/patches directory after unpackaging the source tarball I linked to above.

Let me list which patches might be interesting to you:

fix-manpage-hyphens - please apply this patch since the current manpage writes a hyphen where a minus sign is intended.

fix-spelling - these are just some fixes of wrong spellings

poppler-build-fix - I backported this onto 0.11 from commit c0371a07a678bebf2e6991c94eb245ec1c3f95cf

no-install-LICENSE - the extra license file should not be installed in the Debian case. You can probably ignore this.

shell-set-ex - I added -ex to the shebang of the shell scripts because the build should fail if any part of the script fails executing. Also the build should be verbose to easily inspect it.

use-rsvg - in logo/update_png.sh you use imagemagick's convert utility to convert from svg to png. But convert will use rsvg-convert in the background so there is no need to use convert because it adds a big additional build-time dependency.

use-system-pdfjs - instead of using 3rdparty/PDF.js/compatibility.js, the packaging uses compatibility.js from an installed libpdfjs package. You might want to change compatibility.js into a symlink to an already installed version in case the build system detects it.

use-system-yui-compressor - instead of using 3rdparty/closure-compiler/compiler.jar, use yui-compressor from the system. You might want to auto detect whether yui-compressor is installed on the system and then use that instead.

Apart from these patches I also repacked your release tarball and removed the following from it:

Since the Debian packaging uses the installed versions of PDF.js and yui-compressor, there is no need for the 3rdparty directory anymore. The build also regenerates the png images in the logo directory so there is no need to ship them either.

josch commented 10 years ago

I found another problem. Even if a test in the test suite fails, running make test does not fail. This is fixed in the patch named fix-testsuite.

josch commented 10 years ago

Another two problems I solved:

do-not-ship-unused-js - this patch does not install the unminified versions of base.css, fancy.css and pdf2htmlEX.js which are never used by the executable.

use-tmpdir - on non-windows system, /tmp is used for temporary files. Instead, /tmp should be the last ressort when no other method is able to return an appropriate temporary directory. So this patch also check the TMPDIR environment variable and some other places.

coolwanglu commented 10 years ago

Hi @josch , thank you very much for your patches!

I picked a few patches, and added you into the AUTHORS file. However it could be easier for us to discuss if you created pull requests.

I'd like to explain a few of them I didn't apply:

use-rsvg and the png files: right, technically speaking the svg files are source files and png files are generated from them. But these are logo files, which are unlikely to be changed in the future. I just included the svg file and the script for my (and possibly others') convenience. So I don't think it's that necessary to generate png after installation. Another consideration is that I want the png to be official, what if users generate different png files?

pdf.js: compatibility.js is indeed from PDF.js, but I removed something from it that are not necessary here. I think it's better to keep this copy to make sure that it's always compatible with pdf2htmlEX.

closure vs yui-compressor: it might look weird since I'm including both of them. But closure seemed to be better than yui-compressor on JS, yet it cannot handle CSS. So I'm using closure for JS and yui for CSS.

unused js file: some supporting JS files are minimized before included in the output html file. But in HTML, it's very likely that the user want to examine the content, or to make changes, which cannot be done on minimized files. That's why I'm shipping those original source files even if they appear not to be used. Especially since the files might be changing time to time, it'll be painful for the users to find the files if not shipped in the first place.

josch commented 10 years ago

Hi @coolwanglu , sorry I'm not very familiar with github. Would you like me to create pull requests for anything or is everything resolved? About compatibility.js: did you just remove things that were not necessary or did you remove things that made pdf2htmlEX not work in some situations?

coolwanglu commented 10 years ago

@josch It's fine, I've merged them :) Thanks.

I think using compatiblity.js from pdf.js would work, I only removed some lines not necessary, but the original version should still work.