Background Image Optimization

coolwanglu / pdf2htmlEX

Convert PDF to HTML without losing text or format.

http://coolwanglu.github.com/pdf2htmlEX/

Background Image Optimization #132

Closed: coolwanglu closed this issue 11 years ago

coolwanglu commented 11 years ago

- Don't generate the image at all when the background is empty
- Split the background into small pieces when there are large blank areas
  - Start with code provided by @iclems
- More suffixes (image formats) and more options (e.g. compression)
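The first item comes down to checking whether anything was drawn onto the background canvas. Below is a minimal sketch. It assumes the background is rendered into an RGBA buffer that starts either fully transparent or filled with a known page colour. The function name `is_blank` and the list-of-rows pixel layout are hypothetical, not pdf2htmlEX's actual code.

```python
from typing import Optional, Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, a), 0-255 per channel (assumed layout)


def is_blank(rows: Sequence[Sequence[Pixel]],
             background: Optional[Pixel] = None) -> bool:
    """Return True if nothing was drawn onto the background canvas.

    background=None: the canvas is assumed to start fully transparent,
    so "blank" means every pixel has alpha == 0.
    Otherwise "blank" means every pixel equals the given RGBA colour.
    """
    for row in rows:
        for px in row:
            if background is None:
                if px[3] != 0:
                    return False  # some visible pixel was painted
            elif px != background:
                return False  # pixel differs from the page colour
    return True


# Hypothetical usage: only write the file when there is something to show.
page = [[(0, 0, 0, 0)] * 4 for _ in range(3)]
if is_blank(page):
    print("empty background: skip writing the image")
```

When `is_blank` returns True, a converter could skip both the image file and the reference to it in the page HTML.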

micred commented 11 years ago

The first optimization is great! But I'm not sure splitting is always best. I agree with your post https://github.com/coolwanglu/pdf2htmlEX/issues/104#issuecomment-15772739: in most cases it is faster to download one slightly bigger file than a plethora of small files.

WebP offers better results, but at the moment it is only supported by Chrome.

PNG-8 with palette optimization and interlacing disabled could give excellent results on files with simple drawings (like papers), but it does poorly on pictures, since it only supports 256 colors. This optimization could be automated with optipng (http://optipng.sourceforge.net/pngtech/optipng.html).
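Whether PNG-8 is a lossless choice for a given background depends on how many distinct colours it uses. Below is a minimal sketch, assuming the pixels are available as RGBA tuples. The helper name `fits_palette` is hypothetical. In practice, a tool such as optipng can apply this kind of lossless reduction automatically, so this only illustrates the idea.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, a), assumed layout


def fits_palette(pixels: Iterable[Pixel], max_colors: int = 256) -> bool:
    """Return True if the image can be stored losslessly with a palette.

    PNG-8 stores at most 256 palette entries, so every distinct RGBA
    value needs its own entry (alpha is carried by the tRNS chunk).
    """
    seen = set()
    for px in pixels:
        seen.add(px)
        if len(seen) > max_colors:
            return False  # stop early: a lossless palette is impossible
    return True
```

A simple line drawing on a white page usually passes this check. A page containing a photograph usually fails it, which is where PNG-8 starts to lose quality.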

coolwanglu commented 11 years ago

@micred I noticed that an empty background still produces a 7 KB image, which is too large. Of course it can be compressed in this case, but I am not sure about the compression ratio when there are many small pieces.

@iclems's code records the bounding box of each stroke and merges all overlapping ones. The result is a set of non-overlapping occupied regions of the background. The results are pretty good on my samples. If there turn out to be too many regions, I think I can pack them into one image and use CSS sprites.
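The merge step described above can be sketched as follows. This is an illustration only, not @iclems's actual code. Boxes are assumed to be `(x0, y0, x1, y1)` tuples in page coordinates, and boxes that merely touch are treated as overlapping. The union of two boxes can grow into a third one, so each new box keeps absorbing neighbours until it overlaps none of the boxes already kept.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 <= x1, y0 <= y1


def overlaps(a: Box, b: Box) -> bool:
    # Touching edges count as overlapping (an assumption of this sketch).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Box, b: Box) -> Box:
    # Smallest box containing both a and b.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Merge stroke bounding boxes into pairwise non-overlapping regions."""
    result: List[Box] = []
    for box in boxes:
        changed = True
        while changed:
            changed = False
            for i, other in enumerate(result):
                if overlaps(box, other):
                    box = union(box, other)  # grow the new box
                    result.pop(i)            # drop the absorbed one
                    changed = True
                    break                    # rescan: the grown box may hit others
        result.append(box)  # box now overlaps nothing in result
    return result
```

Each resulting box could then be rendered as its own small image, or packed into a sprite, leaving the blank areas between the boxes out of the output.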

General PNG compression can still be applied afterwards; the two approaches are not mutually exclusive.

Toneti777 commented 11 years ago

My solution is to convert the PNG files into JPG and change the references in the HTML code:

    mogrify -format jpg *.png
    mogrify -despeckle -quality 30 -trim *.jpg
    sed -i 's/\.png/.jpg/g' *.page

This reduces the size of the background images by about 10x.

micred commented 11 years ago

Be careful: -quality 30 can lead to poor results on PDFs with scanned text.

iclems commented 11 years ago

Thinking about this, I was wondering why we don't move to a more Crocodoc-like approach. Why not output the background to SVG using Cairo? It would take us one step closer to the best possible approach, and it would also reduce the background size a lot, since most of the time the background is only a fill color, a few vectors, and so on.

Have you already seen this example? https://github.com/wagle/pdf2svg/blob/master/pdf2svg.c

coolwanglu commented 11 years ago

@iclems See https://github.com/coolwanglu/pdf2htmlEX/issues/116

It has been planned, and I have actually spent a couple of weeks on it. But there is a lot of work to do, since most SVG output is not optimized for the web.

There are other problems too; for example, text in SVG is not selectable in Firefox, etc.

coolwanglu commented 11 years ago

Current status

- JPG is supported
- [png/jpg] One minimal raster image per page; empty background images are not saved
- [Experimental] SVG is supported