Closed coolwanglu closed 11 years ago
The first optimization is great! But I'm not sure splitting is always best. I agree with your post https://github.com/coolwanglu/pdf2htmlEX/issues/104#issuecomment-15772739 , since in most cases it is faster to download one slightly bigger file than a plethora of small files.
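To make the trade-off concrete, here is a rough back-of-the-envelope model. It is only a sketch: the formula (one round trip per batch of parallel requests, plus transfer time), the function name `estimate_download_seconds`, and every number in it (latency, bandwidth, six parallel connections, file sizes) are illustrative assumptions, not measurements from pdf2htmlEX output.

```python
import math


def estimate_download_seconds(
    file_count: int,
    total_bytes: int,
    latency_s: float = 0.1,            # assumed round-trip time per request
    bandwidth_bps: float = 1_000_000,  # assumed throughput in bytes per second
    parallel: int = 6,                 # assumed number of parallel connections
) -> float:
    """Very rough estimate: request rounds * latency + payload / bandwidth."""
    rounds = math.ceil(file_count / parallel)
    return rounds * latency_s + total_bytes / bandwidth_bps


# Hypothetical page: one 200 kB background vs. 40 pieces totalling 160 kB.
single = estimate_download_seconds(file_count=1, total_bytes=200_000)
split = estimate_download_seconds(file_count=40, total_bytes=160_000)
print(f"single file: {single:.2f}s, split files: {split:.2f}s")
```

Under these assumed numbers the single file wins even though the split version transfers fewer bytes, because the per-request latency dominates. With lower latency or far fewer pieces the result can flip, which is why splitting is not always the wrong choice.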
WebP offers better results, but it's only supported by Chrome at the moment.
PNG-8 with palette optimization and interlacing disabled could provide excellent results on files with simple drawings (like papers), but it is poor on images, since it only supports 256 colors. An automatic optimization could be done with optipng (http://optipng.sourceforge.net/pngtech/optipng.html).
@micred I noticed that for an empty background, the image would be 7kb... which is too large. Of course you can compress it in this case, but I am not sure about the compression ratio when there are many small pieces.
@iclems's code records the bounding box of each stroke and merges all overlapping ones. The result is a set of non-overlapping regions that are actually occupied in the background. The results are pretty good for my samples. If there are too many of them, I think I can pack them into one image and use a CSS sprite.
General PNG compression may be applied after that; the two approaches are not mutually exclusive.
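The box-merging idea described above can be sketched as follows. This is a minimal illustration, not @iclems's actual code: the `Box` tuple layout, the function names, and the choice to treat boxes that only touch at an edge as non-overlapping are all assumptions.

```python
from typing import List, Tuple

# Assumed layout: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes until no two boxes in the result overlap."""
    merged: List[Box] = []
    for box in boxes:
        current = box
        changed = True
        # A grown box may overlap boxes it did not touch before, so repeat
        # until a full pass absorbs nothing.
        while changed:
            changed = False
            remaining: List[Box] = []
            for other in merged:
                if overlaps(current, other):
                    current = union(current, other)
                    changed = True
                else:
                    remaining.append(other)
            merged = remaining
        merged.append(current)
    return merged


strokes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110), (14, 14, 20, 20)]
print(sorted(merge_boxes(strokes)))
```

Each resulting box could then be rendered as its own background piece, or packed into a single sprite image if there are too many.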
My solution is to convert the PNG files to JPG and change the references in the HTML code:
mogrify -format jpg *.png
mogrify -despeckle -quality 30 -trim *.jpg
sed -i 's/\.png/.jpg/g' *.page
This reduced the size of the background images by roughly a factor of 10.
Be careful: -quality 30 can lead to poor results on PDFs with scanned text.
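For the reference-rewriting step, a slightly more defensive variant can be written in Python. This is a sketch under assumptions: the helper names are hypothetical, and the rule that a reference ends at a quote, closing parenthesis, whitespace, or end of text is a guess about how the `*.page` files reference images. It only touches a literal `.png` extension, so strings such as `xpng` are left alone.

```python
import re
from pathlib import Path

# Literal ".png" followed by a quote, ")", whitespace, or end of text.
# The terminator set is an assumption about the generated markup.
PNG_REF = re.compile(r'\.png(?=["\')\s]|$)')


def rewrite_png_refs(text: str) -> str:
    """Replace .png file extensions with .jpg in a chunk of markup."""
    return PNG_REF.sub(".jpg", text)


def rewrite_page_files(directory: str) -> int:
    """Rewrite every *.page file in `directory`; return how many files changed."""
    changed = 0
    for path in Path(directory).glob("*.page"):
        original = path.read_text(encoding="utf-8")
        updated = rewrite_png_refs(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


print(rewrite_png_refs('<img src="bg1.png"/>'))
```

Run `rewrite_page_files(".")` only after the JPG files exist, and check a few pages visually first, since the chosen JPEG quality may not suit scanned text.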
Thinking about this, I was wondering why we don't move to a more "Crocodocs"-like approach. Why don't we output the background image we have as SVG, using Cairo? It would take us one step closer to the best possible approach, and it would also greatly reduce the background size, since most of the time the background may only be a color, a few vectors, and so on.
Have you already seen this example? https://github.com/wagle/pdf2svg/blob/master/pdf2svg.c
@iclems See https://github.com/coolwanglu/pdf2htmlEX/issues/116
It has been planned, and I have actually spent a couple of weeks on it. But there is a lot of work to do, since most SVG output is not optimized for the web.
There are other problems as well; for example, text in SVG is not selectable in Firefox.
Current status