Open Manouchehri opened 9 years ago
Duplicated background images should not be very common, though it is a waste when it does occur.
I suggest you try --bg-format svg --svg-embed-bitmap 0
and see if there are still duplicated images.
It should not be hard to de-duplicate background images in pdf2htmlEX itself: it could compute a checksum of each generated background image and check whether an identical one already exists. Would you like to implement this feature?
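To make the checksum idea concrete, here is a rough sketch in Python rather than pdf2htmlEX's own C++. It is only an illustration of the approach, not how pdf2htmlEX works today; the class name `BackgroundImageRegistry` and its `register` method are hypothetical. It also assumes that two images with the same SHA-256 digest are identical.

```python
import hashlib


class BackgroundImageRegistry:
    """Hypothetical helper: remembers the first filename seen for each image content."""

    def __init__(self):
        # Maps SHA-256 hex digest -> filename of the first image with that content.
        self._by_digest = {}

    def register(self, filename, data):
        """Return the filename a page should reference for this image data.

        If identical bytes were registered before, the earlier filename is
        returned, so the caller can skip writing `filename` at all.
        """
        digest = hashlib.sha256(data).hexdigest()
        return self._by_digest.setdefault(digest, filename)
```

The generator would call `register` for every background image it is about to write, and only write the file when the returned name equals the name it passed in.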
After running

    pdf2htmlEX --embed cfijo --split-pages 1 [filename]

I noticed there were quite a few duplicate images. Running fdupes . shows them.

At the moment I'm using a bash script to create soft links for duplicate files, but it would be nice if pdf2htmlEX did that automatically. It saves a bit of space and bandwidth, and I don't think it's too difficult to implement. (If you're wondering why I didn't use hard links, it's because Git does not handle them.)
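For anyone wanting the same workaround, here is a rough Python sketch of the soft-link approach. It is not the bash script mentioned above; the function names `file_digest` and `symlink_duplicates` are made up for this example. It only looks at files directly inside one directory and treats files with equal SHA-256 digests as identical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def symlink_duplicates(directory):
    """Replace duplicate regular files in `directory` with relative symlinks.

    The first file (in sorted name order) with a given content is kept;
    later copies become symlinks to it. Returns the names that were replaced.
    """
    seen = {}      # digest -> Path of the file that is kept
    replaced = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_symlink() or not path.is_file():
            continue
        original = seen.setdefault(file_digest(path), path)
        if original is not path:
            path.unlink()
            # Both files share a directory, so the bare name works as a relative target.
            path.symlink_to(original.name)
            replaced.append(path.name)
    return replaced
```

Calling `symlink_duplicates(".")` in the output directory would keep one copy of each image and link the rest to it. Relative links are used so the directory can be moved or committed to Git as a unit.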