Avoid duplicate images

After running pdf2htmlEX --embed cfijo --split-pages 1 [filename], I noticed there were quite a few duplicate images. Running fdupes . shows:

./bg17.png
./bg16.png

./bg2.png
./bg4.png
./bg5.png
./bg6.png
./bg7.png
./bg12.png
./bg13.png
./bg14.png
./bg15.png
./bg1c.png
./bg1f.png
./bg21.png
./bg20.png
./bg22.png
./bg24.png
./bg23.png
./bg25.png
./bg28.png
./bg2a.png

At the moment I'm using a bash script to create soft links for duplicate files, but it would be nice if pdf2htmlEX did that automatically. It would save a bit of space and bandwidth, and I don't think it's too difficult to implement. (If you're wondering why I didn't use hard links, it's because Git does not handle them.)

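#!/bin/bash
# Replace duplicate files with soft links to the first file of each duplicate set.
# fdupes -r -1 -n: recurse, print each duplicate set on one line, skip empty files.
# sed turns the space after each file name into '|' and drops the trailing '|'.
fdupes -r -1 -n "$@" | sed -e 's/\(\w\) /\1|/g' -e 's/|$//' > files.dup.list.txt
while read line; do
        IFS='|' read -a arr <<< "$line"
        orig=${arr[0]}
        for ((i = 1; i < ${#arr[@]}; i++)); do
                file="${arr[$i]}"
                # -r (GNU ln) writes the target relative to the link's directory,
                # so links stay valid when duplicates live in different subdirectories.
                ln -sfr "$orig" "$file"
        done
done < files.dup.list.txt
rm files.dup.list.txt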
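
For setups without fdupes, the same idea can probably be done with checksums. The following is only a rough sketch, assuming GNU coreutils (sha256sum) and bash 4 or later; link_duplicates is a hypothetical helper name, not anything pdf2htmlEX provides, and it only looks at the files directly inside one directory.

```bash
#!/bin/bash
# Sketch: replace byte-identical files in a single directory with soft links.
# Assumptions: GNU sha256sum is available, bash >= 4 (associative arrays).
# link_duplicates is a hypothetical name used only for illustration.
link_duplicates() {
        local dir=$1 f sum
        local -A first_seen=()
        for f in "$dir"/*; do
                # Only regular files; skip directories and existing soft links.
                [[ -f $f && ! -L $f ]] || continue
                sum=$(sha256sum < "$f")
                sum=${sum%% *}
                if [[ -n ${first_seen[$sum]:-} ]] && cmp -s -- "${first_seen[$sum]}" "$f"; then
                        # Both files are in the same directory, so the bare
                        # file name is a valid relative link target.
                        ln -sf -- "$(basename -- "${first_seen[$sum]}")" "$f"
                elif [[ -z ${first_seen[$sum]:-} ]]; then
                        first_seen[$sum]=$f
                fi
        done
}
```

Calling link_duplicates . in the output directory keeps the first file of each duplicate set (in glob order) and turns the others into soft links to it. The cmp check is there so that a link is only created when the contents really are identical.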

coolwanglu / pdf2htmlEX

Avoid duplicate images #569