JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License

How do you do the demos? #125

Closed by doverradio 3 years ago

doverradio commented 3 years ago

I am very impressed with the Icon dataset and Art dataset examples and would like to obtain the Python code for them, as I have a nearly identical use case. However, I noticed they are .sh files (which I assume are meant to be run in a Linux environment with bash), and inside run_icons.sh I couldn't figure out where to find github-urls.txt or hashes.txt. Is there a Python script that performs the examples?

JohannesBuchner commented 3 years ago

Hi @doverradio, thanks for pointing this out. I did not add the file to GitHub at the time and later deleted it. I think I have now recreated it, but you would need to test it. examples/run_icons.sh clones those repositories and creates the hashes.txt file.

You may need additional steps after cloning, such as:

find . \( -name '*.zip' -or -name '*.jar' \) -print0 | xargs -0 --max-args=1 -rt unzip

to unpack archives, and

find . -name '*.svg' | while IFS= read -r path; do convert "$path" "${path%.svg}.png"; done

to convert SVGs into PNGs (though this also produces some duplicates).
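Since the question was about a Python equivalent, here is a rough sketch of the unpacking step using only the standard library. The function name `unpack_archives` is made up for illustration and is not part of imagehash. It extracts everything into the top-level directory, which matches running the unzip command from that directory. The SVG conversion is left out because it depends on an external tool.

```python
import zipfile
from pathlib import Path


def unpack_archives(root, suffixes=(".zip", ".jar")):
    """Extract every .zip/.jar found under root into root.

    Hypothetical helper (not part of imagehash); mirrors running
    `unzip` on each archive from inside `root`.
    """
    root = Path(root)
    # Collect the paths first so files extracted during the loop are not revisited.
    archives = [p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes]
    unpacked = []
    for archive in archives:
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(root)
        except zipfile.BadZipFile:
            # Skip files that have an archive extension but are not valid archives.
            continue
        unpacked.append(archive)
    return unpacked
```

Calling `unpack_archives(".")` in the directory holding the cloned repositories should roughly correspond to the find/xargs line above. Unlike unzip, it overwrites existing files without asking.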

For the art dataset, I think I did the following:

seq 0 1100|while read i; do echo "http://parismuseescollections.paris.fr/en/recherche/image-libre/true?page=${i}&limit=100"; done |wget -i -

Then, in those HTML pages, I grepped for the URLs of all the art images (saved as urls.txt) and downloaded the files.
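In Python, the same two steps could look roughly like the sketch below. It builds the listing-page URLs for pages 0 to 1100 and pulls image URLs out of a downloaded page. The exact grep pattern I used is not recorded here, so treating every `<img src=...>` as an art image is an assumption, and the names `listing_urls`, `ImageSrcCollector` and `image_urls` are made up for illustration.

```python
from html.parser import HTMLParser

BASE = "http://parismuseescollections.paris.fr/en/recherche/image-libre/true"


def listing_urls(last_page=1100, limit=100):
    """Listing-page URLs for pages 0..last_page, like the seq loop above."""
    return [f"{BASE}?page={i}&limit={limit}" for i in range(last_page + 1)]


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page.

    Assumption: the art images appear as plain <img src="..."> tags;
    the real pages may need a narrower filter.
    """

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def image_urls(html):
    """Return all <img> src values found in an HTML string, in document order."""
    parser = ImageSrcCollector()
    parser.feed(html)
    return parser.urls
```

Writing the result of `image_urls` for each downloaded page, one URL per line, would give something like urls.txt.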

Similar to run_icons.sh, I then used hashimage.py to make hashes for each file and kept the url with it (this is what the urlhashes.txt files hold). examples/run_art.sh then just makes the web page.
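If you would rather stay in Python for the hashing step, a minimal sketch could look like this. It uses `imagehash.phash` on each image and writes one line per file. The "hash path" line format, the choice of phash, and the names `phash_hex` and `write_hashes` are assumptions for illustration; they are not necessarily what the example scripts do or what hashes.txt actually contains.

```python
from pathlib import Path


def phash_hex(path):
    """Perceptual hash of one image as a hex string."""
    # Imported here so the directory-walking code below can also be used
    # with a different hash function, without Pillow and imagehash installed.
    from PIL import Image
    import imagehash

    with Image.open(path) as img:
        return str(imagehash.phash(img))


def write_hashes(root, out_file, hash_func=phash_hex,
                 suffixes=(".png", ".jpg", ".jpeg")):
    """Write one 'hash path' line per image under root; return the count.

    The output format is an assumption, not the documented hashes.txt layout.
    """
    count = 0
    with open(out_file, "w") as out:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in suffixes:
                out.write(f"{hash_func(path)} {path}\n")
                count += 1
    return count
```

For example, `write_hashes(".", "hashes.txt")` hashes every PNG and JPEG below the current directory. For the art dataset, the same loop could write the source URL in place of the local path.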