I am very impressed with the examples of the Icon dataset and the Art dataset and would like to obtain the Python code for them, as I have nearly the identical use case. However, I noticed they are .sh files (which I assume are run in a Linux environment with bash), and inside run_icons.sh I wasn't able to figure out where to locate github-urls.txt or hashes.txt. Is there a Python script that performs the examples?
Hi @doverradio, thanks for pointing this out. I did not add the file to GitHub at the time and deleted it. I think I have now recreated it, but you would need to test it. examples/run_icons.sh clones those repositories and creates the hashes.txt file.
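In case it helps before you dig into the script, the core of run_icons.sh is roughly this (a minimal sketch, assuming github-urls.txt holds one repository URL per line and that hashimage.py accepts file paths; the real script is authoritative):

```bash
#!/bin/bash
# Minimal sketch of the clone-and-hash flow; github-urls.txt is assumed
# to hold one repository URL per line.
while read -r url; do
    git clone --depth 1 "$url"
done < github-urls.txt

# Hash the image files in the cloned trees. The exact hashimage.py
# arguments and the hashes.txt format are assumptions; see the script
# in the repository for the real invocation.
find . -name '*.png' -print0 | xargs -0 python hashimage.py > hashes.txt
```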
You may need additional steps after cloning, such as:
```bash
find \( -name '*.zip' -or -name '*.jar' \) -print0 | xargs -0 --max-args=1 -rt unzip
```
to unpack archives, and
```bash
find -name '*.svg' | while read -r path; do convert "$path" "${path%.svg}.png"; done
```
for converting SVGs into PNGs (but this also leads to some duplicates).
For the art dataset, I think I did the following:
```bash
seq 0 1100 | while read i; do echo "http://parismuseescollections.paris.fr/en/recherche/image-libre/true?page=${i}&limit=100"; done | wget -i -
```
Then, in those HTML pages, I grepped for all URLs of the art images (saved to urls.txt) and downloaded the files.
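The grep step was something along these lines (a sketch; the .jpg pattern and the recursive search over the downloaded pages are assumptions):

```bash
# Pull candidate image URLs out of the saved pages (the .jpg pattern is
# an assumption), deduplicate them, and fetch the files.
grep -rhoE 'https?://[^"]+\.jpg' . | sort -u > urls.txt
wget -i urls.txt
```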
Similar to run_icons.sh, I then used hashimage.py to make hashes for each file and kept the URL alongside each hash (this is what the urlhashes.txt files hold). examples/run_art.sh then just builds the web page.
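Pairing hashes with URLs could look roughly like this (a sketch only; the hashimage.py interface and the urlhashes.txt format used here are assumptions, so check the script itself):

```bash
# Pair each URL with a hash of its downloaded file, writing
# "<hash> <url>" lines; the hashimage.py invocation here may not
# match the real script.
while read -r url; do
    file=$(basename "$url")
    [ -f "$file" ] && echo "$(python hashimage.py "$file") $url"
done < urls.txt > urlhashes.txt
```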