cwerner / fastclass

Little tools to download and then weed through images, delete and classify them into groups for building deep learning image datasets (based on crawler and tkinter)
Apache License 2.0
133 stars, 25 forks

Resizing results from multiple search engines (-c ALL) overwrites images #1

Closed. olivermaerz closed this issue 5 years ago.

olivermaerz commented 5 years ago

https://github.com/cwerner/fastclass/blob/4e418fa9aff2544b01052d005569b3b4912ca641/fc_download.py#L117

It looks like the resize step overwrites output files when multiple crawlers are used. For example, it first goes through the Google results and resizes 000001.jpg into the output folder. Then it resizes 000001.jpg from the Bing results and saves it to the same folder, overwriting the Google image. Finally, it resizes 000001.jpg from the Baidu results and saves it to the output folder as well, overwriting the Bing image.

So:
tmp/searchterm.google/000001.jpeg -> dataset/searchterm/000001.jpg
tmp/searchterm.bing/000001.jpeg -> dataset/searchterm/000001.jpg
tmp/searchterm.baidu/000001.jpeg -> dataset/searchterm/000001.jpg

This leaves only the image from the Baidu search in the output folder.
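A minimal sketch of the collision described above, assuming (hypothetically) that the resize loop builds each output path from the source file's basename alone. `naive_targets` is an illustrative helper, not code from fc_download.py; it only computes destination paths and does no actual resizing.

```python
from pathlib import Path


def naive_targets(sources, outdir):
    """Map each source image to an output path that reuses its basename.

    Hypothetical helper: mimics a resize loop that keeps only the file name,
    so identically numbered files from different crawlers collide.
    """
    outdir = Path(outdir)
    return {src: outdir / Path(src).name for src in sources}


sources = [
    "tmp/searchterm.google/000001.jpg",
    "tmp/searchterm.bing/000001.jpg",
    "tmp/searchterm.baidu/000001.jpg",
]
targets = naive_targets(sources, "dataset/searchterm")

# Three inputs but only one distinct destination, so each write replaces the last.
print(len(sources), len(set(targets.values())))  # → 3 1
```

Because the per-engine folder name is discarded, the last crawler processed (Baidu in the report) wins.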

cwerner commented 5 years ago

This should be fixed now. In the current version, all crawlers save into the same folder and increment the file count, so images from different search engines no longer share a file name.
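A rough sketch of the approach described in the fix, under stated assumptions: every crawler writes into one shared folder and starts numbering after the highest index already present. `next_file_index` and `save_images` are hypothetical names, not fastclass or crawler-library APIs, and writing raw bytes stands in for the real download.

```python
import tempfile
from pathlib import Path


def next_file_index(folder):
    """Return the next free index among zero-padded numeric file names."""
    indices = [int(p.stem) for p in Path(folder).iterdir() if p.stem.isdigit()]
    return max(indices, default=0) + 1


def save_images(folder, payloads, suffix=".jpg", width=6):
    """Hypothetical stand-in for one crawler's downloader.

    Each payload is written under the next free index, so files saved by
    earlier crawlers in the same folder are never overwritten.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    idx = next_file_index(folder)
    written = []
    for data in payloads:
        path = folder / f"{idx:0{width}d}{suffix}"
        path.write_bytes(data)
        written.append(path.name)
        idx += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "searchterm"
    names = []
    # Each "engine" saves one image into the same folder, continuing the count.
    for engine in ("google", "bing", "baidu"):
        names += save_images(out, [engine.encode()])
    contents = {p.name: p.read_bytes() for p in out.iterdir()}

print(names)  # → ['000001.jpg', '000002.jpg', '000003.jpg']
```

With unique names in a single folder, a later resize pass can map each file one-to-one into the dataset folder without collisions.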