cwerner / fastclass

Little tools to download images and then weed through them, deleting and classifying them into groups for building deep-learning image datasets (based on icrawler and tkinter)
Apache License 2.0
133 stars 25 forks

TypeError: 'NoneType' object is not iterable #27

Closed THuffam closed 4 years ago

THuffam commented 4 years ago

Hi. I just installed FastClass as per the instructions on your blog post.

I created a simple query file and ran the command from your blog post, but got the following error:

fcd -c GOOGLE -k -o surfers surfers.csv
INFO: final dataset will be located in surfers
[1/2] Searching: >> surfer aerial view <<
(1) Crawling ...
    -> GOOGLE
Number of duplicate image files: 0. Removing...
(2) Resizing images to (299, 299)
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/home/tim/miniconda3/envs/ml/bin/fcd", line 8, in <module>
    sys.exit(cli())
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/fastclass/fc_download.py", line 170, in cli
    main(infile, size, crawler, keep, maxnum, outpath)
  File "/home/tim/miniconda3/envs/ml/lib/python3.6/site-packages/fastclass/fc_download.py", line 138, in main
    for item in source_urls:
TypeError: 'NoneType' object is not iterable
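The failing line (for item in source_urls:) suggests the crawler returned None instead of a list of URLs when the search produced no results. A minimal defensive guard, sketched here with a hypothetical helper (not fastclass's actual code), would look like:

```python
def iter_source_urls(source_urls):
    """Yield crawled URLs, treating None as 'no results'.

    Hypothetical helper: a crawler backend that finds nothing may
    return None rather than an empty list, which would otherwise
    raise TypeError: 'NoneType' object is not iterable.
    """
    # `or []` converts None to an empty list before iterating
    for item in source_urls or []:
        yield item

print(list(iter_source_urls(None)))                # []
print(list(iter_source_urls(["a.jpg", "b.jpg"])))  # ['a.jpg', 'b.jpg']
```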

I'm running Ubuntu 19.10 using a conda environment with Python 3.6. I also tried a fresh install of FastClass in a new conda environment with Python 3.7 and got the following error:

fcd -c ALL -k -o surfers surfers.csv
INFO: final dataset will be located in surfers
[1/2] Searching: >> surfer aerial view <<
(1) Crawling ...
    -> GOOGLE
    -> BING
Number of duplicate image files: 1. Removing...
(2) Resizing images to (299, 299)
100%|█████████████████████████████████████████| 521/521 [00:10<00:00, 52.08it/s]
[2/2] Searching: >>  <<
(1) Crawling ...
    -> GOOGLE
    -> BING
Number of duplicate image files: 1. Removing...
(2) Resizing images to (299, 299)
  0%|                                                     | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/tim/miniconda3/envs/py3.7/bin/fcd", line 8, in <module>
    sys.exit(cli())
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/fastclass/fc_download.py", line 170, in cli
    main(infile, size, crawler, keep, maxnum, outpath)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/fastclass/fc_download.py", line 133, in main
    source_urls = resize(files, outpath=out_resized, size=SIZE, urls=source_urls)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/fastclass/imageprocessing.py", line 32, in resize
    im = Image.open(f)
  File "/home/tim/miniconda3/envs/py3.7/lib/python3.7/site-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/tmp/tmpct1gtla5/surfer'

However, it ran fine from a conda environment with Python 3.7 on my Windows 10 box.

Any suggestions? Thanks so much for developing this app! Kind regards, Tim

cwerner commented 4 years ago

Hm, strange... could you post the csv file you use?

THuffam commented 4 years ago

Yep - it's just a simple one-liner (one class) to test:

searchterm,exclude
surfer aerial view,aerial view

The same file worked on my Windows PC. Note that I created the file separately on each machine (I didn't copy it across from one to the other, so there are no weird character differences).

I've just checked, and for the last run (the last error above) it did seem to work, in that it created the output folder and appears to have created the resized images - but it looks like some images are missing, and it also did not create the .raw output folder.

cwerner commented 4 years ago

I'm not sure, but I suspect that the underlying icrawler has issues with the current state of the Google API. There is a closed PR (https://github.com/hellock/icrawler/pull/68), but no new 0.6.3 release yet as far as I can see...

cwerner commented 4 years ago

I guess I really should get going and finally implement the test suite I planned to add... 🤷‍♂️

I'll try to trace what's going on in the next few days... Sorry about that.

cwerner commented 4 years ago

@THuffam,

would you mind testing only non-Google crawl options for the moment?

fcd -c BING -k -o surfers surfers.csv

This worked for me just now... I'll try to figure out what's going on with GOOGLE (which is also included in ALL) in the meantime...

THuffam commented 4 years ago

Hey, thanks so much for your help with this. I have just kicked off the command you asked for. I'm quite happy to just use the output from the Windows machine, but I thought I should raise the issue. That said, I'm very impressed with your software and can see myself using it more, so it would be great to get it working on Ubuntu. I wonder if it's something to do with my environment - it's a newly installed OS (I just created a dual boot 2 days ago) - so I'm happy to make any change if you need me to. I'm based in Australia, so our time zones will be a bit out of whack. I'll post the results tomorrow morning. Thanks again. Cheers, Tim

DrPav commented 4 years ago

I am having the same issue when using Google. Bing is fine, but icrawler.builtin.GoogleImageCrawler is not returning any results. Google must have changed the format of its image search.

THuffam commented 4 years ago

Aha... the IsADirectoryError: [Errno 21] Is a directory: '/tmp/tmp2fv9jud7/surfer' error is caused by an empty line in the query file when run on Linux (Ubuntu 19.10), whereas Windows and Bash on Windows can handle it. I noticed that your example file (guitars.csv) also has a blank line.

I also noticed in the output that it looks like it is trying to do 2 searches even though there is just one query in the file - see Searching: >>  << in the output above.
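To illustrate the blank-line failure mode: a query-file reader that skips empty rows would avoid both the empty second search and the stray temp-directory error. This is only a sketch - the read_queries helper is hypothetical, not fastclass's actual parser:

```python
import csv
import io

def read_queries(text):
    """Parse a fastclass-style query file, skipping blank rows.

    Hypothetical helper (not fastclass's actual parser): a blank
    line would otherwise become an empty search term - the
    'Searching: >>  <<' pass seen in the log above.
    """
    rows = list(csv.reader(io.StringIO(text)))
    queries = []
    for row in rows[1:]:                       # skip the header row
        if not row or not row[0].strip():
            continue                           # ignore empty lines
        term = row[0].strip()
        exclude = row[1].strip() if len(row) > 1 else ""
        queries.append((term, exclude))
    return queries

# The trailing blank line is ignored instead of triggering a search.
sample = "searchterm,exclude\nsurfer aerial view,aerial view\n\n"
print(read_queries(sample))  # [('surfer aerial view', 'aerial view')]
```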

Also, while testing all of this, I rebooted and created 2 new conda environments, one with Python 3.7.6 and the other with 3.6.10 - both worked using -c BING, so the original error did not occur either.

Hope this helps. Let me know if there is anything else I can do to help. Cheers, Tim

luiznonenmacher commented 4 years ago

Hi! I'm having the same issue of 'NoneType' object is not iterable when searching on Google, but it works fine with BAIDU or BING.

I'm using Python 3.8.2 on a Windows 10 machine, and I tested with my own csv and the guitars.csv.

cwerner commented 4 years ago

Hi 👋

Yes, that’s unfortunately currently a bug with the underlying icrawler package and the changes in Google's API. There is an unreleased patch discussed in icrawler that I am investigating at the moment.

I’m a bit swamped with real-life work atm, but I'll try to look into how we could fix this...

cwerner commented 4 years ago

Hi all :wave:

I just added a GoogleCrawler hotfix. Could anyone who has issues with fcd using the GOOGLE option try again and see if the problem is resolved?

Cheers

THuffam commented 4 years ago

Sure... I just tried to update it, but I'm not sure of the command to use. I tried pip install git+https://github.com/cwerner/fastclass.git#egg=fastclass

But when I ran fcd it gave the same error (TypeError: 'NoneType' object is not iterable)

I'm assuming it has not updated to use your new code. What command should I use to update instead? Thanks, Tim

cwerner commented 4 years ago

Hi @THuffam 👋

Could you try adding --upgrade to the pip command, i.e. pip install --upgrade git+https://github.com/cwerner/fastclass.git#egg=fastclass?

THuffam commented 4 years ago

Works great - thanks for the update! One difference though... I couldn't find any of the original images (which I wanted to keep) - were these deleted, or are they now located somewhere else?

Thanks again

cwerner commented 4 years ago

@THuffam, great to hear!

Hm, I swapped the path handling from os.path to pathlib... However, this shouldn't alter the locations... Can you explain a bit more what you are doing exactly? And maybe start in a clean folder?
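For what it's worth, os.path and pathlib do produce identical paths for simple joins, so the swap by itself shouldn't move anything. An illustrative comparison (not the actual fastclass code):

```python
import os.path
from pathlib import Path

# Old-style join
raw_os = os.path.join("dataset.raw", "surfer")

# pathlib equivalent; str() yields the same platform-native path
raw_pl = Path("dataset.raw") / "surfer"

print(raw_os == str(raw_pl))  # True on any platform
```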

THuffam commented 4 years ago

Where does pathlib point to? I use the following command: fcd -c GOOGLE -s 224 surfers.csv. The file surfers.csv contains these 2 lines:

searchterm,exclude
surfer aerial view, aerial view

I have just created another folder to retest this, copied my .csv file into it, and re-ran the command. It produced the same results: it created a folder called dataset, and within this a file called surfer.log and a folder called surfer which contains all the resized images. But I cannot see any of the original images anywhere (not to say they are not somewhere else on my drive).

cwerner commented 4 years ago

@THuffam Thanks for checking 👍.

Seems I did introduce a bug in the process. I really should add some proper tests... I'm at work atm and need to finish some stuff, but I will have a look at it tonight at the latest. Should be an easy fix...

cwerner commented 4 years ago

Ahm @THuffam, did you by any chance forget to add the --keep / -k flag? 😉

See fcd --help?

fcd -k -c GOOGLE -s 224 surfers.csv

This should create a dataset.raw folder with the original images...

THuffam commented 4 years ago

Oh - yes - my fail. Sorry about that. I tried again with the following command: fcd -k -c GOOGLE -s 224 surfers.csv. And this time it did keep the original files (in the dataset.raw folder).

I also tested it with the -c ALL option and it worked - although that only used Google and Bing, not Baidu, which I've seen in other examples.

Thanks

cwerner commented 4 years ago

Ok. Great. Closing this now 👍