EscVM / OIDv4_ToolKit

Download and visualize single or multiple classes from the huge Open Images v4 dataset
GNU General Public License v3.0

Missing n_threads argument causing a TypeError from download() function call #31

Closed: monocongo closed this issue 5 years ago

monocongo commented 5 years ago

I attempted to use this software to download a certain group of image classes ("Weapon").

I used the following command:

python main.py --Dataset ~/data/openimages --classes Weapon --type_csv 'all' downloader

Once it started, I was prompted to download missing files, and then I saw many messages indicating that the aws command was missing:

    [INFO] | Downloading Weapon.
   [ERROR] | Missing the class-descriptions-boxable.csv file.
[DOWNLOAD] | Do you want to download the missing file? [Y/n] Y
...145%, 0 MB, 1653 KB/s, 0 seconds passed
[DOWNLOAD] | File class-descriptions-boxable.csv downloaded into OID/csv_folder/class-descriptions-boxable.csv.
   [ERROR] | Missing the train-annotations-bbox.csv file.
[DOWNLOAD] | Do you want to download the missing file? [Y/n] Y
...100%, 1138 MB, 10685 KB/s, 109 seconds passed
[DOWNLOAD] | File train-annotations-bbox.csv downloaded into OID/csv_folder/train-annotations-bbox.csv.

-----------------------------------------------Weapon-----------------------------------------------
    [INFO] | Downloading all images.
    [INFO] | [INFO] Found 1646 online images for train.
    [INFO] | Download of 1646 images in train.
sh: 1: aws: not found
sh: 1: aws: not found
sh: 1: aws: not found
sh: 1: aws: not found
...

Finally, I see the following error:

    [INFO] | Done!
    [INFO] | Creating labels for Weapon of test.
    [INFO] | Labels creation completed.
Traceback (most recent call last):
  File "main.py", line 36, in <module>
    bounding_boxes_images(args, DEFAULT_OID_DIR)
  File "/home/james/git/OIDv4_ToolKit/modules/bounding_boxes.py", line 89, in bounding_boxes_images
    download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Perhaps this is caused by the n_threads argument not having a usable default value? The help text says the default is 20; maybe this should be verified.
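The traceback is what `int(None)` produces, which is what you get when an argparse option is omitted and has no `default=`. One plausible cause (an assumption, not verified against the toolkit's actual parser) is that `--n_threads` mentions 20 in its help text without setting a real default. The hypothetical `build_parser` and `thread_count` below model only that option to show the difference:

```python
import argparse


def build_parser(with_real_default: bool) -> argparse.ArgumentParser:
    """Hypothetical stand-in for the toolkit's CLI parser; only --n_threads is modelled."""
    parser = argparse.ArgumentParser(prog="main.py")
    if with_real_default:
        # type= and default= make args.n_threads an int even when the flag is omitted.
        parser.add_argument("--n_threads", type=int, default=20,
                            help="number of download threads (default: %(default)s)")
    else:
        # The help text mentions 20, but without default= argparse stores None.
        parser.add_argument("--n_threads",
                            help="number of download threads (default 20)")
    return parser


def thread_count(argv, with_real_default: bool) -> int:
    """Mimic the failing call site, which does int(args.n_threads)."""
    args = build_parser(with_real_default).parse_args(argv)
    return int(args.n_threads)  # raises TypeError when args.n_threads is None
```

With `with_real_default=False`, `thread_count([], ...)` raises the same `TypeError` as in the traceback, while passing `--n_threads 20` explicitly (the workaround used later in this thread) succeeds.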

In any event, thanks for making this code available. Once I get it working, it should save me a lot of time collecting a sub-dataset from Open Images.

monocongo commented 5 years ago

I think I have worked around the errors above with the following command:

python main.py --Dataset ~/data/openimages --classes Handgun Shotgun Rifle Knife Dagger Axe --type_csv 'all' --n_threads 20 downloader

Notice that I used the actual class names ("Handgun", "Knife", etc.) rather than the parent category ("Weapon"), and that I included an n_threads argument.

BTW, I also needed to install awscli, opencv, tqdm, and pandas. I'm not sure why, since I installed the dependencies from requirements.txt as instructed in the README, but I am using an Anaconda environment, so perhaps that has something to do with it.
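The `sh: 1: aws: not found` lines show that the downloader shells out to the AWS CLI, so a missing `aws` binary only surfaces once downloads begin. A minimal pre-flight check, assuming the requirements are exactly the ones named in this comment, could report missing tools and modules up front. The helpers `missing_executables`, `missing_modules`, and `check_environment` are hypothetical and not part of OIDv4_ToolKit:

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the command names that cannot be found on PATH (e.g. "aws")."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level Python modules that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    # awscli provides the `aws` command that the downloader invokes via sh.
    problems = [f"command not found: {cmd}" for cmd in missing_executables(["aws"])]
    # Import names differ from package names: opencv is imported as cv2.
    problems += [f"module not importable: {mod}"
                 for mod in missing_modules(["cv2", "tqdm", "pandas"])]
    return problems
```

Running `for problem in check_environment(): print(problem)` inside the active Anaconda environment shows whether that environment, rather than the one requirements.txt was installed into, is missing anything.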

keldrom commented 5 years ago

You cannot use a parent class, only the lower-level ones. And yes, the installation problem you had is due to the Anaconda environment.

monocongo commented 5 years ago

The n_threads problem mentioned in the original comment is still an issue, isn't it? The problem seems to lie in modules/bounding_boxes.py: lines 88-89 are mis-indented and need to be indented one more level so that the else aligns with the corresponding if statement.

For example, change this:

                        df_val = TTV(csv_dir, name_file)
                        if not args.n_threads:
                            download(args, df_val, folder[i], dataset_dir, class_name, class_code)
                    else:
                        download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))

to this:

                        df_val = TTV(csv_dir, name_file)
                        if not args.n_threads:
                            download(args, df_val, folder[i], dataset_dir, class_name, class_code)
                        else:
                            download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = int(args.n_threads))
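An alternative that avoids the if/else at every call site is to normalise the value once. This is a sketch only: `resolve_threads` and `DEFAULT_THREADS` are hypothetical names, and the fallback of 20 is taken from the help text quoted earlier in this issue rather than from the toolkit's code:

```python
from typing import Optional, Union

DEFAULT_THREADS = 20  # assumption: the default advertised in the --n_threads help text


def resolve_threads(n_threads: Optional[Union[str, int]],
                    default: int = DEFAULT_THREADS) -> int:
    """Return a usable thread count, falling back to default when the CLI value is missing."""
    if n_threads is None:
        return default
    threads = int(n_threads)  # argparse delivers a str unless type=int is set
    if threads < 1:
        raise ValueError(f"n_threads must be >= 1, got {threads}")
    return threads
```

Assuming download() keeps the threads keyword seen in the snippet above, both branches could then collapse into a single call: `download(args, df_val, folder[i], dataset_dir, class_name, class_code, threads = resolve_threads(args.n_threads))`.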