Open WuXinyang2012 opened 6 years ago
You can try this code, written in Python 3. Just change CLASS_LIST to a list with the names of the classes that you want to download. You can check the names here: https://github.com/tensorflow/models/blob/master/research/object_detection/data/oid_bbox_trainable_label_map.pbtxt
import csv
import os

CLASS_LIST = ('/m/01g317', '/m/09j2d')
with open('open-images/validation/annotations-human-bbox.csv', newline='') as csvfile:
    bboxs = csv.reader(csvfile, delimiter=',', quotechar='|')
    for bbox in bboxs:
        if bbox[2] in CLASS_LIST:
            os.system("gsutil cp gs://open-images-dataset/validation/%s.jpg PATH-TO-SAVE)
@tomasriv the program works once one mistake in the code is fixed. The closing quote and the format argument are missing: os.system("gsutil cp gs://open-images-dataset/validation/%s.jpg PATH-TO-SAVE) --> os.system("gsutil cp gs://open-images-dataset/validation/%s.jpg PATH-TO-SAVE"%bbox[0])
It works well, thanks for sharing!
Sorry about that, I think I deleted it while I was writing it here. Glad it worked!
@tomasriv Hi, thanks a lot! Your comments helped me!
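The quoting mistake discussed above is easier to avoid when the command is built as an argument list rather than a formatted shell string. A minimal sketch, assuming gsutil is on the PATH; the helper names gsutil_cp_command and download_image and the split parameter are hypothetical, not part of the original script:

```python
import subprocess


def gsutil_cp_command(image_id, dest_dir, split="validation"):
    """Build the gsutil argument list for one image (no shell string involved)."""
    source = "gs://open-images-dataset/%s/%s.jpg" % (split, image_id)
    return ["gsutil", "cp", source, dest_dir]


def download_image(image_id, dest_dir, split="validation"):
    """Run gsutil for one image; assumes gsutil is installed and on the PATH."""
    # check=True raises CalledProcessError if gsutil exits with an error code.
    subprocess.run(gsutil_cp_command(image_id, dest_dir, split), check=True)
```

In the loop above, download_image(bbox[0], "PATH-TO-SAVE") would take the place of the os.system call.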
For anyone who needs many classes: note that this script may download and overwrite the same image multiple times, because one image may contain several target classes. In the CSV file, for example, an image with 8 objects and bounding boxes occupies 8 consecutive rows, so if all 8 classes are among your wanted classes, the image is downloaded 8 times. A small modification avoids this.
import csv
import os, sys
dir = sys.path[0]
tmp_dir = dir + "/food/"
tmp = []
CLASS_LIST = ("/m/02wbm", "/m/01_bhs","/m/01b9xk","/m/02y6n","/m/01dwsz", "/m/01dwwc","/m/01j3zr","/m/01ww8y","/m/01f91_",
"/m/01hrv5", "/m/021mn","/m/0270h", "/m/01tcjp","/m/021mn","/m/0cxn2","/m/0fszt","/m/0gm28",
"/m/02g30s", "/m/02xwb","/m/014j1m","/m/0388q", "/m/043nyj","/m/061_f", "/m/07fbm7", "/m/07j87",
"/m/09k_b", "/m/09qck", "/m/0cyhj_","/m/0dj6p","/m/0fldg", "/m/0fp6w","/m/0hqkz","/m/0jwn_","/m/0kpqd",
"/m/0kpt_", "/m/033cnk","/m/052lwg6", "/m/01f91_","/m/01fb_0","/m/01tcjp", "/m/021mn","/m/09728",
"/m/0hnyx","/m/0jy4k", "/m/015wgc","/m/02zvsm","/m/052sf", "/m/05z55","/m/0663v", "/m/06nwz","/m/09gys",
"/m/0fbdv","/m/0_cp5", "/m/0cjq5","/m/0ll1f78", "/m/0n28_","/m/07crc", "/m/07mcwg","/m/0f4s2w","/m/015x4r",
"/m/015x5n","/m/047v4b", "/m/05vtc","/m/07j87","/m/0cjs7","/m/0dv77","/m/05zsy","/m/027pcv", "/m/0fbw6","/m/0fj52s",
"/m/0grw1","/m/0hkxq", "/m/0jg57","/m/02cvgx","/m/0fz0h", "/m/0l515","/m/0cdn1", "/m/06pcq","/m/0284d", "/m/01nkt",
"/m/04zpv","/m/07030" )
with open(dir + '/validation-annotations-bbox.csv', newline='') as csvfile:
    bboxs = csv.reader(csvfile, delimiter=',', quotechar='|')
    for bbox in bboxs:
        if bbox[0] == tmp:
            continue  # Avoid downloading an image many times when it contains multiple target classes
        if bbox[2] in CLASS_LIST:
            tmp = bbox[0]
            os.system("gsutil cp gs://open-images-dataset/validation/%s.jpg %s" % (bbox[0], tmp_dir))
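The check above only skips a repeat when rows for the same image are adjacent in the CSV. A set of already-seen image IDs drops that assumption. This is a rough sketch rather than a replacement for the script: the function name unique_image_ids and the sample rows are made up for illustration, and the column order (image ID first, label third) is taken from the snippets in this thread.

```python
import csv
import io


def unique_image_ids(rows, class_list):
    """Return each image ID once, in first-seen order, if any row has a wanted label."""
    wanted = set(class_list)
    seen = set()
    ordered = []
    for row in rows:
        if len(row) < 3:
            continue  # skip blank or malformed rows
        image_id, label = row[0], row[2]
        if label in wanted and image_id not in seen:
            seen.add(image_id)
            ordered.append(image_id)
    return ordered


# Illustrative data only: img1 appears twice, and its rows are not adjacent.
sample = io.StringIO(
    "ImageID,Source,LabelName,Confidence\n"
    "img1,xclick,/m/014j1m,1\n"
    "img2,xclick,/m/01g317,1\n"
    "img1,xclick,/m/09qck,1\n"
    "img3,xclick,/m/09qck,1\n"
)
ids = unique_image_ids(csv.reader(sample), ("/m/014j1m", "/m/09qck"))
```

Each ID in ids can then be passed to the gsutil command once. The header row is skipped naturally because its LabelName field never matches a class ID.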
Here is the code for the S3 bucket:
import csv
import boto3
from botocore import UNSIGNED
from botocore.config import Config

BUCKET_NAME = 'open-images-dataset'
# Anonymous (unsigned) access to the bucket
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))
CLASS_LIST = ['/m/01prls']
with open('train-annotations-bbox.csv', "r") as csvfile:
    bboxs = csv.reader(csvfile, delimiter=',', quotechar='|')
    for bbox in bboxs:
        if bbox[2] in CLASS_LIST:
            key = 'train/' + bbox[0] + '.jpg'
            # PATH is not defined in this snippet; set it to your download directory (with a trailing slash) first
            destination = PATH + bbox[0] + '.jpg'
            s3.Bucket(BUCKET_NAME).download_file(key, destination)
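The snippet above builds destination by string concatenation with PATH, which it never defines. One possible way to make the key and the local path explicit is sketched below; the helper name s3_key_and_destination and its dest_dir and split parameters are hypothetical.

```python
import os


def s3_key_and_destination(image_id, dest_dir, split="train"):
    """Return the S3 object key and the local file path for one image ID."""
    key = "%s/%s.jpg" % (split, image_id)  # S3 keys always use forward slashes
    destination = os.path.join(dest_dir, image_id + ".jpg")
    return key, destination
```

The returned pair can be passed straight to s3.Bucket(BUCKET_NAME).download_file(key, destination). The directory dest_dir has to exist before the download starts.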
@WuXinyang2012 Hi, maybe this can be helpful https://github.com/EscVM/OIDv4_ToolKit
Hi, I only want to use the images of the fruit and vegetable categories; I don't need the huge full dataset. Can you please give me some instructions?
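To pick categories such as fruit and vegetables by name rather than by ID, the class IDs for CLASS_LIST can be looked up from a class description file. The sketch below assumes a CSV with one "ID,display name" pair per row and no header (Open Images ships such a file, class-descriptions-boxable.csv, but check the layout of your copy). The function name mids_for_names is hypothetical and the sample rows are illustrative.

```python
import csv
import io


def mids_for_names(rows, names):
    """Return the class IDs whose display name matches one of `names` (case-insensitive)."""
    wanted = {name.lower() for name in names}
    mids = []
    for row in rows:
        if len(row) >= 2 and row[1].lower() in wanted:
            mids.append(row[0])
    return mids


# Illustrative rows in the assumed "ID,display name" layout.
sample = io.StringIO(
    "/m/014j1m,Apple\n"
    "/m/09qck,Banana\n"
    "/m/01g317,Person\n"
)
class_list = mids_for_names(csv.reader(sample), ["apple", "banana"])
```

The resulting class_list can be used as CLASS_LIST in any of the download scripts in this thread.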