OlafenwaMoses / ImageAI

A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities
https://www.genxr.co/#products
MIT License
8.58k stars 2.19k forks

How to use open_image_v4 datasets #366

Open. DeveloperRachit opened this issue 5 years ago

DeveloperRachit commented 5 years ago

How can I train Open Images V4 with the ImageAI module? Is there any option to use it?

OlafenwaMoses commented 5 years ago

@DeveloperRachit Yes, you can. All you need to do is write a program that converts the annotation data to Pascal VOC format, as detailed in the documentation linked below.

https://imageai.readthedocs.io/en/latest/customdetection/index.html
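
For reference, here is a minimal sketch (not an official ImageAI utility) of what writing one Pascal VOC annotation file can look like, using only the standard library. The file names, class name and box coordinates below are made-up illustrations.

import xml.etree.ElementTree as ET

def write_voc_xml(image_name, width, height, boxes, out_path):
    """boxes: list of (class_name, xmin, ymin, xmax, ymax) in pixels."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = image_name
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = name
        bndbox = ET.SubElement(obj, "bndbox")
        ET.SubElement(bndbox, "xmin").text = str(xmin)
        ET.SubElement(bndbox, "ymin").text = str(ymin)
        ET.SubElement(bndbox, "xmax").text = str(xmax)
        ET.SubElement(bndbox, "ymax").text = str(ymax)
    ET.ElementTree(ann).write(out_path)

# one annotation describing one "Apple" bounding box in img_1.jpg
write_voc_xml("img_1.jpg", 1024, 768, [("Apple", 120, 45, 430, 390)], "img_1.xml")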

rola93 commented 5 years ago

Check out this project: https://github.com/AtriSaxena/OIDv4_to_VOC

DeveloperRachit commented 4 years ago

I converted all my data to VOC format. What do I have to do next? I am actually following all the steps you sent me, but it's not working for me. I have all my XML files with my object categories. Please suggest how to do it. Open Images V4 (dataset) is what I have.

DeveloperRachit commented 4 years ago

raise ValueError("empty range for randrange()")
ValueError: empty range for randrange()

That is the error I'm getting when training my dataset.

rola93 commented 4 years ago

Can you share your code and your file structure with us?

Just to be clear, I only shared that library because it looks good; I haven't actually tried it.

DeveloperRachit commented 4 years ago

Actually I have OID -> Datasets -> train, validation, test; Train -> object-name (folder) -> images and annotations, each with their object folders and XML files inside, same as validation.

rola93 commented 4 years ago

This structure does not follow the required one:

>> train        >> images       >> img_1.jpg  (shows Object_1)
                >> images       >> img_2.jpg  (shows Object_2)
                >> images       >> img_3.jpg  (shows Object_1, Object_3 and Object_n)
                >> annotations  >> img_1.xml  (describes Object_1)
                >> annotations  >> img_2.xml  (describes Object_2)
                >> annotations  >> img_3.xml  (describes Object_1, Object_3 and Object_n)

>> validation   >> images       >> img_151.jpg (shows Object_1, Object_3 and Object_n)
                >> images       >> img_152.jpg (shows Object_2)
                >> images       >> img_153.jpg (shows Object_1)
                >> annotations  >> img_151.xml (describes Object_1, Object_3 and Object_n)
                >> annotations  >> img_152.xml (describes Object_2)
                >> annotations  >> img_153.xml (describes Object_1)

Notice that this layout doesn't include a test folder; it only covers the train and validation folders. In addition, under the train and validation folders you need only two sub-folders: one for annotations and one for images.
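
As a starting point, here is a rough sketch for copying a per-class OID-style download into that layout. I haven't run it against your data; the source and destination paths are assumptions based on what you described, so adjust them to your actual folders.

import shutil
from pathlib import Path

SRC = Path("OID/Dataset")   # assumed OIDv4_ToolKit output folder
DST = Path("my_dataset")    # assumed ImageAI data directory

for split in ("train", "validation"):
    images_dir = DST / split / "images"
    annotations_dir = DST / split / "annotations"
    images_dir.mkdir(parents=True, exist_ok=True)
    annotations_dir.mkdir(parents=True, exist_ok=True)

    # each class has its own folder of images and converted .xml files
    for class_dir in (SRC / split).iterdir():
        if not class_dir.is_dir():
            continue
        # rglob also catches files nested one level deeper (e.g. a Label/ subfolder)
        for img in class_dir.rglob("*.jpg"):
            shutil.copy(img, images_dir / img.name)
        for xml in class_dir.rglob("*.xml"):
            shutil.copy(xml, annotations_dir / xml.name)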

DeveloperRachit commented 4 years ago

Okay, I did that, but how would I get all the labeled images?

rola93 commented 4 years ago

Just to make sure we're on the same page: you are trying to train a custom object detector, right?

I didn't get your last question; can you explain it a little more?

DeveloperRachit commented 4 years ago

Yes, I am going to train on the Open Images dataset, following the references your team has given, like https://imageai.readthedocs.io/en/latest/customdetection/index.html

https://github.com/AtriSaxena/OIDv4_to_VOC https://github.com/EscVM/OIDv4_ToolKit

From the last link I downloaded the data in that structure, with the 2nd link I converted it to XML, and then I am using the 1st link for custom training.

rola93 commented 4 years ago

Actually I have OID -> Datasets -> train, validation, test; Train -> object-name (folder) -> images and annotations, each with their object folders and XML files inside

This doesn't follow what the docs say.

DeveloperRachit commented 4 years ago

So what do I have to do now? How can I train my OID dataset?

rola93 commented 4 years ago

You need to store your files in the required layout and then launch your training just as with any other dataset.
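
Following the custom detection docs linked earlier, launching the training could look roughly like this; the data directory name, class list and pretrained model file below are placeholders for your own setup.

from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="my_dataset")      # contains train/ and validation/
trainer.setTrainConfig(
    object_names_array=["Apple"],                          # your object classes
    batch_size=4,                                          # lower this if you hit GPU out-of-memory errors
    num_experiments=100,                                   # number of training epochs
    train_from_pretrained_model="pretrained-yolov3.h5"     # optional transfer-learning weights
)
trainer.trainModel()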

DeveloperRachit commented 4 years ago

Do you have any example of training on the OID dataset?

DeveloperRachit commented 4 years ago

"Check out this project: https://github.com/AtriSaxena/OIDv4_to_VOC" is what you sent me to train OID datasets.

rola93 commented 4 years ago

That project is just to convert from the OIDv4 format to VOC (which is what ImageAI requires). I don't know how it works. It looks OK, but I don't know if its output is exactly what you need in terms of file/folder locations. I'm almost sure you'll need to sort the files somehow to get what you need before starting your training.

DeveloperRachit commented 4 years ago

I need the images and annotations, with train, validation and test folders inside the dataset.

DeveloperRachit commented 4 years ago

That is what I have.

DeveloperRachit commented 4 years ago

...calhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
 [[{{node replica_0/model_1/leaky_0/LeakyRelu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[{{node training/Adam/gradients/replica_0/model_1/bnorm_99/FusedBatchNorm_grad/FusedBatchNormGrad}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

DeveloperRachit commented 4 years ago

Now I am getting this error.

rola93 commented 4 years ago

What batch_size are you using? I'm pretty sure it's too big; try decreasing it.

DeveloperRachit commented 4 years ago

I am using 8 currently.

DeveloperRachit commented 4 years ago

I also have a GPU with 4 GB.

rola93 commented 4 years ago

8 sounds like a lot. Of course it depends on your GPU. Try decreasing it until it works; I usually set it to 3 or 4.

Also make sure to set the GPU number correctly (it is 2 by default, and will change to 1 in future versions).

DeveloperRachit commented 4 years ago

I already set it to 4, but it gives me the same errors.

DeveloperRachit commented 4 years ago

One epoch is taking 3187 sec (about 53 minutes) to train for only one object class. Why does a single epoch take so much time?

DeveloperRachit commented 4 years ago

Screenshot from 2019-10-11 11-22-27: this is the time taken by a single epoch.

OlafenwaMoses commented 4 years ago

Screenshot from 2019-10-11 11-22-27: this is the time taken by a single epoch.

@DeveloperRachit The training time for each epoch depends on the number of images in your dataset and your batch size (which in turn is dictated by your GPU capacity).

rola93 commented 4 years ago

One thing you may try is increasing the queue size in this call:

train_model.fit_generator(
            generator=train_generator,
            steps_per_epoch=len(train_generator) * self.__train_times,
            validation_data=valid_generator,
            validation_steps=len(valid_generator) * self.__train_times,
            epochs=self.__train_epochs + self.__train_warmup_epochs,
            verbose=1,
            callbacks=callbacks,
            workers=4,
            max_queue_size=8
        )

max_queue_size specifies how many batches will be prepared in the queue. Consider that training and batch generation run in parallel: training takes place mostly on the GPU, while batch generation happens on the CPU. The CPU is usually slower than the GPU, and on top of that, the images are read from disk while the samples are generated. So your GPU may be fast, but it has to wait until a batch is ready before it can train on it.

All that said, try increasing max_queue_size, especially if you are not using an SSD. It may consume some more memory, but it may be worth it.
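
Concretely, in your local copy of ImageAI the change could be as small as this excerpt (16 is only an example value; the other arguments stay as shown above):

train_model.fit_generator(
            # ... same generator / steps / epochs / callbacks arguments as in the call above ...
            workers=4,
            max_queue_size=16   # raised from 8 so the CPU can queue more batches ahead of the GPU
        )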

It'd be nice to avoid the file reading while generating the samples, at least as an optional feature. @OlafenwaMoses I'm able to work on it, if you agree.

Further reading: the Keras fit_generator docs, "How to parallelise", and a detailed explanation of the fit_generator params.

DeveloperRachit commented 4 years ago

Hi, I have a system with 8 GB of RAM, a 1 TB hard disk, and one GTX 1050 GPU with 4 GB of memory. Is it suitable for training on the 1 million image Open Images dataset?