HamedMP / ImageFlow

A simple wrapper around TensorFlow for converting, importing (and soon, training) images in TensorFlow.
Apache License 2.0
162 stars, 59 forks

Create dataset from scratch #1

Open: kYroL01 opened this issue 8 years ago

kYroL01 commented 8 years ago

First of all, thank you @HamedMP for the huge work. In my opinion, a nice feature to extend ImageFlow's functionality would be the ability to create a new dataset from scratch. It could be very useful to anyone who needs to build their own CNN for image recognition that is not based on a pre-built dataset (e.g. CIFAR, MNIST, or ImageNet). What do you think?

HamedMP commented 8 years ago

Thank you @kYroL01

Actually, that's the reason I created ImageFlow: to enable you to work with your very own data. That was the problem in my case, so I built this solution (not as complete as I wanted) and decided to publish it.

For example, we have used this library for a car detector problem with a fully custom dataset.

I would be happy if you could explain more about what is missing that makes it difficult to use with your dataset.

kYroL01 commented 8 years ago

Thanks for the fast reply. Unfortunately, I haven't explored ImageFlow in depth yet, so sorry if I say something wrong. Let me explain better: I'm creating my own CNN to recognize three categories of images, and I have to create my own dataset. I followed the TensorFlow tutorial for building the model, and it was useful for understanding how to create a CNN based on an existing model. The problem is that I can't find any reasonable, illustrative tutorial on how to create a custom dataset, so while searching GitHub I found your nice project. But as I read here, you convert a directory that contains images AND LABELS, whereas I have no labels at the start, just images in three subfolders (my categories). Once I have labels, your library would be very useful for converting the images and labels into data tensors to pass to the TensorFlow model.

I also read this, but it's not 100% of what I need.

Any suggestions? I hope I was clear.

Thanks

HamedMP commented 8 years ago

Your categories are your LABELS. Since a CNN is trained with supervised learning, you have to provide labels so it can measure its errors and improve.

So if you have 3 categories, you should assign labels to them, e.g. 1, 2, 3 or A, B, C, ..., and convert them to TFRecords if you want to take advantage of the queueing features.
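A minimal sketch of the labeling step, assuming the category names are the subfolder names and integer labels are wanted. `build_label_map` is a hypothetical helper, not part of ImageFlow. It uses 0-based indices rather than 1, 2, 3 because TensorFlow's sparse cross-entropy losses expect class indices in the range [0, num_classes), and it sorts the names so the mapping stays the same across runs. The TFRecord conversion itself is not shown.

```python
def build_label_map(category_names):
    """Assign a stable 0-based integer label to each distinct category name.

    Hypothetical helper for illustration; not part of ImageFlow.
    """
    # Sorting makes the mapping deterministic regardless of input order.
    return {name: index for index, name in enumerate(sorted(set(category_names)))}


label_map = build_label_map(["CAT3", "CAT1", "CAT2"])
print(label_map)  # {'CAT1': 0, 'CAT2': 1, 'CAT3': 2}
```

Duplicate names collapse to one entry, so the function can be fed the full per-image label list directly.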

kYroL01 commented 8 years ago

Yes, of course, I understand that my categories are the labels. My question is: how can I create my dataset so that it says "this category is my label"? Once that's done, I can convert the images and labels with ImageFlow. The problem is at the very beginning.

HamedMP commented 8 years ago

It can be done with a simple bit of programming.

For example, you read each image's full path and append it to images_array. At the same time, you read the name of the folder that contains it, let's say CAT1, ..., and append it to another array named labels_array.
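The approach described above can be sketched in Python with `os.walk`, assuming a layout of `root_dir/<category>/<image file>` where images live only inside the category folders. The function name `collect_images` and the extension filter are hypothetical choices for illustration, not ImageFlow APIs.

```python
import os

# Hypothetical extension filter; adjust to the formats in your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def collect_images(root_dir):
    """Return (images_array, labels_array) for root_dir/<category>/<image>."""
    images_array = []
    labels_array = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit category folders in a deterministic order
        for filename in sorted(filenames):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                images_array.append(os.path.join(dirpath, filename))
                # The name of the containing folder (e.g. CAT1) is the label.
                labels_array.append(os.path.basename(dirpath))
    return images_array, labels_array
```

The two lists stay index-aligned: `images_array[i]` is the path of an image and `labels_array[i]` is its category name, which can then be mapped to integers before conversion.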

You can do this in Python by getting the full path of each image, splitting it on '/', and accessing returned_array[-2], which will be the folder name; returned_array[-1] will be the file name.
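A small sketch of the split described above. `split_label_and_filename` is a hypothetical helper that assumes '/'-separated paths with at least one parent folder (shorter paths raise `IndexError`). The second variant uses `os.path.split` and `os.path.basename`, which also respect the platform's native separator (for example '\\' on Windows).

```python
import os


def split_label_and_filename(path):
    """Split a '/'-separated path into (folder name, file name)."""
    returned_array = path.split("/")
    # [-2] is the containing folder (the label), [-1] is the file name.
    return returned_array[-2], returned_array[-1]


def split_label_and_filename_portable(path):
    """Same result using os.path, which respects the OS path separator."""
    folder, file_name = os.path.split(path)
    return os.path.basename(folder), file_name


print(split_label_and_filename("dataset/CAT1/img001.jpg"))  # ('CAT1', 'img001.jpg')
```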

kYroL01 commented 8 years ago

OK, I'll try. I think I understand. And, regarding my initial question, do you think this is a feature you could add to ImageFlow?

Thank you for the suggestion and for your time.