AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

How to handle an increasing number of classes? #8744

Open Andersama opened 1 year ago

Andersama commented 1 year ago

A family member recently asked whether I could write a program to recognize a number of animals in an image. There's a good chance (if I complete the program) they'll later want to recognize an entirely new set of animals, or want a more specific classifier. I wouldn't want them to spend a lot of time retraining the network just to add a few new things. I think this would be a fun project anyway, so I'm partly writing this for myself.

After reading the source code a bit, I think I might run into a few problems in the future, so I have a couple of questions. 1) Would it be possible to "expand" the AI, e.g. take an existing neural network, copy its weights, and add the neurons/nodes needed for the "new" labels? Similarly, if the neural network can be expanded, can only the "new" portion be trained against labelled data without affecting the results for the other labels? 2) Is there a label format you can process for training an AI that isn't numerically indexed?

Here are the issues I see, if I understand how the API currently handles training data. Say my family member or I want to train a network to recognize three animals, so we create a label file that looks like:

cat
dog
bird

Then later it turns out that wasn't enough: for some reason they need to detect more animals, or be more specific, like:

cat
dog_terrier
dog_husky
bird
hamster
turtle

But they've already done an insane amount of work labelling data as is, e.g. they've labelled all the cats and dogs they could possibly ever want. It seems like the current way of processing the training data would force them to edit all the existing files and update their indexes. That would be unavoidable in this example for the dog label, which is split into two, but the bird label, for example, has just been incremented (from 2 to 3).

It seems like a potential solution is to create a mapping between indexes (and also to store the existing labels inside the AI's "config" file), and then to have a program which can process a different label format and map the indexes to their correct places.

cat (0) -> cat (0)
dog (1) -> ? // there's a potential option to just keep the dog label around and just to ignore it 
bird (2) -> bird (3)
...etc...
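A minimal sketch of how such a mapping could be built from the old and new label lists. `build_class_map` is a hypothetical helper, not a darknet function, and using -1 to mean "no counterpart in the new list" is just one possible convention:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of darknet): for each old class name, store
 * its index in the new class list, or -1 if the name no longer exists there. */
void build_class_map(const char **old_names, int n_old,
                     const char **new_names, int n_new, int *map)
{
    assert(old_names != NULL && new_names != NULL && map != NULL);
    for (int i = 0; i < n_old; ++i) {
        map[i] = -1; /* default: old label has no counterpart */
        for (int j = 0; j < n_new; ++j) {
            if (strcmp(old_names[i], new_names[j]) == 0) {
                map[i] = j;
                break;
            }
        }
    }
}
```

With the two lists above, cat stays at 0, bird moves to 3, and dog maps to -1, so the caller has to decide whether to drop, keep, or relabel those boxes.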

Currently, for example, my GUI writes a label file like this: <center_x> <center_y> <width> <height> <label>, where label is a newline-terminated string, so any text following the last numeric parameter can be assumed to be the label. I don't store it with a .txt extension; instead I append _lbl to the existing image's name. I'm doing this so that I can still write what darknet expects in a .txt file.
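For illustration, here is a rough sketch of converting one line of that format into the `<class_id> <center_x> <center_y> <width> <height>` line darknet reads from its .txt files. `convert_label_line` and the 256-byte label limit are assumptions made for this example, and the class ID is simply the label's position in whatever names list is passed in:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical converter (not part of darknet). Parses
 * "<cx> <cy> <w> <h> <label>" and writes "<class_id> <cx> <cy> <w> <h>"
 * into out. Returns 0 on success, -1 on a parse error or unknown label. */
int convert_label_line(const char *line, const char **names, int n_names,
                       char *out, size_t out_size)
{
    float cx, cy, w, h;
    int consumed = 0;
    char label[256]; /* assumed maximum label length */

    assert(line != NULL && names != NULL && out != NULL);

    /* %n records how many characters were read before the label starts. */
    if (sscanf(line, "%f %f %f %f %n", &cx, &cy, &w, &h, &consumed) != 4)
        return -1;
    if (consumed == 0)
        return -1;

    /* Everything up to the end of the line is the label (spaces allowed). */
    size_t len = strcspn(line + consumed, "\r\n");
    if (len == 0 || len >= sizeof label)
        return -1;
    memcpy(label, line + consumed, len);
    label[len] = '\0';

    for (int i = 0; i < n_names; ++i) {
        if (strcmp(label, names[i]) == 0) {
            int written = snprintf(out, out_size, "%d %f %f %f %f",
                                   i, cx, cy, w, h);
            return (written > 0 && (size_t)written < out_size) ? 0 : -1;
        }
    }
    return -1; /* label not present in the names list */
}
```

Because the index is looked up at conversion time, the _lbl files never need editing when the names list changes; only the generated .txt files do.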

This is what puts me off darknet at the moment: so far as I can tell, I may have to write a program which rewrites thousands of .txt files just to train a new network (only to load them again later). This feels off, especially since by the time my GUI is doing all this work, all of the remapping data must already be in program memory, so it seems like it should be free to just start training a new network.

Andersama commented 1 year ago

Going through the source code for training the network, I suspect this may be possible with some edits.

void *load_thread(void *ptr); // calls load_data_region when REGION_DATA is used in load_args

data load_data_region(int n, char **paths, int m, int w, int h, int size, int classes, float jitter, float hue, float saturation, float exposure); // calls fill_truth_region

void fill_truth_region(char *path, float *truth, char** labels, int classes, int num_boxes, int flip, float dx, float dy, float sx, float sy); // loads the images into their respective image struct with some modifications, and reads the associated .txt file

It seems like a variation of load_data_region and fill_truth_region which preallocates an array of strings for labels and fills it as files are read could automatically assign unique IDs to the labels. This would create a slight dependency on the order in which labels are loaded. However, we could make a single pass over the label files themselves first, potentially enforcing alphabetical order on the labels, or we could load an already existing array beforehand.
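A rough sketch of what that label array might look like, assuming a fixed upper bound on the number of labels. `label_table`, `label_table_intern`, `label_table_free`, and `MAX_LABELS` are all hypothetical names, not anything that exists in darknet:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABELS 256 /* assumed upper bound on distinct labels */

/* Hypothetical table (not a darknet type): names[i] is the label with ID i. */
typedef struct {
    char *names[MAX_LABELS];
    int count;
} label_table;

/* Returns the existing ID for name, or assigns the next free ID.
 * Returns -1 if the table is full or memory runs out. */
int label_table_intern(label_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; ++i) {
        if (strcmp(t->names[i], name) == 0)
            return i;
    }
    if (t->count >= MAX_LABELS)
        return -1;

    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    t->names[t->count] = copy;
    return t->count++;
}

/* Releases every stored name and resets the table. */
void label_table_free(label_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < t->count; ++i)
        free(t->names[i]);
    t->count = 0;
}
```

IDs here depend on the order in which labels are first seen, which is exactly the ordering dependency mentioned above; filling the table from a saved list before reading any files would pin the IDs down.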

Since the labels are likely to exist before the AI is trained, we could store them in the .cfg file and parse them back out later when training. We'd be able to edit them freely as before, but wouldn't have to go back to build a massive database of labelled images.
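As a sketch of the parsing side, assume the labels were stored on a single line such as `labels = cat, dog, bird`. The `labels` key is made up for this example and is not an existing darknet .cfg option; `parse_labels_line` is likewise hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Strips leading and trailing blanks and line endings in place. */
static char *trim(char *s)
{
    while (*s == ' ' || *s == '\t')
        ++s;
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t' ||
                       s[len - 1] == '\r' || s[len - 1] == '\n'))
        s[--len] = '\0';
    return s;
}

/* Hypothetical parser for a "labels = a, b, c" line (the key is an
 * assumption). The line is modified in place and out[] receives pointers
 * into it. Returns the number of labels, or -1 if it is not a labels line. */
int parse_labels_line(char *line, char **out, int max_out)
{
    assert(line != NULL && out != NULL);

    char *eq = strchr(line, '=');
    if (eq == NULL)
        return -1;
    *eq = '\0'; /* split "key" from "value" */
    if (strcmp(trim(line), "labels") != 0)
        return -1;

    int n = 0;
    for (char *tok = strtok(eq + 1, ","); tok != NULL && n < max_out;
         tok = strtok(NULL, ",")) {
        char *name = trim(tok);
        if (*name != '\0')
            out[n++] = name; /* position in the list is the class ID */
    }
    return n;
}
```

The position of each name in the parsed list would then serve as its class ID, so reordering or extending the line is all it takes to renumber the classes.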