What is the best way to continuously grow and train a model?

jlevianderson commented 3 years ago

I have retrained an ssd-mobilenet model, was able to convert it to onnx format, and ran it using detectnet. Now I would like to improve my model and add additional categories for detection. Should I just add more images to the JPEGImages/ folder, XML files to the Annotations/ folder, and add the image titles to the train.txt, trainval.txt, test.txt, and val.txt files in the ImageSets/Main folder?

Also, what is the best way to separate my dataset evenly (80%, 20%, 20%) with this particular file structure since all of the XML files are in the Annotations/ folder and all images are in the JPEGImages/ folder? Should I just be spreading the image titles evenly between the train.txt, trainval.txt, test.txt, and val.txt files? I collected images from a camera and downloaded others from the web, then annotated them using labelImg. I did not use the camera-capture tool.

Hopefully my questions make sense. Thank you for your work putting this tutorial together. It's been awesome learning all of this! 👍 For reference I was mainly following the steps outlined in this tutorial: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md

dusty-nv commented 3 years ago

> I have retrained an ssd-mobilenet model, was able to convert it to onnx format, and ran it using detectnet. Now I would like to improve my model and add additional categories for detection. Should I just add more images to the JPEGImages/ folder, XML files to the Annotations/ folder, and add the image titles to the train.txt, trainval.txt, test.txt, and val.txt files in the ImageSets/Main folder?

If you are adding additional categories, you will want to add your new data to your previous dataset in the way you described, and also add your new categories to the dataset's labels.txt. Then run the training again on the combined dataset. Since you are changing the number of categories, the model needs to be retrained.
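Before retraining, it can help to confirm that every class name used in the XML annotations also appears in labels.txt. The following is only a minimal sketch of such a check, not part of the tutorial. It assumes labelImg wrote Pascal VOC style XML with `<object><name>` elements and that labels.txt sits in the dataset's root folder with one class per line. The function names `annotation_classes` and `missing_labels` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def annotation_classes(annotations_dir):
    """Collect every class name found in <object><name> across the XML files."""
    classes = set()
    for xml_path in Path(annotations_dir).glob("*.xml"):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                classes.add(name.strip())
    return classes


def missing_labels(dataset_dir):
    """Return class names used in Annotations/ but not listed in labels.txt.

    Assumption: labels.txt is in the dataset root, one class name per line.
    """
    root = Path(dataset_dir)
    labels_path = root / "labels.txt"
    listed = set()
    if labels_path.exists():
        listed = {
            line.strip()
            for line in labels_path.read_text().splitlines()
            if line.strip()
        }
    return sorted(annotation_classes(root / "Annotations") - listed)
```

Calling `missing_labels("path/to/dataset")` returns a sorted list of class names that still need to be added to labels.txt. An empty list means the two agree.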

> Also, what is the best way to separate my dataset evenly (80%, 20%, 20%) with this particular file structure since all of the XML files are in the Annotations/ folder and all images are in the JPEGImages/ folder? Should I just be spreading the image titles evenly between the train.txt, trainval.txt, test.txt, and val.txt files?

Yes, all the images go into JPEGImages/, and the image names are split between the text files in ImageSets/Main. The ratio is flexible, but 80/10/10 or similar works. Sometimes I just re-use the training set for the other splits as well. Obviously that is a no-no for validating "real" production models, but it does give you the most training data to use.
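An 80/10/10 split of this folder layout could be generated along the following lines. This is a sketch under stated assumptions rather than the tutorial's own tooling: it assumes every XML file in Annotations/ has a matching image in JPEGImages/ with the same base name, and it follows the usual Pascal VOC convention that trainval.txt is the union of train.txt and val.txt. The helper names `split_ids` and `write_image_sets` and the fixed seed are hypothetical choices.

```python
import random
from pathlib import Path


def split_ids(ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle image IDs reproducibly and cut them into train/val/test."""
    ids = sorted(ids)                  # stable starting order
    random.Random(seed).shuffle(ids)   # seeded so the split is repeatable
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # whatever remains
    return {"train": train, "val": val, "trainval": train + val, "test": test}


def write_image_sets(dataset_dir, **kwargs):
    """Write train.txt, val.txt, trainval.txt and test.txt to ImageSets/Main."""
    root = Path(dataset_dir)
    # Assumption: the XML base name is the image ID shared with JPEGImages/.
    ids = [p.stem for p in (root / "Annotations").glob("*.xml")]
    splits = split_ids(ids, **kwargs)
    out_dir = root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, members in splits.items():
        (out_dir / f"{name}.txt").write_text("".join(f"{i}\n" for i in members))
    return splits
```

Running `write_image_sets("path/to/dataset")` after adding new images and annotations rewrites all four files. Because the shuffle covers the whole dataset, images can move between splits when the dataset grows, so a fixed test set would need a different approach.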

jlevianderson commented 3 years ago

Thank you so much! I really appreciate those answers and support. :)

dusty-nv/jetson-inference, issue #825