Incremental training custom models

acleitao76 commented 5 years ago

Can I use this procedure to incrementally train a model? https://medium.com/deepquestai/object-detection-training-preparing-your-custom-dataset-6248679f0d1d

That is, I run the procedure, take the best resulting .h5 file, and start the tutorial over with that file instead of the pre-trained YOLO model?

Meulen92 commented 5 years ago

I think this is possible, but be sure to back up your first models and detection_config.json somewhere, as the new detection_config.json will be incompatible with the older models and vice versa.
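A minimal sketch of such a backup, assuming the usual layout of a `models` folder and a `json/detection_config.json` file inside the dataset directory. The function name and the timestamped folder naming are illustrative, not part of ImageAI.

```python
import shutil
import time
from pathlib import Path


def backup_training_artifacts(data_directory, backup_root):
    """Copy models/ and json/detection_config.json into a timestamped folder."""
    data_directory = Path(data_directory)
    # One folder per backup, so older backups are never overwritten.
    target = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=False)

    models_dir = data_directory / "models"
    if models_dir.is_dir():
        shutil.copytree(models_dir, target / "models")

    config = data_directory / "json" / "detection_config.json"
    if config.is_file():
        (target / "json").mkdir()
        shutil.copy2(config, target / "json" / config.name)
    return target
```

Keeping each model next to the detection_config.json it was trained with means the matching pair can be restored together later.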

acleitao76 commented 5 years ago

OK, I found some useful information. I will leave the link here in case someone has the same question. Yes, it is possible to continue training a model; you just have to use the continue_from_model parameter. https://github.com/OlafenwaMoses/ImageAI/blob/master/imageai/Prediction/CUSTOMTRAINING.md#continuoustraining

Thx for the help. Really excited about this library

auphofBSF commented 5 years ago

My use case is training on preemptible GPU instances, so I need to be able to resume training of object detection models after a preemption or other failure.

I am using this trainer: from imageai.Detection.Custom import DetectionModelTrainer

So I understand from the above that we can reuse the best result (from trainer.evaluateModel).

Forgive some newbie questions; I am just beginning to get my head around CNNs and reading up on the Keras API, which @OlafenwaMoses and team have so elegantly encapsulated in ImageAI.

I can see #266 asks about a similar topic, and I see the usefulness of PR #261.

The questions I am trying to resolve are around an algorithm for restarting training, and with this clarification possibly adding it to the documentation imageai/Detection/Custom/CUSTOMDETECTIONTRAINING.md

Is there any training resume support currently in imageai.Detection.Custom, DetectionModelTrainer?
In the absence of direct support, would the following code be a reasonable approach to resuming training? I see there is a PR #302 to aid support of getting results from evaluateModel.
Can we use the same training and test images in successive iterations (resumptions) of training and evaluation?

# pseudo code for a resumable training 
# Trainer configure
from imageai.Detection.Custom import DetectionModelTrainer
# ....some setup code here ...... and code is not complete

trainer.setDataDirectory(data_directory="DatasetCustom1")

trainingEpochs = 3
totalTrainingExperiments = 120 
aPersistantStore = 'someMountedPersistantPath'

# First Iteration
state=RetrieveLastKnownState(storage_path=aPersistantStore)

if not stateFirstTimeLearn(): # some function on state
    pretrainedModel='pretrained-yolov3.h5'

    trainer.setTrainConfig(object_names_array=objNamesList, batch_size=4, num_experiments=trainingEpochs, train_from_pretrained_model=pretrainedModel)
    trainer.trainModel()

    state = {'EpochCount':trainingEpochs, 'Trained':True, 'Evaluated':False, 'selectedModel':None}

    results = trainer.evaluateModel(model_path="DatasetCustom1/models", json_path="DatasetCustom1/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
    selectedEpochName = determineSelectedEpoch_on_mAP(results)    # a function of results['mAP'] see #302

    state = {'EpochCount':trainingEpochs, 'Trained':True, 'Evaluated':True, 'selectedModel':selectedEpochName }
    saveSelectedEpoch(storage_path = aPersistantStore, 
                    model_path="DatasetCustom1/models/", 
                    selected_epoch=selectedEpochName,
                    json_path="DatasetCustom1/json/detection_config.json",
                    state =state )  # Save

while state['EpochCount'] < totalTrainingExperiments or stateResumed():
    # Second to N iterations or resumptions
    if stateResumed():  # some function on State
        retrieveDataset(storage_path = aPersistantStore,
                            dataset_path="DatasetCustom1")
        state, pretrainedModel = retrieveSelectedEpochforFurtherTraining(storage_path = aPersistantStore,
                            pretrained_path = "DatasetCustom1/pretrained",
                            )
    elif NextItteration(): # some function of State
        pretrainedModel = moveSelectedModeltoPretrainedPath(pretrained_path = "DatasetCustom1/pretrained", 
                            model_path="DatasetCustom1/models/",selected_epoch=selectedEpochName)

    trainer.setTrainConfig(object_names_array=objNamesList, batch_size=4, num_experiments=trainingEpochs, train_from_pretrained_model=pretrainedModel)
    trainer.trainModel()
    currentTotalEpochCount = state['EpochCount'] + trainingEpochs
    state = {'EpochCount':currentTotalEpochCount, 'Trained':True, 'Evaluated':False, 'selectedModel':None}

    results = trainer.evaluateModel(model_path="DatasetCustom1/models", json_path="DatasetCustom1/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
    selectedEpochName = determineSelectedEpoch_on_mAP(results)    # a function of results['mAP'] see PR #302
    state = {'EpochCount':currentTotalEpochCount, 'Trained':True, 'Evaluated':True, 'selectedModel':selectedEpochName }
    saveSelectedEpoch(storage_path = aPersistantStore, 
                model_path="DatasetCustom1/models/", 
                selected_epoch=selectedEpochName,
                json_path="DatasetCustom1/json/detection_config.json",
                state =state )  # Save
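The pseudocode above relies on saving and retrieving a state dictionary from persistent storage. A minimal sketch of that part only, assuming the state is small enough to keep as JSON. The file name `training_state.json` and both function names are hypothetical; the write goes to a temporary file first so that a preemption in the middle of saving cannot leave a half-written state file.

```python
import json
import os
from pathlib import Path

# Same keys as the state dictionary in the pseudocode above.
DEFAULT_STATE = {"EpochCount": 0, "Trained": False, "Evaluated": False, "selectedModel": None}


def load_state(storage_path):
    """Return the last saved state, or a fresh default state on first run."""
    state_file = Path(storage_path) / "training_state.json"
    if not state_file.is_file():
        return dict(DEFAULT_STATE)
    with state_file.open() as handle:
        return json.load(handle)


def save_state(storage_path, state):
    """Write the state to a temporary file, then swap it into place."""
    storage = Path(storage_path)
    storage.mkdir(parents=True, exist_ok=True)
    tmp_file = storage / "training_state.json.tmp"
    with tmp_file.open("w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace overwrites the destination in a single step.
    os.replace(tmp_file, storage / "training_state.json")
```

A state with `EpochCount` of 0 then stands for the first-time case, and anything else for a resumption.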

auphofBSF commented 5 years ago

continue_from_model

https://github.com/OlafenwaMoses/ImageAI/issues/298#issuecomment-522795595 This is correct for training Prediction models, but ObjectDetection models ("yolo v3") do not have this option. They only have def setTrainConfig(self, object_names_array, batch_size= 4, num_experiments=100, train_from_pretrained_model=""): https://github.com/OlafenwaMoses/ImageAI/blob/bde933a1a168f89dcbec4e3c4b8154ad6d9794a1/imageai/Detection/Custom/__init__.py#L156.
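A rough sketch of what a resumed detection run could look like using only the parameters in that signature. The helper names `build_train_config` and `train_from_checkpoint` are made up for illustration, and whether training really picks up where it left off is exactly the open question in this thread.

```python
def build_train_config(object_names, checkpoint_path, epochs=3, batch_size=4):
    """Collect the keyword arguments accepted by setTrainConfig (see signature above)."""
    return {
        "object_names_array": list(object_names),
        "batch_size": batch_size,
        "num_experiments": epochs,
        "train_from_pretrained_model": str(checkpoint_path),
    }


def train_from_checkpoint(data_directory, object_names, checkpoint_path, epochs=3):
    """Start a training run from an earlier .h5 file instead of pretrained-yolov3.h5."""
    # Imported here so the helper above can be used without ImageAI installed.
    from imageai.Detection.Custom import DetectionModelTrainer

    trainer = DetectionModelTrainer()
    trainer.setModelTypeAsYOLOv3()
    trainer.setDataDirectory(data_directory=data_directory)
    trainer.setTrainConfig(**build_train_config(object_names, checkpoint_path, epochs))
    trainer.trainModel()
```

Note that `num_experiments` here is the number of epochs for this run only, so the running total has to be tracked outside the trainer.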

I share your enthusiasm for this library, IT WORKS GREAT, just trying to make it fit some real life scenarios.

erutanrevol commented 5 years ago

This is correct for training Prediction models, ObjectDetection models "yolo v3" do not have this option.

Hello. Sorry for more newbie questions, but is there a way to continue training with this trainer: from imageai.Detection.Custom import DetectionModelTrainer?

If there is a way, should we keep all the pictures in the train folder, or can I delete the old ones and add some new ones?

rola93 commented 5 years ago

I need to double-check, but I think you can do it easily by providing the path to a partially trained model file (i.e. an .h5 file in data_directory/models)(*) when you set the configuration in the following call:

 def setTrainConfig(self,  object_names_array, batch_size= 4, num_experiments=100, train_from_pretrained_model="dataset_directory/models/some_partially_trained_model.h5"):

I can't try it right now; maybe I will later, but you can try it yourself. Just do the following:

1) Train for an object from YOLO's weights file (i.e. as suggested in the docs) for a few experiments (let's say 4).
2) Note the loss shown on screen when the training finishes; it should be low.
3) Train again, but start from a pretrained model file generated in the previous step (the last of the model files), as suggested above (*).

If everything goes well, the second training round (step 3) should start with loss values similar to those shown when training was finishing in step 2.
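For step 3, picking "the last of the model files" can be sketched as below. This assumes the models are written as .h5 files into a single folder and that the most recently modified file is the one to continue from; it does not depend on how the files are named. The function name is illustrative.

```python
from pathlib import Path


def latest_checkpoint(models_dir):
    """Return the most recently modified .h5 file in models_dir, or None if there is none."""
    candidates = [path for path in Path(models_dir).glob("*.h5") if path.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed as train_from_pretrained_model in the second training round.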

OlafenwaMoses commented 5 years ago

Thanks everyone for this conversation. My apologies for commenting late on great discussions like this. This is due to the amount of work involved in the project's ecosystem. I will try to keep up as much as possible. Now let me address the issues raised.

Allowing the model training process to continue without an incompatible detection_config.json file is an issue that will be resolved soon. Please note that you can continue your training from a model generated in a previous training run, and you can use different train and validation images or add new ones.
Currently, continuing training from a previously generated model can result in spikes in loss values, due to the generation of new anchor values and a new detection_config.json file, but the loss will come back down as the training progresses.
For evaluation, we will try to rework the evaluation process to make it more robust, flexible and convenient than it currently is.
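To see whether a new run has changed the configuration, two detection_config.json files can be compared directly. The sketch below assumes the file is a JSON object with "labels" and "anchors" keys; that layout is an assumption and should be checked against your own file. Both function names are illustrative.

```python
import json


def load_detection_config(json_path):
    """Read a detection_config.json file into a dictionary."""
    with open(json_path) as handle:
        return json.load(handle)


def config_differences(old_json_path, new_json_path):
    """List which of the assumed keys ("labels", "anchors") differ between two configs."""
    old = load_detection_config(old_json_path)
    new = load_detection_config(new_json_path)
    return [key for key in ("labels", "anchors") if old.get(key) != new.get(key)]
```

An empty list means the two files agree on both keys; a result containing "anchors" would match the regenerated anchor values described above.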

nineclicks commented 5 years ago

I made a quick and dirty fork of ImageAI to optionally accept anchors rather than generating new ones. Admittedly I do not know the full ramifications of doing this; I assume it would be a bad idea to change the training data without regenerating anchors. However, when using this to simply restart where I last left off, I do not lose any loss progress, whereas simply transfer training from my last model with new anchors would always require 4 to 6 epochs to get back to where I was. This was a big deal for me because my epochs were 2 hours each. I also immediately got a mAP improvement with my first resumed epoch.

Use example: https://gist.github.com/nineclicks/39a33ee5dde832837adf146781d51036

Fork: https://github.com/nineclicks/ImageAI

Arishma736 commented 4 years ago

While trying to re-train from an already saved model with weights (in yolov3), the following error occurs: OSError: Unable to open file (truncated file: eof = 98566144, sblock->base_addr = 0, stored_eof = 247017776). So is there a problem with the .h5 file?
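In that message, eof is the size of the file on disk and stored_eof is the size recorded inside the file, so the file holds roughly 98.6 MB of an expected 247 MB. That usually points to an incomplete save, copy or download rather than to the training code. A small sketch for checking a file before training from it, using only the standard library; the function names are illustrative, and the signature check assumes the HDF5 data starts at byte 0, as base_addr = 0 suggests.

```python
import os

# The first eight bytes of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """True if the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


def is_truncated(path, expected_size):
    """True if the file on disk is shorter than the expected size in bytes."""
    return os.path.getsize(path) < expected_size
```

For the file in the error above, `is_truncated(path, 247017776)` would return True, and copying or downloading the .h5 file again would be the first thing to try.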
