Open acleitao76 opened 5 years ago
I think this is possible, but be sure to back up your first models and detection_config.json somewhere, as the new detection_config.json will be incompatible with the older models and vice versa.
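The backup advice above could be sketched roughly as follows. This is only a minimal sketch: it assumes the dataset layout used elsewhere in this thread (a `models/` folder and `json/detection_config.json` under the data directory), and the helper name `backup_training_artifacts` and the timestamped folder naming are hypothetical, not part of ImageAI.

```python
import shutil
import time
from pathlib import Path


def backup_training_artifacts(data_dir, backup_root):
    """Copy models/ and json/detection_config.json into a timestamped folder.

    Hypothetical helper, not part of ImageAI. The directory layout is an
    assumption based on the paths mentioned in this thread.
    """
    data_dir = Path(data_dir)
    target = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=False)

    # Copy every model file produced so far, if the folder exists.
    models_dir = data_dir / "models"
    if models_dir.is_dir():
        shutil.copytree(models_dir, target / "models")

    # Keep the matching detection_config.json next to those models.
    config = data_dir / "json" / "detection_config.json"
    if config.is_file():
        (target / "json").mkdir()
        shutil.copy2(config, target / "json" / config.name)

    return target
```

Keeping the models and the config in the same backup folder matters because, as noted above, a config from one training run may not match models from another.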
OK, I found some useful information. I will leave the link here in case someone has the same question. Yes, it's possible to continue training a model; you just have to use the continue_from_model parameter. https://github.com/OlafenwaMoses/ImageAI/blob/master/imageai/Prediction/CUSTOMTRAINING.md#continuoustraining
Thx for the help. Really excited about this library
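For readers who want to see what this looks like, here is a rough sketch of continuing a Prediction training run. It assumes the `ModelTraining` class from the ImageAI 2.x releases that this thread refers to; the argument values, the ResNet model choice, and the helper names `build_continue_kwargs` and `continue_prediction_training` are illustrative assumptions, so check the linked document for the exact parameters of your version.

```python
def build_continue_kwargs(checkpoint_path, num_objects, num_experiments):
    """Collect trainModel() arguments for continuing from a saved model.

    Hypothetical helper; the values other than continue_from_model are
    placeholders chosen for illustration.
    """
    return {
        "num_objects": num_objects,
        "num_experiments": num_experiments,
        "enhance_data": True,
        "batch_size": 32,
        "continue_from_model": checkpoint_path,
    }


def continue_prediction_training(data_dir, checkpoint_path, num_objects, num_experiments):
    # Imported lazily so this module can be loaded without ImageAI installed.
    from imageai.Prediction.Custom import ModelTraining

    trainer = ModelTraining()
    trainer.setModelTypeAsResNet()  # assumption: the model type used before
    trainer.setDataDirectory(data_dir)
    trainer.trainModel(**build_continue_kwargs(checkpoint_path, num_objects, num_experiments))
```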
My use case is Preemptible GPU instances, so I want to be able to resume a training process for ObjectDetection models in the event of preemption or other failure.
I am using this trainer: from imageai.Detection.Custom import DetectionModelTrainer
So I understand from the above that we can reuse the best result from trainer.evaluateModel.
Forgive some newbie questions; I am just beginning to get my head around CNNs and reading up on the Keras API, which @OlafenwaMoses and team have so elegantly encapsulated in ImageAI.
I can see #266 asking on a similar thread, and I see the usefulness of the following PR #261
# Pseudocode for resumable training (not complete; helper functions are placeholders)
# Trainer configuration
from imageai.Detection.Custom import DetectionModelTrainer
# ... some setup code here; the code is not complete ...
trainer.setDataDirectory(data_directory="DatasetCustom1")
trainingEpochs = 3
totalTrainingExperiments = 120
aPersistantStore = 'someMountedPersistantPath'

# First iteration
state = RetrieveLastKnownState(storage_path=aPersistantStore)
if not stateFirstTimeLearn():  # some function on state
    pretrainedModel = 'pretrained-yolov3.h5'
    trainer.setTrainConfig(object_names_array=objNamesList, batch_size=4, num_experiments=trainingEpochs, train_from_pretrained_model=pretrainedModel)
    trainer.trainModel()
    state = {'EpochCount': trainingEpochs, 'Trained': True, 'Evaluated': False, 'selectedModel': None}
    results = trainer.evaluateModel(model_path="DatasetCustom1/models", json_path="DatasetCustom1/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
    selectedEpochName = determineSelectedEpoch_on_mAP(results)  # a function of results['mAP'], see #302
    state = {'EpochCount': trainingEpochs, 'Trained': True, 'Evaluated': True, 'selectedModel': selectedEpochName}
    saveSelectedEpoch(storage_path=aPersistantStore,
                      model_path="DatasetCustom1/models/",
                      selected_epoch=selectedEpochName,
                      json_path="DatasetCustom1/json/detection_config.json",
                      state=state)  # Save

while state['EpochCount'] < totalTrainingExperiments or stateResumed():
    # Second to Nth iterations or resumptions
    if stateResumed():  # some function on state
        retrieveDataset(storage_path=aPersistantStore,
                        dataset_path="DatasetCustom1")
        state, pretrainedModel = retrieveSelectedEpochforFurtherTraining(storage_path=aPersistantStore,
                                                                         pretrained_path="DatasetCustom1/pretrained")
    elif NextItteration():  # some function on state
        pretrainedModel = moveSelectedModeltoPretrainedPath(pretrained_path="DatasetCustom1/pretrained",
                                                            model_path="DatasetCustom1/models/",
                                                            selected_epoch=selectedEpochName)
    trainer.setTrainConfig(object_names_array=objNamesList, batch_size=4, num_experiments=trainingEpochs, train_from_pretrained_model=pretrainedModel)
    trainer.trainModel()
    currentTotalEpochCount = state['EpochCount'] + trainingEpochs
    state = {'EpochCount': currentTotalEpochCount, 'Trained': True, 'Evaluated': False, 'selectedModel': None}
    results = trainer.evaluateModel(model_path="DatasetCustom1/models", json_path="DatasetCustom1/json/detection_config.json", iou_threshold=0.5, object_threshold=0.3, nms_threshold=0.5)
    selectedEpochName = determineSelectedEpoch_on_mAP(results)  # a function of results['mAP'], see PR #302
    state = {'EpochCount': currentTotalEpochCount, 'Trained': True, 'Evaluated': True, 'selectedModel': selectedEpochName}
    saveSelectedEpoch(storage_path=aPersistantStore,
                      model_path="DatasetCustom1/models/",
                      selected_epoch=selectedEpochName,
                      json_path="DatasetCustom1/json/detection_config.json",
                      state=state)  # Save
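The state handling that the pseudocode above leaves to placeholder functions could be sketched roughly like this. It is a minimal sketch assuming the state is a small JSON-serializable dict kept on the mounted persistent path; the file name `training_state.json` and the helpers `load_state`, `save_state`, and `remaining_epochs` are hypothetical and only mirror the keys used in the pseudocode.

```python
import json
import os
from pathlib import Path

STATE_FILE = "training_state.json"  # hypothetical file name


def load_state(storage_path):
    """Return the last saved state, or a fresh one if none exists yet."""
    path = Path(storage_path) / STATE_FILE
    if path.is_file():
        return json.loads(path.read_text())
    return {"EpochCount": 0, "Trained": False, "Evaluated": False, "selectedModel": None}


def save_state(storage_path, state):
    """Write to a temporary file first, then rename it over the old state.

    The rename avoids leaving a half-written state file if the instance
    is preempted in the middle of the write.
    """
    path = Path(storage_path) / STATE_FILE
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def remaining_epochs(state, total, per_run):
    """Epochs to run next: at most per_run, and never past total."""
    return max(0, min(per_run, total - state["EpochCount"]))
```

With helpers like these, each run would load the state, train for `remaining_epochs(...)` experiments, then save the updated state, so a preempted instance can pick up from the last saved count.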
Regarding continue_from_model (https://github.com/OlafenwaMoses/ImageAI/issues/298#issuecomment-522795595):
This is correct for training Prediction models, but ObjectDetection ("yolo v3") models do not have this option.
They only have def setTrainConfig(self, object_names_array, batch_size= 4, num_experiments=100, train_from_pretrained_model=""):
https://github.com/OlafenwaMoses/ImageAI/blob/bde933a1a168f89dcbec4e3c4b8154ad6d9794a1/imageai/Detection/Custom/__init__.py#L156
I share your enthusiasm for this library, IT WORKS GREAT, just trying to make it fit some real life scenarios.
Hello. Sorry for more newbie questions, but is there a way to continue training when using this trainer: from imageai.Detection.Custom import DetectionModelTrainer?
If there is, should we keep all the pictures in the train folder, or can I delete the old ones and add some new ones?
I need to double-check, but I think you can do it easily by providing the path to a partially trained model file (i.e. an .h5 file in data_directory/models) (*) when you set the configuration in the following call:
def setTrainConfig(self, object_names_array, batch_size= 4, num_experiments=100, train_from_pretrained_model="dataset_directory/models/some_partially_trained_model.h5"):
I can't try it right now (maybe later), but you can do it yourself as follows:
1) Train for an object from YOLO's weights file (as suggested in the docs) for a few experiments (let's say 4).
2) Note the loss shown on screen when the training finishes; it should be low.
3) Train again, but start from a pretrained model file generated in the previous step (the last of the model files), as suggested above (*).
If everything goes well, the second training round (step 3) should start with loss values similar to those shown when training was finishing in step 2.
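The steps above could be sketched roughly as follows. This is only a sketch under assumptions: it picks "the last of the model files" by modification time, which is a guess at what was meant, and the helper names `latest_model_file` and `resume_detection_training` are hypothetical. The `setTrainConfig` parameters are the ones quoted earlier in this thread.

```python
from pathlib import Path


def latest_model_file(models_dir):
    """Return the most recently modified .h5 file in models_dir, or None."""
    candidates = sorted(Path(models_dir).glob("*.h5"), key=lambda p: p.stat().st_mtime)
    return str(candidates[-1]) if candidates else None


def resume_detection_training(data_dir, object_names, num_experiments, fallback_model):
    # Imported lazily so the helper above works without ImageAI installed.
    from imageai.Detection.Custom import DetectionModelTrainer

    # Start from the newest saved model, or from the original pretrained
    # weights (fallback_model) if nothing has been saved yet.
    start_from = latest_model_file(Path(data_dir) / "models") or fallback_model

    trainer = DetectionModelTrainer()
    trainer.setModelTypeAsYOLOv3()
    trainer.setDataDirectory(data_directory=data_dir)
    trainer.setTrainConfig(
        object_names_array=object_names,
        batch_size=4,
        num_experiments=num_experiments,
        train_from_pretrained_model=start_from,
    )
    trainer.trainModel()
```

Choosing by modification time is the simplest option; selecting the model with the best mAP from trainer.evaluateModel, as discussed earlier in the thread, may be a better choice.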
Thanks everyone for this conversation. My apologies for commenting late on great discussions like this. This is due to the amount of work involved in the project's ecosystem. I will try to keep up as much as possible. Now let me address the issues raised.
[...] detection_config.json file is an issue that will be resolved soon. Please note that you can continue your training using a previously generated model in a different training, and use different or add new train and validation images as well.
[...] detection_config.json file but will reduce as the training progresses.
I made a quick and dirty fork of ImageAI to optionally accept anchors rather than generating new ones. Admittedly I do not know the full ramifications of doing this; I assume it would be a bad idea to change the training data without regenerating anchors. However, when using this to simply restart where I last left off, I do not lose any loss progress, whereas simply transfer training from my last model with new anchors would always require 4 to 6 epochs to get back to where I was. This was a big deal for me because my epochs were 2 hours each. I also immediately got a mAP improvement with my first resumed epoch.
Use example: https://gist.github.com/nineclicks/39a33ee5dde832837adf146781d51036
While trying to re-train from an already saved model with weights (in yolov3), the following error occurs: OSError: Unable to open file (truncated file: eof = 98566144, sblock->base_addr = 0, stored_eof = 247017776). So is there a problem with the .h5 file?
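An HDF5 "truncated file" error like the one above generally means the file on disk is shorter than the size recorded in its own header, which typically points to an interrupted download, copy, or save. As a small sketch, the two sizes can be pulled out of the message to see how much of the file is present; the helper name `parse_truncated_error` is hypothetical, and the message is the one quoted in this thread.

```python
import re


def parse_truncated_error(message):
    """Extract actual and expected sizes (bytes) from a 'truncated file' error.

    Returns (actual, expected, fraction_present), or None if the message
    does not contain both values.
    """
    match = re.search(r"eof = (\d+).*stored_eof = (\d+)", message)
    if match is None:
        return None
    actual, expected = int(match.group(1)), int(match.group(2))
    return actual, expected, actual / expected


msg = ("Unable to open file (truncated file: eof = 98566144, "
       "sblock->base_addr = 0, stored_eof = 247017776)")
actual, expected, fraction = parse_truncated_error(msg)
print(f"{actual} of {expected} bytes present ({fraction:.1%})")
```

Here the file holds well under half of the expected bytes, so copying or downloading the .h5 file again, or using an earlier saved model, is likely the way forward.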
Can I use this procedure to incrementally train a model? https://medium.com/deepquestai/object-detection-training-preparing-your-custom-dataset-6248679f0d1d
That is, I follow the tutorial, take the best resulting .h5 file, and start the tutorial all over with that file instead of the pretrained YOLO model?