Last checkpoint for resuming training

agentmorris commented 1 year ago

Could you please publish the last checkpoint (for resuming training) for MDv5?

Issue cloned from Microsoft/CameraTraps, original issue posted by skye-glitch on Aug 03, 2022.

agentmorris commented 1 year ago

Thanks for your interest in fine-tuning MDv5. I don't think you need a checkpoint for that; in fact, according to the YOLOv5 documentation, you would only really use a checkpoint if you're resuming an interrupted training cycle, which only applies if you have access to the original training data.
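To make the checkpoint-versus-weights distinction concrete, here is a minimal sketch. It assumes the checkpoint dictionary layout written by YOLOv5's train.py, and that released weights were passed through YOLOv5's strip_optimizer(), which (as far as we know) clears 'optimizer' and sets 'epoch' to -1. The helper name looks_resumable is hypothetical; it only checks whether a loaded checkpoint still carries the training state that resuming relies on.

```python
from typing import Any, Mapping


def looks_resumable(ckpt: Mapping[str, Any]) -> bool:
    """Heuristic check on a loaded YOLOv5 checkpoint dict (hypothetical helper).

    Assumes train.py's key layout and that released weights went through
    strip_optimizer(), which clears 'optimizer' and sets 'epoch' to -1.
    """
    epoch = ckpt.get("epoch", -1)
    has_optimizer = ckpt.get("optimizer") is not None
    # A stripped weights file has no optimizer state and no epoch to continue from.
    return has_optimizer and epoch is not None and epoch >= 0


if __name__ == "__main__":
    # With torch and a YOLOv5 checkout on sys.path, a real file could be loaded as
    #   ckpt = torch.load("path/to/md_v5a.0.0.pt", map_location="cpu")
    # (newer PyTorch versions may also need weights_only=False).
    stripped = {"epoch": -1, "optimizer": None}               # like released weights
    interrupted = {"epoch": 12, "optimizer": {"state": {}}}   # like a mid-run last.pt
    print(looks_resumable(stripped), looks_resumable(interrupted))
```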

Assuming you're looking to fine-tune MDv5 on new data, I think what you want to do is just use MDv5a or MDv5b as the starting weights for a new training cycle, like this:

python train.py --data your_new_training_data.yaml --weights path/to/md_v5a.0.0.pt
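The your_new_training_data.yaml in that command is a YOLOv5 dataset file. Below is a minimal sketch of generating one, assuming YOLOv5's usual layout (an images/ folder with a parallel labels/ folder of normalized `class x_center y_center width height` text files) and keeping MegaDetector's three classes in the order we believe MDv5 uses (0 = animal, 1 = person, 2 = vehicle). Keeping the same number and order of classes lets the pretrained detection head carry over meaningfully; as far as we know, YOLOv5 skips loading any weights whose shapes don't match, such as a head built for a different class count. make_dataset_yaml and the paths are hypothetical.

```python
import json
from typing import Sequence

# Class order we assume MDv5 uses: index 0 = animal, 1 = person, 2 = vehicle.
MD_CLASSES = ("animal", "person", "vehicle")


def make_dataset_yaml(root: str, train: str, val: str,
                      names: Sequence[str] = MD_CLASSES) -> str:
    """Return the text of a YOLOv5 dataset YAML without needing PyYAML.

    JSON-quoted strings are valid YAML scalars, so json.dumps takes care of
    quoting paths and class names that contain special characters.
    """
    lines = [
        f"path: {json.dumps(root)}",          # dataset root directory
        f"train: {json.dumps(train)}",        # training images, relative to path
        f"val: {json.dumps(val)}",            # validation images, relative to path
        f"nc: {len(names)}",                  # class count (older YOLOv5 versions want it)
        f"names: {json.dumps(list(names))}",  # class names in index order
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = make_dataset_yaml("/data/cameratraps", "images/train", "images/val")
    print(text)
    # Save it next to your data, e.g.:
    #   with open("your_new_training_data.yaml", "w") as f: f.write(text)
```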

I can't find documentation for exactly this, but the YOLOv5 developer provides very helpful instructions on this thread.

Of course, you're in uncharted territory in terms of what the ideal learning rate would be for fine-tuning, and whether you might want to freeze some layers (documented here).
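Putting those knobs together, here is a sketch that builds the train.py command line with optional layer freezing and a custom hyperparameter file (the initial learning rate is lr0 in that file). It assumes the --img, --epochs, --freeze and --hyp flags of YOLOv5's train.py, where --freeze N freezes the first N layers. The defaults are placeholders, not recommendations: we believe MDv5 was trained at 1280 px, and the epoch count and freeze depth are yours to tune (the YOLOv5 freezing tutorial's backbone value of 10 refers to the standard P5 models, so check MDv5's layer count before reusing it). finetune_command is a hypothetical helper.

```python
from typing import List, Optional


def finetune_command(data_yaml: str, weights: str, img_size: int = 1280,
                     epochs: int = 50, freeze: Optional[int] = None,
                     hyp: Optional[str] = None) -> List[str]:
    """Build a YOLOv5 train.py argument list for fine-tuning from MDv5 weights.

    img_size=1280 reflects our understanding of MDv5's training resolution;
    epochs=50 is an arbitrary placeholder.
    """
    cmd = ["python", "train.py",
           "--data", data_yaml,
           "--weights", weights,   # start a new run from MDv5 weights (not --resume)
           "--img", str(img_size),
           "--epochs", str(epochs)]
    if freeze is not None:
        # YOLOv5 freezes layers model.0 ... model.{freeze - 1}
        cmd += ["--freeze", str(freeze)]
    if hyp is not None:
        # Hyperparameter YAML; its lr0 entry sets the initial learning rate
        cmd += ["--hyp", hyp]
    return cmd


if __name__ == "__main__":
    cmd = finetune_command("your_new_training_data.yaml",
                           "path/to/md_v5a.0.0.pt", freeze=10)
    print(" ".join(cmd))
    # To launch it from a YOLOv5 checkout:
    #   import subprocess; subprocess.run(cmd, check=True, cwd="path/to/yolov5")
```

Building an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.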

Let us know if that addresses your question, and let us know how it goes!

-Dan

(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

Thanks, Dan. The training command works. Please correct me if I am wrong: md_v5 is a YOLOv5 model, and I can do training/inference using scripts that work with YOLOv5. For the purpose of keeping files in a consistent format, running inference with run_detector_batch.py is recommended. For resuming training, there is no special requirement for the model, and we can just train with any YOLOv5 script?

(Comment originally posted by skye-glitch)

agentmorris commented 1 year ago

For the purpose of keeping files in a consistent format, running inference with run_detector_batch.py is recommended.

Yes, that's correct. You can use YOLOv5's inference scripts and you will get meaningful bounding boxes, but you won't get the file format (or even the output class identifiers) that all of the scripts in our repo work with, or that third-party tools for working with MD results expect.
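To illustrate the format gap, here is a rough sketch of converting YOLOv5 detect.py output into MD-style results. It assumes detect.py was run with --save-txt --save-conf, which to our understanding writes one normalized `class x_center y_center width height confidence` line per box. MD's output, by contrast, uses 1-based string category IDs ("1" = animal, "2" = person, "3" = vehicle) and normalized [x_min, y_min, width, height] boxes. Only the core fields are produced, the function names are hypothetical, and run_detector_batch.py remains the supported path.

```python
import json
from typing import Dict

# MD output category IDs are 1-based strings; YOLOv5 class indices are 0-based.
MD_CATEGORIES = {"1": "animal", "2": "person", "3": "vehicle"}


def yolo_line_to_md(line: str) -> Dict[str, object]:
    """Convert one normalized 'cls xc yc w h conf' line to an MD-style detection."""
    cls, xc, yc, w, h, conf = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    return {
        "category": str(int(float(cls)) + 1),  # 0-based YOLO index -> 1-based MD ID
        "conf": float(conf),
        # MD boxes: [x_min, y_min, width, height], normalized, origin at top-left
        "bbox": [xc - w / 2, yc - h / 2, w, h],
    }


def to_md_results(label_texts: Dict[str, str]) -> Dict[str, object]:
    """Map {image file name: contents of its YOLOv5 .txt file} to an MD-style dict."""
    images = []
    for file_name, text in label_texts.items():
        dets = [yolo_line_to_md(ln) for ln in text.splitlines() if ln.strip()]
        images.append({"file": file_name, "detections": dets})
    return {"images": images, "detection_categories": MD_CATEGORIES}


if __name__ == "__main__":
    results = to_md_results({"cam01/IMG_0001.JPG": "0 0.5 0.5 0.2 0.4 0.91\n"})
    print(json.dumps(results, indent=1))
```

The sketch does not reproduce MD's own preprocessing, so confidences from detect.py may not match what run_detector_batch.py reports for the same images.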

For resuming training, there is no special requirement for the model, and we can just train with any YOLOv5 script?

As far as I know, that's true... but as far as I know, you're the first person to try this. :) Let everyone know how it goes!

(Comment originally posted by agentmorris)

agentmorris/MegaDetector, issue #71: Last checkpoint for resuming training