Full control over the architecture matters, and making models more modular streamlines engineering and testing. As we acquire more video data, the ability to load and handle sequential batches of data becomes increasingly important. However, we must still be able to prototype on a single image dataset sampled in random order.
Here are the desired components of a modular Model Forge:
[x] Model-agnostic data loading (normal and sequential). Sequential loading can be done in two ways. Either each batch is itself a sequence, i.e. a batch of 8 consecutive images, or each position in the batch is a sequence, i.e. 8 random images where each subsequent batch advances every position to the next index. For a given model type, there should be no difference in behavior between the two.
[x] Hyperparameter parsing and choosing which hyperparameters to use.
[x] A training script that can tell what kind of model you're using, based on either flags or the model architecture.
[x] Models and layers should all take the same inputs and produce the same outputs (image and YOLO-formatted bounding boxes -> YOLO-formatted bounding boxes).
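
The two sequential loading strategies from the first item can be sketched as plain index generators. This is a minimal illustration, not the Model Forge loader itself: the function names `batches_sequential_within` and `batches_sequential_across`, and the `steps` and `seed` parameters, are hypothetical, and it assumes frames are addressable by integer index in temporal order.

```python
import random
from typing import Iterator, List


def batches_sequential_within(num_frames: int, batch_size: int) -> Iterator[List[int]]:
    """Strategy 1 (assumed naming): each batch holds consecutive frame indices."""
    # Trailing frames that do not fill a whole batch are dropped.
    for start in range(0, num_frames - batch_size + 1, batch_size):
        yield list(range(start, start + batch_size))


def batches_sequential_across(
    num_frames: int, batch_size: int, steps: int, seed: int = 0
) -> Iterator[List[int]]:
    """Strategy 2 (assumed naming): each batch position starts at a random
    frame and advances by one index with every subsequent batch."""
    if steps > num_frames:
        raise ValueError("steps must not exceed num_frames")
    rng = random.Random(seed)
    # Pick start indices so that start + steps - 1 stays inside the dataset.
    starts = [rng.randrange(0, num_frames - steps + 1) for _ in range(batch_size)]
    for t in range(steps):
        yield [s + t for s in starts]
```

For example, `list(batches_sequential_within(20, 8))` yields two batches covering indices 0 to 7 and 8 to 15. With the second strategy, position `i` of every batch traces one contiguous clip, so a model that keeps per-position state sees frames in order even though the positions are unrelated to one another.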