conservationtechlab / camml

A package of ML components for CTL field camera systems.
MIT License

Refactor conversion scripts for EfficientDet data prep #37

iingram closed 10 months ago

iingram commented 10 months ago

Refactors some of the scripts used to prep training data for TFLite EfficientDet models. The training set comes from MegaDetector detections, and the validation and test sets come from OID data. I think our original plan was for the validation set to also come from the MD detections, but this is not going to affect much at this stage. The main thing, I'd venture, is that the test set used to generate the final performance metrics by which we judge the trained models comes from the manually boxed set.

Mostly this refactoring encapsulates code that was duplicated across the scripts into functions kept in the dataprep.py module. It also renames some scripts to fit what had become the naming convention, and moves one from the training folder into the conversions folder, where all the rest are gathered.
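For readers unfamiliar with this kind of conversion, here is a minimal sketch of the sort of shared helper that could be factored out of such scripts. It is not the code in dataprep.py. The function names, the `label` and `min_conf` parameters, and the choice of output layout are all assumptions for illustration. It assumes the input follows the MegaDetector batch output JSON layout (normalized `[x_min, y_min, width, height]` boxes, category `"1"` for animals) and that the target is the 11-column CSV row layout used by TFLite Model Maker's object detector loader, which this PR does not confirm.

```python
def md_bbox_to_corners(bbox):
    """Convert a normalized [x_min, y_min, width, height] box to
    (x_min, y_min, x_max, y_max), clamped to the [0, 1] range."""
    x, y, w, h = bbox
    return (max(x, 0.0), max(y, 0.0), min(x + w, 1.0), min(y + h, 1.0))


def detections_to_rows(md_output, label, split="TRAIN",
                       min_conf=0.5, category="1"):
    """Build CSV rows from a MegaDetector-style results dict.

    Hypothetical helper: `label` is the class name assigned to every
    kept box, and `min_conf` is an illustrative threshold.
    """
    rows = []
    for image in md_output.get("images", []):
        # Images that failed processing may have no detections list.
        for det in image.get("detections") or []:
            if det["category"] != category or det["conf"] < min_conf:
                continue
            x_min, y_min, x_max, y_max = md_bbox_to_corners(det["bbox"])
            # Assumed row layout: split, path, label, x_min, y_min,
            # two blanks, x_max, y_max, two blanks.
            rows.append([split, image["file"], label,
                         x_min, y_min, "", "", x_max, y_max, "", ""])
    return rows
```

Rows built this way can be written out with `csv.writer`. Keeping the box conversion in its own function lets each conversion script reuse it instead of repeating the arithmetic.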

Note: this particular EfficientDet workflow has still not been used on any data for which we don't have a groundtruth set of boxes. The focus of Eric's project was proving to ourselves that the WTS workflow actually yielded good results against models trained on manually boxed groundtruth data. Hence, there are assumptions throughout the scripts that the data, and the file structure it sits in, are in OID format.

Resolves #36