keighrim closed 1 month ago
For the sake of implementation, let's change
parser.add_argument("-i", "--input-video",
                    help="filepath for the video to be processed.",
                    required=True)
to accept either a single video file name or the name of a directory containing many image files.
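A minimal sketch of how the proposed change might look, assuming the existing `-i`/`--input-video` flag is kept and only its meaning is widened. The `resolve_input` helper, the `build_parser` wrapper, and the `IMAGE_EXTENSIONS` set are hypothetical names introduced for illustration; the actual image format of the delivery is not stated in this issue.

```python
import argparse
from pathlib import Path

# Assumption: the extracted frames are ordinary image files with one of
# these extensions. Adjust to whatever the delivery package actually uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def resolve_input(path_str):
    """Classify the input path (hypothetical helper).

    Returns ("video", [path]) for a single file, or
    ("images", [sorted image paths]) for a directory of frame images.
    """
    path = Path(path_str)
    if path.is_file():
        return "video", [path]
    if path.is_dir():
        images = sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        if not images:
            raise ValueError(f"no image files found in directory: {path}")
        return "images", images
    raise FileNotFoundError(f"input path does not exist: {path}")


def build_parser():
    """Build a parser whose -i flag accepts a video file or an image directory."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input-video",
                        help="filepath for the video to be processed, "
                             "or a directory of extracted frame images.",
                        required=True)
    return parser
```

With this shape, the preprocessor could call `resolve_input(args.input_video)` once and branch on the returned kind, so the rest of the pipeline would not need to know which form the input took.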
Reopened by mistake when a branch was pushed under the old name.
New Feature Summary
At the moment, the data preprocessor expects one video file and one CSV file of manual label annotations
https://github.com/clamsproject/app-swt-detection/blob/7be4b818a0c72713e501b27be9ebaeee5a3e1320/modeling/data_loader.py#L249-L254
and uses them to prepare CNN vectors and a metadata JSON file
https://github.com/clamsproject/app-swt-detection/blob/7be4b818a0c72713e501b27be9ebaeee5a3e1320/modeling/data_loader.py#L240-L243
However, we are now receiving additional annotations from GBH that cover more videos but are much sparser. Most importantly, the video files are not part of the delivery package; only the extracted frame images are.
To cope with this different data situation in the next rounds of training, we need to update the data preprocessor to handle the new batches of annotations.
Additional context
The current train-ready preprocessed data looks like this:
Here's where the preprocessed data is read
https://github.com/clamsproject/app-swt-detection/blob/7be4b818a0c72713e501b27be9ebaeee5a3e1320/modeling/train.py#L129-L147
Finally, because the annotations in the next batches are sparse, we need to add the new GUIDs to this list
https://github.com/clamsproject/app-swt-detection/blob/7be4b818a0c72713e501b27be9ebaeee5a3e1320/modeling/gridsearch.py#L23-L24