clamsproject / app-swt-detection

CLAMS app for detecting scenes with text from video input
Apache License 2.0

training pipeline #6

Closed keighrim closed 1 year ago

keighrim commented 1 year ago

For the first implementation of the training pipeline, we should just implement a simple torchvision-based classification model. The input will be formatted based on the specification from #3, but in general it will come as a JSON file containing a list of image metadata (guid, timestamp, total duration, label, subtype label, etc.) and a numpy matrix of features extracted from the images. The JSON list and the matrix should have a 1-to-1 mapping.
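As a rough sketch of what that input contract could look like: a metadata list loaded from JSON, a feature matrix whose rows align 1-to-1 with it, and a minimal classification head over the precomputed features. The field names, feature dimension, and in-memory data here are assumptions for illustration, not the finalized spec from #3.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical metadata entries mirroring the proposed JSON list
# (field names are assumptions, not the final spec from #3).
metadata = [
    {"guid": "cpb-aacip-001", "timestamp": 0.0, "duration": 120.0,
     "label": "slate", "subtype": None},
    {"guid": "cpb-aacip-001", "timestamp": 5.0, "duration": 120.0,
     "label": "credits", "subtype": "opening"},
]

# Feature matrix extracted from the corresponding frames; rows map
# 1-to-1 to the metadata entries above (4096-d is a placeholder).
features = np.random.rand(len(metadata), 4096).astype(np.float32)

# The 1-to-1 mapping means the two structures must agree in length.
assert len(metadata) == features.shape[0]

# Build an integer label index from the string labels.
labels = sorted({m["label"] for m in metadata})
label_idx = {lbl: i for i, lbl in enumerate(labels)}
y = torch.tensor([label_idx[m["label"]] for m in metadata])

# A minimal linear classification head over the precomputed features;
# a torchvision backbone would produce these vectors upstream.
model = nn.Linear(features.shape[1], len(labels))
logits = model(torch.from_numpy(features))  # shape: (num_images, num_classes)
```

Keeping the metadata list and the matrix as parallel structures (rather than embedding features in the JSON) keeps the JSON small and lets the matrix be memory-mapped or batched independently.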

keighrim commented 1 year ago

Another feature we need for future experiments is the ability to block certain GUIDs (passed as a separate argument to, say, train() calls) from being included in the training set. This will allow us to easily swap training sets built from different source material (e.g. https://github.com/clamsproject/app-swt-detection/issues/2#issuecomment-1736319426)
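Since the metadata list and the feature matrix are kept in 1-to-1 alignment, blocking GUIDs amounts to filtering both structures with the same index set. A hypothetical helper (not the actual train() signature) might look like:

```python
import numpy as np

def filter_training_set(metadata, features, blocked_guids):
    """Drop entries whose guid is in blocked_guids, keeping the
    metadata list and the feature matrix aligned.
    Hypothetical helper for illustration only.
    """
    keep = [i for i, m in enumerate(metadata)
            if m["guid"] not in blocked_guids]
    return [metadata[i] for i in keep], features[keep]

# Example: block everything from one source video.
metadata = [{"guid": "cpb-aacip-001"}, {"guid": "cpb-aacip-002"},
            {"guid": "cpb-aacip-001"}]
features = np.arange(6, dtype=np.float32).reshape(3, 2)
kept_meta, kept_feats = filter_training_set(
    metadata, features, blocked_guids={"cpb-aacip-001"})
```

Filtering at train() time, rather than regenerating the extracted-feature file, is what makes swapping training sets across source material cheap.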

MrSqually commented 1 year ago

Opened a branch with the code for data ingestion. The output looks good to me, but I'm new to VGG, so I'll wait for a green light before running it on the full data.