Open keighrim opened 1 year ago
A simple find command tells us there are 8671 videos on our server.
$ find /llc_data/clams/wgbh/ -type f -name "*.mp4" -printf "%f\n" | cut -d . -f 1 | sort -u > ~/all_baapb_guids.txt
$ wc -l ~/all_baapb_guids.txt
8671 /home/krim/all_baapb_guids.txt
This count should include some duplicates whose file names carry suffixes that are not part of the GUID. For example, we have
...
cpb-aacip-151-4x54f1n09w__fma-2-62314-int-20120606_
cpb-aacip-151-4x54f1n09w__fma-2-62316-int-20120606_
...
files.
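A rough sketch of how the list could be collapsed to one entry per GUID. It assumes that the non-GUID suffix always starts at the first double underscore, which is only inferred from the two file names above and may not hold for every file. The helper name strip_guid_suffix is hypothetical.

```shell
# Hypothetical helper: drop everything from the first "__" onward,
# then de-duplicate. ASSUMPTION: a GUID never contains "__" itself.
strip_guid_suffix() {
  sed 's/__.*$//' | sort -u
}

# Demo with the two file names shown above; both collapse to one GUID.
printf '%s\n' \
  'cpb-aacip-151-4x54f1n09w__fma-2-62314-int-20120606_' \
  'cpb-aacip-151-4x54f1n09w__fma-2-62316-int-20120606_' \
  | strip_guid_suffix
# prints: cpb-aacip-151-4x54f1n09w

# On the real list, something like this would give the de-duplicated count:
#   strip_guid_suffix < ~/all_baapb_guids.txt | wc -l
```

If the resulting count is lower than 8671, the difference is the number of extra entries that came from suffixed names.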
Using the entire set of videos on our server hard drive (/llc_data/clams), we'd like to extract frames and features.
frame extraction
We probably want to run shot detection first to eliminate lots of duplicate images in the output. Then, for all extracted images, we need to record some metadata. Metadata includes
feature extraction
We can try different CNN backbone models for this. I'd like to start with VGG and maybe ResNet, and do more experiments in the future if we get time.