clamsproject / aapb-annenv-swt-hitl-clustering

Annotation environment for human-in-the-loop clustering-based image labelling

frame and feature extraction #1

Open keighrim opened 1 year ago

keighrim commented 1 year ago

Using the entire set of videos on our server's hard drive (/llc_data/clams), we'd like to extract frames and features.

frame extraction

We probably want to run shot detection first to eliminate many duplicate images from the output. Then, for every extracted image, we need to record some metadata. Metadata includes:
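The issue does not say which shot detector to use. As a minimal sketch, assuming frames have already been decoded into flat lists of 8-bit grayscale pixel values (decoding itself is out of scope here), a basic histogram-difference cut detector could split a video into shots and keep one representative frame per shot. The function names, the 16-bin histogram, the 0.5 threshold, and picking the middle frame of each shot are all hypothetical choices, and the (frame index, timestamp) pairs it returns are only a starting point, not the metadata list from this issue.

```python
from typing import List, Sequence, Tuple


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit grayscale pixel values (0..255)."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // 256] += 1  # maps 0..255 onto bins 0..bins-1
    total = len(frame) or 1  # guard against empty frames
    return [c / total for c in counts]


def detect_shots(
    frames: Sequence[Sequence[int]], threshold: float = 0.5, bins: int = 16
) -> List[Tuple[int, int]]:
    """Split frames into shots; returns (start, end) pairs with end exclusive."""
    if not frames:
        return []
    boundaries = [0]
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # L1 distance between two normalized histograms lies in [0, 2].
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)  # a hard cut starts a new shot at frame i
        prev = cur
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def representative_frames(
    frames: Sequence[Sequence[int]], fps: float, threshold: float = 0.5, bins: int = 16
) -> List[Tuple[int, int]]:
    """One (frame_index, timestamp_ms) pair per shot, using the shot's middle frame."""
    reps = []
    for start, end in detect_shots(frames, threshold, bins):
        mid = (start + end - 1) // 2
        reps.append((mid, round(mid * 1000 / fps)))
    return reps
```

Lowering the threshold splits shots more aggressively; gradual transitions such as fades and dissolves are a known weak spot of simple histogram differencing, so a dedicated shot-detection tool is likely to do better on real broadcast footage.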

feature extraction

We can try different CNN backbone models for this. I'd like to start with, maybe, VGG and ResNet, and run more experiments in the future if we get time.
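To make swapping backbones cheap, one option is to treat each backbone as an interchangeable function. As a sketch, assuming frames arrive as (frame ID, preprocessed frame) pairs, the loop below runs every backbone over each frame in a single pass and L2-normalizes the outputs so that cosine similarity, a common choice for clustering, reduces to a dot product. The `Backbone` type and all function names are hypothetical. Real models would be plugged in as the callables: with torchvision, for example, `resnet50` with its `fc` layer replaced by `torch.nn.Identity()` yields 2048-dimensional pooled features, and `vgg16` with its last classifier layer removed yields 4096-dimensional ones.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical interface: a backbone turns one preprocessed frame into a feature vector.
Backbone = Callable[[Sequence[float]], Sequence[float]]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [float(x) for x in vec]


def extract_features(
    frames: Iterable[Tuple[str, Sequence[float]]],
    backbones: Dict[str, Backbone],
) -> Dict[str, Dict[str, List[float]]]:
    """Run every backbone over every frame, reading the frame stream only once.

    Returns {backbone_name: {frame_id: unit-length feature vector}}.
    """
    features: Dict[str, Dict[str, List[float]]] = {name: {} for name in backbones}
    for frame_id, frame in frames:
        for name, backbone in backbones.items():
            features[name][frame_id] = l2_normalize(backbone(frame))
    return features


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Keeping each backbone's features under its own key means the VGG and ResNet outputs can be clustered and compared independently, and adding another backbone later is one more dictionary entry.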

keighrim commented 1 year ago

A simple `find` command shows there are 8671 videos on our server (counting unique filenames up to the first dot).

```console
$ find /llc_data/clams/wgbh/ -type f -name "*.mp4" -printf "%f\n" | cut -d . -f 1 | sort -u > ~/all_baapb_guids.txt
$ wc -l ~/all_baapb_guids.txt
8671 /home/krim/all_baapb_guids.txt
```

This count probably includes some duplicates: files whose names carry suffixes that are not part of the GUID. For example, we have these files:

```
...
cpb-aacip-151-4x54f1n09w__fma-2-62314-int-20120606_
cpb-aacip-151-4x54f1n09w__fma-2-62316-int-20120606_
...
```
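To get a de-duplicated GUID count, a minimal sketch could mirror the `find | cut | sort -u` pipeline in Python and additionally strip the suffix. It assumes, based only on the example filenames above, that every non-GUID suffix begins with a double underscore (`__`); the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def base_guid(filename: str) -> str:
    """Map a video filename to its (assumed) AAPB GUID."""
    stem = filename.split(".", 1)[0]   # same as `cut -d . -f 1`
    return stem.split("__", 1)[0]      # assumption: extra suffixes start with "__"


def unique_guids(filenames: Iterable[str]) -> List[str]:
    """De-duplicated, sorted GUIDs after suffix stripping."""
    return sorted({base_guid(name) for name in filenames})


def mp4_filenames(root: str) -> List[str]:
    """Basenames of all regular *.mp4 files under root, like `find -type f -name "*.mp4"`."""
    return [p.name for p in Path(root).rglob("*.mp4") if p.is_file()]
```

Under that assumption, `len(unique_guids(mp4_filenames("/llc_data/clams/wgbh/")))` would give the number of distinct GUIDs rather than distinct filenames.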