When a keyframe is very close to the beginning or end of a video, a (symmetrical) one-second window of audio cannot be created. Moreover, when a keyframe is close to a shot boundary, a one-second window may be inappropriate. However, the feature extraction model requires homogeneous, one-second-based spectrograms.
We discussed several solutions:
1. shift the keyframe timestamp away from the boundary (and extract both the spectrogram and the keyframe at that moment in time)
2. pad the extracted audio with repeated frames. This can be mirrored padding (playing the edge frames again in reverse) or circular padding (repeating the last frames at the beginning, or the first frames at the end). The latter is expected to be least harmful to the spectrogram.
3. discard all keyframes that are close to the edges altogether
The second approach (applying padding, in a circular fashion) is deemed most appropriate. However, due to time constraints we stick with the last approach (discarding edge keyframes), at least for the video boundaries, as a minimum-effort solution.
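As a sketch of how the preferred circular-padding approach could work, the snippet below extracts a symmetric one-second window around a keyframe timestamp and wraps out-of-range samples around the ends of the audio. The function name, sample rate, and dummy signal are illustrative assumptions, not part of the actual pipeline.

```python
import numpy as np

def one_second_window(audio: np.ndarray, sr: int, keyframe_s: float) -> np.ndarray:
    """Return a symmetric one-second window of samples centered on keyframe_s.

    When the window crosses the start or end of the audio, samples wrap
    around circularly (last frames reappear at the beginning, first frames
    at the end), as described above. Hypothetical helper, not the actual code.
    """
    half = sr // 2
    center = int(round(keyframe_s * sr))
    # np.take with mode="wrap" applies modular (circular) indexing:
    # negative indices wrap to the tail, overshooting indices to the head.
    return np.take(audio, np.arange(center - half, center + half), mode="wrap")

# Example: 3 seconds of dummy samples, keyframe near the start of the video.
sr = 16_000
audio = np.arange(3 * sr, dtype=np.float32)
win = one_second_window(audio, sr, keyframe_s=0.1)  # window crosses the start
assert win.shape == (sr,)
```

A spectrogram computed over `win` then always covers exactly one second, which is what the feature extraction model requires.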
NB: the same holds for the annotations a researcher uses as a query for similarity search!