evamaxfield / cue-queue

Transcript segmentation using the average semantic encodings of cue sentences.
MIT License

Sequence alignment using classified cue sentences / blocks as signal #11

Open evamaxfield opened 3 years ago

evamaxfield commented 3 years ago

Topic and Cue Alignment

Build a classifier that labels each block as either a cue block or a discussion block.

We can test different block sizes (moving window) of 1, 2, 3, 4, 5, 10, etc. sentences. After finding the block size whose classifier performs best, use that trained classifier to generate a signal that is 0 for discussion blocks and 1 for cue blocks.
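As a rough sketch of the moving-window step, the helper below cuts a list of sentences into overlapping blocks of a given size. The name `sentence_blocks` and the `stride` parameter are assumptions made for illustration, not something that exists in the repo.

```python
def sentence_blocks(sentences, block_size, stride=1):
    """Return overlapping blocks of `block_size` consecutive sentences.

    Hypothetical helper: the name and the `stride` parameter are
    illustrative assumptions.
    """
    if block_size < 1 or stride < 1:
        raise ValueError("block_size and stride must be >= 1")
    if len(sentences) <= block_size:
        # Transcript shorter than one window: return it as a single block.
        return [list(sentences)] if sentences else []
    last_start = len(sentences) - block_size
    return [
        list(sentences[i:i + block_size])
        for i in range(0, last_start + 1, stride)
    ]


sentences = ["s0", "s1", "s2", "s3", "s4"]
blocks = sentence_blocks(sentences, block_size=2)
```

Each block would then be passed to the classifier, and the 0/1 predictions in block order form the signal.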

So a transcript's generated sequence from the classifier may look something like the bottom sequence in the image: (1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1).

The top sequence is created by assuming that, for each minutes item, there will always be an "intro cue", some discussion, and then an "outro cue". So generate this sequence as (1, 0, 1) * M, where M is the number of minutes items. E.g., for three minutes items the generated sequence is (1, 0, 1, 1, 0, 1, 1, 0, 1).
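The expected sequence is simple to generate. A minimal sketch, where `expected_cue_sequence` is a hypothetical name:

```python
def expected_cue_sequence(n_items):
    """Template signal: intro cue (1), discussion (0), outro cue (1) per item.

    Hypothetical helper name; assumes every minutes item follows the
    same intro / discussion / outro pattern.
    """
    return [1, 0, 1] * n_items


template = expected_cue_sequence(3)
```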

Finally, perform dynamic time warping / sequence alignment on these two sequences to find the best path.

Evaluate overall performance with Pk / WindowDiff.
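To make the alignment step concrete, here is a small pure-Python dynamic time warping sketch applied to the two example sequences from this comment. It is a textbook DTW with unit steps and absolute difference as the local cost, which is an assumption: the issue does not fix a particular variant or library. `dtw_path` is a hypothetical name.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW: return (total_cost, path) aligning sequence a to b.

    path is a list of (index_in_a, index_in_b) pairs.
    Illustrative sketch, not a tuned implementation.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance a only
                cost[i][j - 1],      # advance b only
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


template = [1, 0, 1] * 3  # three minutes items
predicted = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
total_cost, path = dtw_path(template, predicted)
```

Each `(i, j)` pair in `path` says that predicted block `j` is matched to template position `i`, and `i // 3` gives the index of the minutes item that position belongs to.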
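For reference, a compact sketch of both metrics over per-sentence segment labels. Window conventions differ slightly between implementations. This one assumes a default window of half the mean reference segment length and compares positions `i` and `i + k`. All function names are hypothetical.

```python
def count_boundaries(labels, start, stop):
    """Number of segment boundaries between positions start and stop."""
    return sum(labels[j] != labels[j + 1] for j in range(start, stop))


def default_window(ref):
    """Assumed convention: half the mean reference segment length."""
    n_segments = 1 + count_boundaries(ref, 0, len(ref) - 1)
    return max(1, round(len(ref) / (2 * n_segments)))


def pk(ref, hyp, k=None):
    """Fraction of windows where ref and hyp disagree on 'same segment?'."""
    k = default_window(ref) if k is None else k
    n_windows = len(ref) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(n_windows)
    )
    return errors / n_windows


def window_diff(ref, hyp, k=None):
    """Fraction of windows where the boundary counts differ."""
    k = default_window(ref) if k is None else k
    n_windows = len(ref) - k
    errors = sum(
        count_boundaries(ref, i, i + k) != count_boundaries(hyp, i, i + k)
        for i in range(n_windows)
    )
    return errors / n_windows


# Segment id per sentence; the hypothesis places the first boundary one late.
ref = [0, 0, 0, 1, 1, 1, 2, 2, 2]
hyp = [0, 0, 0, 0, 1, 1, 2, 2, 2]
pk_score = pk(ref, hyp, k=2)
wd_score = window_diff(ref, hyp, k=2)
```

Both scores are error rates, so lower is better and 0 means the two segmentations agree in every window.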

evamaxfield commented 3 years ago

Or, create an average cue block vector and an average discussion block vector, then do sequence alignment on the vectors, using cosine distance as the distance calculation.
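A minimal sketch of the averaging and distance pieces, using toy 2-d vectors in place of real sentence encodings. The helper names and the toy values are assumptions for illustration only.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


# Toy stand-ins for encoded cue and discussion blocks (made-up values).
cue_vectors = [[1.0, 0.0], [0.8, 0.2]]
discussion_vectors = [[0.0, 1.0], [0.2, 0.8]]

cue_centroid = mean_vector(cue_vectors)
discussion_centroid = mean_vector(discussion_vectors)

# Distance from an unseen block to each centroid.
block = [0.7, 0.3]
d_cue = cosine_distance(block, cue_centroid)
d_discussion = cosine_distance(block, discussion_centroid)
```

`cosine_distance` could then serve as the local cost in the alignment, with the template made of centroid vectors rather than 0/1 labels.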

evamaxfield commented 3 years ago

https://www.youtube.com/watch?v=QoL7drXt6EA&list=PLmZlBIcArwhMJoGk5zpiRlkaHUqy5dLzL&index=4

evamaxfield commented 3 years ago

Get the average feature vector for intro cues, the average feature vector for outro cues, and create a vector for the minutes item itself to act as the discussion block, instead of just averaging all other non-cue vectors.
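Under this variant the template would become a sequence of vectors, with one (intro, item, outro) triple per minutes item. A small sketch, where `build_vector_template` and the toy vectors are hypothetical:

```python
def build_vector_template(intro_vec, outro_vec, item_vecs):
    """Return [(item_index, role, vector), ...] in expected transcript order.

    Hypothetical helper: assumes one shared intro-cue vector, one shared
    outro-cue vector, and one encoded vector per minutes item.
    """
    template = []
    for index, item_vec in enumerate(item_vecs):
        template.append((index, "intro", intro_vec))
        template.append((index, "discussion", item_vec))
        template.append((index, "outro", outro_vec))
    return template


# Toy vectors standing in for real encodings (made-up values).
intro = [1.0, 0.0, 0.0]
outro = [0.0, 1.0, 0.0]
items = [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]]
vector_template = build_vector_template(intro, outro, items)
```

Keeping the item index and role next to each vector means that once a transcript block is aligned to a template position, it can be mapped straight back to a minutes item.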

evamaxfield commented 3 years ago

We may be able to fine-tune a sentence transformers model: https://www.sbert.net/docs/training/overview.html#loss-functions