must anchor the transcript to video timestamps

marcverhagen commented 2 years ago

Currently the output just shows something like this:

<Transcript text="Hello, this is Jim Lehrer with the NewsHour on PBS."/>

And for full Kaldi output we also get a bunch of timeframes:

<TimeFrame start="12920" end="13010" frameType="speech"/>
<TimeFrame start="13010" end="14760" frameType="speech"/>
<TimeFrame start="19860" end="20040" frameType="speech"/>

But these timeframes are not connected to any text sequences, basically because they were found for the wrong reason and the script assumes they are significant, like the output of the segmenter.

Instead, we need something like:

<Transcript>
  <span text="Hello, this is Jim Lehrer with the NewsHour on PBS." start="1700" end="3555"/>
  <span text="Today, we are talking about the decline of the tomato" start="3800" end="4321"/>
</Transcript>

I want to introduce an option that governs the granularity at which we link. In the example above it is sentences, but if we have no sentence information (no results from fastpunct and spaCy) we degrade to the token level, and if we don't have that either we cannot give any alignments and we end up with:

<Transcript>
  <span text="Hello, this is Jim Lehrer with the NewsHour on PBS. Today, we are talking about the decline of the tomato"/>
</Transcript>
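
The degradation described above could be sketched roughly as follows. This is only an illustration, not the summarizer's actual code: the function name `build_transcript`, the `granularity` parameter, and the `(text, start, end)` tuple shape are all assumptions made for the example.

```python
from typing import List, Optional, Tuple
from xml.etree import ElementTree as ET

# (text, start_ms, end_ms): a hypothetical shape, not a type from the summarizer.
Aligned = Tuple[str, int, int]


def build_transcript(full_text: str,
                     sentences: Optional[List[Aligned]] = None,
                     tokens: Optional[List[Aligned]] = None,
                     granularity: str = "sentence") -> ET.Element:
    """Build a <Transcript> element at the finest granularity available."""
    # Use sentences when requested and present, otherwise degrade to tokens.
    # A missing or empty list counts as "no information".
    if granularity == "sentence" and sentences:
        units = sentences
    elif tokens:
        units = tokens
    else:
        units = []

    transcript = ET.Element("Transcript")
    if units:
        for text, start, end in units:
            ET.SubElement(transcript, "span",
                          {"text": text, "start": str(start), "end": str(end)})
    else:
        # No alignments at all: a single span with the text and no anchors.
        ET.SubElement(transcript, "span", {"text": full_text})
    return transcript


if __name__ == "__main__":
    sents = [("Hello, this is Jim Lehrer with the NewsHour on PBS.", 1700, 3555)]
    print(ET.tostring(build_transcript("Hello, ...", sentences=sents),
                      encoding="unicode"))
```

Keeping the fallback in one place means callers always get a `<Transcript>` with at least one `<span>`, and can tell whether it is anchored by checking for the `start` attribute.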

marcverhagen commented 1 year ago

For the JSON default we want to make it look like this:

"transcript": [
    [
      "Hello, this is Jim Lehrer with the NewsHour on PBS.",
      5500,
      11467
    ],
    [
      "We have exciting news about the tomato & Florida.",
      12345,
      18987
    ]
  ]
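
A minimal sketch of producing that shape, assuming the aligned sentences are already available as `(text, start, end)` tuples. The helper name `transcript_to_json` is hypothetical and not taken from the repository.

```python
import json
from typing import List, Tuple


def transcript_to_json(spans: List[Tuple[str, int, int]]) -> str:
    """Serialize aligned spans as {"transcript": [[text, start, end], ...]}."""
    # Each span becomes a three-element list, matching the layout above.
    payload = {"transcript": [[text, start, end] for text, start, end in spans]}
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(transcript_to_json([
        ("Hello, this is Jim Lehrer with the NewsHour on PBS.", 5500, 11467),
        ("We have exciting news about the tomato & Florida.", 12345, 18987),
    ]))
```

Note that `json.dumps` leaves the `&` as-is, so no XML-style escaping is needed for this output.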

marcverhagen commented 1 year ago

Much of this is done in e02c422, but:

more testing is needed
there is no fallback in case there are no alignments
the code is not explicitly set up to deal with results of forced alignment

clamsproject / mmif-summarizer

must anchor the transcript to video timestamps #3