bbc / react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. Work in progress.
https://bbc.github.io/react-transcript-editor
Other
558 stars, 164 forks

A way of preserving or restoring time-codes while or after editing the text #30

Closed: pietrop closed this issue 5 years ago

pietrop commented 5 years ago

I've tried the following experiment:

[Screenshot, 2018-12-20 at 16:55:08]

Next step

pietrop commented 5 years ago

So it would be good to test this out more extensively, but also to try out the "waveform comparison approach", e.g. WebAligner, which is similar to Aeneas but perhaps possible client side (?).

pietrop commented 5 years ago

I've tried it with the same audio and text from the TED Talk used in the demo app.

TL;DR: You can try loading the re-aligned JSON (ted-talk-kate-realigned.json) in the demo app and clicking on words across the text to see the overall quality of the alignment. It's not bad, but it's not 100%. It might be good enough (?).

Some optimisations/tweaks might be possible, such as first aligning at sentence/line level (e.g. using Levenshtein distance) and then aligning the words within each sentence, which might give even more accurate results.
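A minimal sketch of that sentence-level step, assuming both sides have already been split into sentences/lines as plain strings. The names levenshtein and alignSentences and the 0.5 threshold are hypothetical illustrations, not code from this repo. Each corrected sentence is paired with its closest STT sentence by normalised Levenshtein distance, keeping matches in spoken order; word-level alignment would then run inside each matched pair.

```typescript
// Hypothetical sketch: sentence-level alignment by normalised Levenshtein distance.

/** Dynamic-programming Levenshtein (edit) distance between two strings. */
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Distance scaled to [0, 1] so long and short sentences are comparable. */
function normalisedDistance(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 0 : levenshtein(a, b) / maxLen;
}

/**
 * For each corrected sentence, the index of the closest STT sentence, or -1
 * if none is within maxDistance (an assumed threshold). The search resumes
 * after the previous match, since correcting text does not reorder speech.
 */
function alignSentences(corrected: string[], stt: string[], maxDistance = 0.5): number[] {
  const result: number[] = [];
  let from = 0;
  for (const sentence of corrected) {
    let best = -1;
    let bestDist = Infinity;
    for (let j = from; j < stt.length; j++) {
      const d = normalisedDistance(sentence.toLowerCase(), stt[j].toLowerCase());
      if (d < bestDist) {
        bestDist = d;
        best = j;
      }
    }
    if (best !== -1 && bestDist <= maxDistance) {
      result.push(best);
      from = best + 1;
    } else {
      result.push(-1);
    }
  }
  return result;
}
```

For example, alignSentences(["Hello world.", "This is a test."], ["hello word", "this is the test"]) returns [0, 1]. Being greedy, it can lock onto a spurious early match; a dynamic-programming pass over the whole sentence grid would be more robust.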

pietrop commented 5 years ago

I tried the other algorithm for aligning at sentence/line level (using Levenshtein distance), but didn't get as far as aligning the words within the sentences, because the sentence-level alignment wasn't able to handle the TED Talk example. It needs more investigation (hopefully I'll open source these algorithms in the new year).

pietrop commented 5 years ago

Just making a note that another drastically different option for preserving time-codes is to restrict the edit only within word boundaries.

This is similar to what @chrisbaume did in bbc/dialogger.

It is also similar to the earlier bbc/transcript-editor by @alexnorton: see the demo and choose the decorator option withWords.

This could be done in Draft.js by making the word entities MUTABLE (I think this is already in place) while disallowing insertion/editing of text outside an entity.
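A small sketch of the check this would need, assuming the block's word entities are available as { offset, length } ranges (the shape Draft.js uses for entityRanges in raw content blocks). isEditInsideWord is a hypothetical helper; in Draft.js it could be called from the Editor's handleBeforeInput prop, returning 'handled' to block typing outside a word, though deletions and pastes go through other handlers and would need the same check.

```typescript
// Hypothetical sketch: does an edit stay inside a single word entity?

interface EntityRange {
  offset: number; // index of the first character covered by the entity
  length: number; // number of characters covered
}

/**
 * True if the edit [start, end) lies entirely within one entity range.
 * A pure insertion has start === end (a collapsed cursor); a cursor on
 * either boundary of a word counts as inside, so typing can extend a word.
 */
function isEditInsideWord(ranges: EntityRange[], start: number, end: number): boolean {
  return ranges.some(({ offset, length }) => start >= offset && end <= offset + length);
}
```

With "Hello world" and ranges [{ offset: 0, length: 5 }, { offset: 6, length: 5 }], editing inside "Hello" is allowed, while deleting the space between the two words (range 5 to 6) or a selection spanning both words is rejected.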

The only issue with this approach is: what happens if you delete a whole paragraph and start writing it again from scratch?

pietrop commented 5 years ago

Another option, via @Laurian from the bbc/subtitalizer project:

Each word is an entity, so:

  1. if you edit within a word, all is fine
  2. if you split a word, you have a space inside an entity, so you can split the entity data into 2 words
  3. if you join two words, you have entities with no space in between, so you can join them into a single one
  4. if you have text without an entity range around it, that’s newly typed text, so you can recompute/average what its timing data might be
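A sketch of cases 2 and 3, assuming each word entity keeps its original start/end in seconds alongside its current { offset, length } range in the block text. TimedRange, Word and normaliseWords are hypothetical names, not subtitalizer's code. Touching entities are merged first (case 3); then any entity whose text now contains whitespace is split, sharing its time span by character count (case 2). Case 4, text outside any entity, is simply ignored here.

```typescript
// Hypothetical sketch of cases 2 (split) and 3 (join) for word entities.

interface TimedRange {
  offset: number; // character offset in the block text
  length: number;
  start: number; // seconds
  end: number;
}

interface Word {
  text: string;
  start: number;
  end: number;
}

function normaliseWords(blockText: string, ranges: TimedRange[]): Word[] {
  const sorted = [...ranges].sort((a, b) => a.offset - b.offset);

  // Case 3: entities that touch with no character in between become one word.
  const merged: TimedRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.offset + last.length === r.offset) {
      merged[merged.length - 1] = {
        offset: last.offset,
        length: last.length + r.length,
        start: last.start,
        end: r.end,
      };
    } else {
      merged.push({ ...r });
    }
  }

  // Case 2: an entity whose text now contains whitespace is split into
  // several words, sharing its time span in proportion to character count.
  const words: Word[] = [];
  for (const r of merged) {
    const parts = blockText
      .slice(r.offset, r.offset + r.length)
      .split(/\s+/)
      .filter((p) => p.length > 0);
    const totalChars = parts.reduce((n, p) => n + p.length, 0);
    let t = r.start;
    for (const p of parts) {
      const d = ((r.end - r.start) * p.length) / totalChars;
      words.push({ text: p, start: t, end: t + d });
      t += d;
    }
  }
  return words;
}
```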

In subtitalizer, since only the start/end of a paragraph (caption item) matters, I always handle case 4 this way:

  • split into words
  • recompute/estimate based on word length vs paragraph duration
  • recreate new entities for the words in that para
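The three steps above can be sketched like this, assuming the paragraph's start/end times are known and that timing is shared purely by character count. retimeParagraph and Word are hypothetical names; this is not subtitalizer's actual retime().

```typescript
// Hypothetical sketch: re-estimate word timings for a whole paragraph.

interface Word {
  text: string;
  start: number; // seconds
  end: number;
}

function retimeParagraph(text: string, paraStart: number, paraEnd: number): Word[] {
  // 1. split into words
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const totalChars = tokens.reduce((n, t) => n + t.length, 0);
  const duration = paraEnd - paraStart;

  // 2. estimate each word's span from its share of the paragraph's characters;
  //    positions are computed from a running character count to avoid drift
  const words: Word[] = [];
  let charsSoFar = 0;
  for (const token of tokens) {
    const start = paraStart + (duration * charsSoFar) / totalChars;
    charsSoFar += token.length;
    const end = paraStart + (duration * charsSoFar) / totalChars;
    // 3. one new word entity per token
    words.push({ text: token, start, end });
  }
  return words;
}
```

For example, retimeParagraph("a bb ccc", 0, 6) gives "a" 0 to 1, "bb" 1 to 3 and "ccc" 3 to 6.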

How do you handle edge cases, when someone deletes a chunk of text, like a whole paragraph or parts of it?

If a block of text is deleted, the timing per paragraph is always computed from its first and last entity, which deals well with joining and splitting paragraphs too.
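A sketch of deriving paragraph timing from its first and last word entities, assuming each paragraph keeps its words sorted by time; the type and helper names are hypothetical. Because the timing is derived rather than stored, splitting or joining paragraphs needs no extra bookkeeping.

```typescript
// Hypothetical sketch: paragraph timing derived from its first/last words.

interface Word {
  text: string;
  start: number; // seconds
  end: number;
}

interface Paragraph {
  words: Word[]; // assumed sorted by start time
}

/** null when no timed words remain, e.g. the paragraph was retyped from scratch. */
function paragraphTiming(p: Paragraph): { start: number; end: number } | null {
  if (p.words.length === 0) return null;
  return { start: p.words[0].start, end: p.words[p.words.length - 1].end };
}

/** Joining just concatenates the words; the timing follows automatically. */
function joinParagraphs(a: Paragraph, b: Paragraph): Paragraph {
  return { words: [...a.words, ...b.words] };
}

/** Splitting before the word at index atWord. */
function splitParagraph(p: Paragraph, atWord: number): [Paragraph, Paragraph] {
  return [{ words: p.words.slice(0, atWord) }, { words: p.words.slice(atWord) }];
}
```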

https://github.com/bbc/subtitalizer/blob/master/src/components/TranscriptEditor.js#L597

In subtitalizer I trigger that only when the timecodes are retimed by hand and when paragraphs are split/joined. Split/join is easy to detect in onChange, even just by looking at the number of blocks. If you want to do this in onChange on every keystroke, better debounce it, as it will slow you down.
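A sketch of the detection and debouncing described above, assuming the onChange handler can read the previous and current block counts. classifyBlockChange and debounce are hypothetical helpers, not editor APIs. Counting blocks is a heuristic: deleting a whole paragraph also shows up as a "join", and a single edit that both splits and joins leaves the count unchanged.

```typescript
// Hypothetical sketch: detect split/join from block counts, debounce retiming.

type BlockChange = "split" | "join" | "none";

function classifyBlockChange(prevBlockCount: number, nextBlockCount: number): BlockChange {
  if (nextBlockCount > prevBlockCount) return "split";
  if (nextBlockCount < prevBlockCount) return "join";
  return "none";
}

/** Run fn only once calls have stopped for waitMs milliseconds. */
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // restart the wait on every call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

In an onChange handler one would compare the block counts to decide whether to retime immediately, and for per-keystroke retiming call something like debounce(retime, 500), where retime and the 500 ms delay are placeholders.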

retime() in subtitalizer averages timing data over the existing and new words in a paragraph. It uses the existing timing for the paragraph's start/end, which in subtitalizer can also be supplied by hand by changing the timecode in the per-paragraph timecode widgets.

You might need this averaging to apply only when the averaged value for a word is massively different, so that existing timings are, in a way, preserved.
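One way to read this, as a sketch: compute the averaged estimate as usual, but keep a word's existing timing unless it drifts from the estimate by more than a tolerance. mergeTimings and the 1-second default are assumptions, and the code assumes the existing and estimated lists are index-aligned, with null for newly typed words.

```typescript
// Hypothetical sketch: prefer existing timings unless they are far off the estimate.

interface Word {
  text: string;
  start: number; // seconds
  end: number;
}

function mergeTimings(existing: (Word | null)[], estimated: Word[], toleranceSec = 1.0): Word[] {
  return estimated.map((est, i) => {
    const old = existing[i];
    if (!old) return est; // newly typed word: only the estimate is available
    // Largest disagreement between the old and estimated boundaries.
    const drift = Math.max(Math.abs(old.start - est.start), Math.abs(old.end - est.end));
    return drift > toleranceSec ? est : old;
  });
}
```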


TL;DR:

Words

Words can have a base duration for when they are very short, but their duration can also be computed when they are longer (etc., there are a bunch of different ways to do this).
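A sketch of one of those ways, assuming a per-character duration with a base (minimum) duration for short words, with the estimates then scaled to fill the paragraph. The 0.2 s base and 0.06 s per character are made-up placeholder values, and both function names are hypothetical.

```typescript
// Hypothetical sketch: word durations with a base (minimum) for short words.

function estimateDuration(word: string, baseSec = 0.2, perCharSec = 0.06): number {
  // Short words get the base duration; longer ones grow with their length.
  return Math.max(baseSec, word.length * perCharSec);
}

/** Scale the per-word estimates so they exactly fill [paraStart, paraEnd]. */
function distributeWithBase(
  tokens: string[],
  paraStart: number,
  paraEnd: number,
  baseSec = 0.2,
  perCharSec = 0.06,
): { text: string; start: number; end: number }[] {
  const weights = tokens.map((t) => estimateDuration(t, baseSec, perCharSec));
  const total = weights.reduce((a, b) => a + b, 0);
  const scale = (paraEnd - paraStart) / total;
  let t = paraStart;
  return tokens.map((text, i) => {
    const start = t;
    t += weights[i] * scale;
    return { text, start, end: t };
  });
}
```

Compared with splitting purely by character count, the base duration stops one-letter words such as "a" or "I" from collapsing to almost nothing.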

Paragraph

pietrop commented 5 years ago

To recap the options:

  1. Aeneas server side (although @Laurian said there might be ways to get Aeneas to fit in an AWS Lambda (?))
  2. Web aligner (a client-side aligner, the equivalent of having Aeneas running in the browser; done by @chrisbaume, not fully working yet but it could be with some tweaking) - more info here
  3. Using the STT JSON and a diff algorithm to transpose time-codes onto the accurate text - more info here
  4. Computing the entities' time-codes on change, as described above by @Laurian - more info here
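A minimal sketch of option 3, assuming the STT output is a flat list of words with start/end times in seconds and the corrected text is a list of words. It uses a word-level longest-common-subsequence diff as the "diff algo"; that choice and all names here are assumptions, not the code from this repo. Matched words inherit the STT timings; runs of unmatched words are spread evenly between their matched neighbours.

```typescript
// Hypothetical sketch: transpose STT timings onto corrected text via an LCS diff.

interface TimedWord {
  text: string;
  start: number; // seconds
  end: number;
}

// Simple normalisation so "The" and "the," compare equal.
const norm = (w: string): string => w.toLowerCase().replace(/[.,!?;:"()]/g, "");

/** Index pairs [sttIndex, correctedIndex] of one longest common subsequence. */
function lcsPairs(a: string[], b: string[]): [number, number][] {
  const n = a.length;
  const m = b.length;
  // dp[i][j] = LCS length of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const pairs: [number, number][] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      i++; // STT word with no counterpart (misrecognised or deleted)
    } else {
      j++; // corrected word with no counterpart (typed or fixed)
    }
  }
  return pairs;
}

function transposeTimings(stt: TimedWord[], corrected: string[]): TimedWord[] {
  const out: (TimedWord | null)[] = corrected.map(() => null);
  for (const [i, j] of lcsPairs(stt.map((w) => norm(w.text)), corrected.map(norm))) {
    out[j] = { text: corrected[j], start: stt[i].start, end: stt[i].end };
  }
  // Spread each run of unmatched words evenly between its matched neighbours.
  const mediaStart = stt.length > 0 ? stt[0].start : 0;
  const mediaEnd = stt.length > 0 ? stt[stt.length - 1].end : 0;
  let runStart = 0;
  while (runStart < out.length) {
    if (out[runStart] !== null) {
      runStart++;
      continue;
    }
    let runEnd = runStart;
    while (runEnd < out.length && out[runEnd] === null) runEnd++;
    const from = runStart > 0 ? out[runStart - 1]!.end : mediaStart;
    const to = Math.max(from, runEnd < out.length ? out[runEnd]!.start : mediaEnd);
    const step = (to - from) / (runEnd - runStart);
    for (let x = runStart; x < runEnd; x++) {
      const k = x - runStart;
      out[x] = { text: corrected[x], start: from + step * k, end: from + step * (k + 1) };
    }
    runStart = runEnd;
  }
  return out as TimedWord[];
}
```

The DP table is O(n * m) in memory, so for a long transcript one would diff per paragraph or per matched sentence rather than over the whole file.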

chrisbaume commented 5 years ago

I think the web aligner will take more work than you make out. The current web aligner proof-of-concept just looks for gaps in the audio amplitude. A better approach would be to follow the same algorithm as aeneas: https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md

  1. Convert text to speech using https://w3c.github.io/speech-api/speechapi.html#tts-section
  2. Extract MFCCs from original speech and generated speech using https://github.com/meyda/meyda
  3. Perform dynamic time warping using https://github.com/langholz/dtw or https://github.com/GordonLesti/dynamic-time-warping

I couldn't find any JS libraries that do Sakoe-Chiba Band DTW, so the above algorithms may be too slow to be practical. As such, they might have to be modified to use the Sakoe-Chiba Band approach.
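A sketch of step 3 with a Sakoe-Chiba band, assuming both recordings have already been turned into sequences of equal-length feature vectors (e.g. MFCC frames from step 2, not shown). It is the textbook DTW recurrence restricted to |i - j| <= w, with w widened to at least |n - m| so the end cell stays reachable; it is not aeneas's implementation, and dtwSakoeChiba is a hypothetical name. For clarity it still allocates the full matrix, but only the band is filled, which cuts the time to roughly O(n * w); a memory-efficient version would store just the band.

```typescript
// Hypothetical sketch: dynamic time warping restricted to a Sakoe-Chiba band.

function euclidean(a: number[], b: number[]): number {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += (a[k] - b[k]) ** 2;
  return Math.sqrt(s);
}

interface DtwResult {
  cost: number;
  path: [number, number][]; // pairs of [frame in x, frame in y]
}

function dtwSakoeChiba(x: number[][], y: number[][], radius: number): DtwResult {
  const n = x.length;
  const m = y.length;
  const w = Math.max(radius, Math.abs(n - m)); // band must reach the corner (n, m)
  const D: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(Infinity));
  D[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    // Only cells within w of the diagonal are computed; the rest stay Infinity.
    for (let j = Math.max(1, i - w); j <= Math.min(m, i + w); j++) {
      const c = euclidean(x[i - 1], y[j - 1]);
      D[i][j] = c + Math.min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]);
    }
  }
  // Backtrack from (n, m) to (1, 1) to recover the warping path.
  const path: [number, number][] = [];
  let i = n;
  let j = m;
  while (i > 0 && j > 0) {
    path.push([i - 1, j - 1]);
    if (i === 1 && j === 1) break;
    const diag = D[i - 1][j - 1];
    const up = D[i - 1][j];
    const left = D[i][j - 1];
    if (diag <= up && diag <= left) {
      i--;
      j--;
    } else if (up <= left) {
      i--;
    } else {
      j--;
    }
  }
  path.reverse();
  return { cost: D[n][m], path };
}
```

Mapping the path back through the synthesised audio's known word boundaries is what would give each word its time in the original recording.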

If it were up to me, I'd go for option 1. However, I would set it up so that it only aligns the corrected bits rather than the whole thing. I don't know much about Lambda, but spinning up an instance might take a few seconds, which would be too slow IMO.
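A sketch of "only align the corrected bits", assuming corrected words have lost their timings (start/end null) while the rest keep theirs, e.g. after a diff like option 3. realignJobs is a hypothetical helper: each run of untimed words becomes a job holding its text and the audio window between the nearest timed neighbours, which could then be sent to the aligner instead of the whole file.

```typescript
// Hypothetical sketch: collect the corrected regions that need re-aligning.

interface MaybeTimedWord {
  text: string;
  start: number | null; // null = timing lost after correction
  end: number | null;
}

interface RealignJob {
  firstWord: number; // index of the first untimed word in the run
  lastWord: number; // index of the last untimed word in the run
  text: string; // text to send to the aligner
  audioStart: number; // seconds
  audioEnd: number;
}

const isTimed = (w: MaybeTimedWord): boolean => w.start !== null && w.end !== null;

function realignJobs(words: MaybeTimedWord[], mediaDuration: number): RealignJob[] {
  const jobs: RealignJob[] = [];
  let i = 0;
  while (i < words.length) {
    if (isTimed(words[i])) {
      i++;
      continue;
    }
    let k = i;
    while (k < words.length && !isTimed(words[k])) k++;
    // Window between the neighbouring timed words, or the media edges.
    const audioStart = i > 0 ? (words[i - 1].end ?? 0) : 0;
    const audioEnd = k < words.length ? (words[k].start ?? mediaDuration) : mediaDuration;
    jobs.push({
      firstWord: i,
      lastWord: k - 1,
      text: words.slice(i, k).map((w) => w.text).join(" "),
      audioStart,
      audioEnd,
    });
    i = k;
  }
  return jobs;
}
```

In practice the window might be widened to include one timed word on each side, to give the aligner some context around a very short clip.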

pietrop commented 5 years ago

Addressed in https://github.com/bbc/react-transcript-editor/pull/144 by @murezzda.

pietrop commented 5 years ago

Closing for now, as it's been added to https://github.com/bbc/react-transcript-editor/pull/175 and will soon be merged into master, pending @jamesdools' review of https://github.com/bbc/react-transcript-editor/pull/144