bbc / react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. Work in progress.
https://bbc.github.io/react-transcript-editor

Speechmatics adapter throws an error if an empty Speechmatics transcript is provided #134

Open · murezzda opened this issue 5 years ago

murezzda commented 5 years ago

Describe the bug: If the Speechmatics API returns a transcript with no text, the Speechmatics adapter throws a "TypeError: Cannot read property 'children' of undefined" error.

To Reproduce: steps to reproduce the behavior:

  1. Create a transcript in Speechmatics from audio with no spoken content.
  2. Load the transcript into the editor.

Expected behavior: An empty transcript is shown in the editor and no error is thrown.

pietrop commented 5 years ago

@murezzda do you have any thoughts on a workaround for this?

It might be a good edge case to add to the tests, and one we'd have to check that the other STT adapters handle as well.

murezzda commented 5 years ago

@pietrop I haven't looked at it yet, but I plan to fix this soon. I haven't tested the other STT adapters. I would also recommend adding them to the tests, as an empty transcript will occur from time to time in a production environment (e.g. a wrong or bad file, or poor audio quality).
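A sketch of the edge-case test suggested above, written as a plain TypeScript check rather than for any particular test runner. Everything in it is hypothetical: the input shape (a flat words array), the adapters naiveAdapter and guardedAdapter, and the helper checkEmptyTranscript are illustrations, not the project's real Speechmatics adapter or test suite. The naive adapter fails in the same general way as the report (reading a property of something that is undefined), although the real adapter trips over a different property, 'children'.

```typescript
// Minimal subset of draftJS raw content (assumed; only the fields used here).
interface DraftBlock {
  key: string;
  text: string;
  type: string;
  depth: number;
  inlineStyleRanges: unknown[];
  entityRanges: unknown[];
  data: { speaker: string; words: unknown[]; start: number };
}
interface DraftJson {
  blocks: DraftBlock[];
  entityMap: Record<string, unknown>;
}

// Hypothetical vendor input: a flat list of recognised words with start times in seconds.
interface SttWord {
  name: string;
  time: number;
}
interface SttJson {
  words: SttWord[];
}
type Adapter = (input: SttJson) => DraftJson;

function paragraph(text: string, start: number, words: SttWord[]): DraftBlock {
  return {
    key: "blk01", // hypothetical fixed key; draftJS normally uses random block keys
    text,
    type: "paragraph",
    depth: 0,
    inlineStyleRanges: [],
    entityRanges: [],
    data: { speaker: "UKN", words, start },
  };
}

// Reproduces the class of bug reported: it reads a property of the first word
// without checking that one exists, so an empty transcript throws a TypeError.
const naiveAdapter: Adapter = (input) => {
  const start = input.words[0].time;
  const text = input.words.map((w) => w.name).join(" ");
  return { blocks: [paragraph(text, start, input.words)], entityMap: {} };
};

// Same conversion, but an empty transcript yields one empty paragraph instead.
const guardedAdapter: Adapter = (input) => {
  if (!input.words || input.words.length === 0) {
    return { blocks: [paragraph("", 0, [])], entityMap: {} };
  }
  return naiveAdapter(input);
};

interface EdgeCaseResult {
  name: string;
  ok: boolean;
  error?: string;
}

// Runs every adapter on an empty transcript; "ok" means it did not throw and
// returned at least one block for the editor to display.
function checkEmptyTranscript(adapters: Record<string, Adapter>): EdgeCaseResult[] {
  const empty: SttJson = { words: [] };
  return Object.keys(adapters).map((name) => {
    try {
      const ok = adapters[name](empty).blocks.length > 0;
      return ok ? { name, ok } : { name, ok, error: "no blocks returned" };
    } catch (e) {
      return { name, ok: false, error: e instanceof Error ? e.message : String(e) };
    }
  });
}

const results = checkEmptyTranscript({ naive: naiveAdapter, guarded: guardedAdapter });
console.log(results);
```

For the real adapters, each vendor would need its own empty payload, since their JSON formats differ; running the same check over all of them would answer whether vendors other than Speechmatics are affected.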

murezzda commented 5 years ago

I found two options for solving this problem:

Depending on whether this is a Speechmatics-specific problem or not, I would opt for either the first or the second option. Has anyone tried to reproduce the problem with other STT vendors?

pietrop commented 5 years ago

Thanks @murezzda,

Yeah, I was thinking about the first option. It seems the most straightforward and the easiest to test, although it means a lot of "logic repetition" across the adapters.

The second one could also be a good fix that works across adapters, but each adapter should still handle the "empty transcript conversion".


For option one: if an STT adapter gets an "empty transcript", it could return draftJS JSON like this:

{
  "blocks": [
    {
      "key": "67dlg",
      "text": "",
      "type": "paragraph",
      "depth": 0,
      "inlineStyleRanges": [],
      "entityRanges": [],
      "data": {
        "speaker": "UKN",
        "words": [],
        "start": 0
      }
    }
  ],
  "entityMap": {}
}

It would then be shown like this:

[Screenshot, 2019-04-10: the editor displaying the empty transcript]

You can try it in the demo app by importing the draftJS JSON.
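For option one, the empty result does not have to be written out literally in every adapter. A minimal sketch, assuming a shared helper (the name createEmptyTranscriptDraftJs is hypothetical) that returns exactly the draftJS JSON above, with the block key as an optional parameter since draftJS normally generates random keys:

```typescript
// Shape of the draftJS raw content shown above (only the fields it contains).
interface DraftBlock {
  key: string;
  text: string;
  type: string;
  depth: number;
  inlineStyleRanges: unknown[];
  entityRanges: unknown[];
  data: { speaker: string; words: unknown[]; start: number };
}

interface DraftJson {
  blocks: DraftBlock[];
  entityMap: Record<string, unknown>;
}

// Hypothetical shared helper: any adapter could return this when the STT
// vendor's transcript contains no words, instead of repeating the literal.
function createEmptyTranscriptDraftJs(key: string = "67dlg"): DraftJson {
  return {
    blocks: [
      {
        key, // fixed default keeps the output stable; callers may pass a random key
        text: "",
        type: "paragraph",
        depth: 0,
        inlineStyleRanges: [],
        entityRanges: [],
        data: { speaker: "UKN", words: [], start: 0 },
      },
    ],
    entityMap: {},
  };
}

console.log(JSON.stringify(createEmptyTranscriptDraftJs(), null, 2));
```

Each adapter would then only need its own check for "no words in the vendor response" before calling the helper, which keeps the repeated logic to a line or two per adapter.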

But a user might not be clear on what to do next, e.g. start typing text, redo the transcript, etc.

However, there might be a number of reasons why the transcript is empty. I guess the choice to make is: if there's an empty transcript, should we just display an error message saying that something went wrong and there's no automated transcript to correct?

It also feels like this might be something the parent app needs to handle, either fully or partially?
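If the parent app takes on part of the handling, it could inspect the converted draftJS content before mounting the editor. A minimal sketch, assuming the parent receives the draftJS JSON from an adapter; isEmptyTranscript, chooseView and the message wording are hypothetical, not part of the component's API:

```typescript
// Minimal draftJS raw-content shape (only what this check reads).
interface DraftJsonLike {
  blocks: { text: string }[];
}

type TranscriptView =
  | { kind: "editor" }
  | { kind: "error"; message: string };

// A transcript counts as empty when it has no blocks or every block's text is blank.
// Array.prototype.every returns true for an empty array, which covers "no blocks".
function isEmptyTranscript(draft: DraftJsonLike): boolean {
  return draft.blocks.every((block) => block.text.trim() === "");
}

// Hypothetical decision a parent app might make before rendering the editor.
function chooseView(draft: DraftJsonLike): TranscriptView {
  if (isEmptyTranscript(draft)) {
    return {
      kind: "error",
      message: "No automated transcript is available to correct. Check the audio file or re-run the transcription.",
    };
  }
  return { kind: "editor" };
}

console.log(chooseView({ blocks: [{ text: "" }] }));
```

Treating a transcript as empty when every block's text is blank also covers adapters that return the single empty paragraph from option one. Whether the parent then shows an error, the empty editor, or both is still the open design question here; the sketch only separates detecting the empty case from presenting it.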

Happy to think it through a bit more and discuss.

pietrop commented 5 years ago

OK, here's a proposed solution; feel free to disagree.

What do you think?