Align audio with transcription/translation

CambridgeSemiticsLab / nena

The North Eastern Neo-Aramaic Database Site

https://nena.ames.cam.ac.uk


Align audio with transcription/translation #43

Closed jamespstrachan closed 4 years ago

jamespstrachan commented 5 years ago

from @GeoffreyKhan :

Some kind of text-sound aligning. The easiest would be to align numbered paragraphs of text with the sound.

jamespstrachan commented 5 years ago

Quite possible. Is it worth doing this after the new standard transcription format is settled? I imagine something like:

(1@00:00) Jack and Jill went up the hill
(2@00:05) To fetch a pail of water
(3@00:09) ...

Where the time codes are embedded in the paragraph markers. @codykingham might that work as part of your standard?
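As a rough illustration of the marker format suggested above, the sketch below builds `(number@MM:SS)` prefixes from a list of start times. The function names `format_marker` and `annotate` are hypothetical and are not part of the nena codebase; the zero-padded `MM:SS` form is assumed from the example lines.

```python
def format_marker(number: int, seconds: int) -> str:
    """Return a marker such as '(2@00:05)' for line `number` starting at `seconds`."""
    minutes, secs = divmod(seconds, 60)
    return f"({number}@{minutes:02d}:{secs:02d})"


def annotate(lines):
    """Prefix each (start_seconds, text) pair with a numbered time-code marker."""
    return [
        f"{format_marker(i, start)} {text}"
        for i, (start, text) in enumerate(lines, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical input: start time in whole seconds plus the line text.
    rhyme = [(0, "Jack and Jill went up the hill"), (5, "To fetch a pail of water")]
    print("\n".join(annotate(rhyme)))
```

Numbering is derived from position in the list, so a contributor would only need to supply the start time of each line.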

GeoffreyKhan commented 5 years ago

Yes, after the standard transcription sounds best. The time codes could be embedded in the paragraph numbers, but the researchers who create the texts should ensure that the paragraphs are several lines long, so that the labour of aligning each paragraph is reduced.

codykingham commented 5 years ago

Technically these would be line indicators, not paragraphs. I think this depends on how the time code annotations will be made. There are some options out there for automatic word-to-word alignment of texts (e.g. here). These tools could align transcriptions at the word level. In that case, I would suggest we store these as features of words in Text-Fabric. Then for any given word, you can easily retrieve its timestamp.

If, however, the time codes will be made by the researcher at the time of upload, then it indeed makes sense to put this in the upload template somehow. @jamespstrachan, I do like your proposed format.

jamespstrachan commented 5 years ago

Great if we can use an existing standard, though the granularity offered by the automatic tools will be labour-intensive for a human to replicate. This might limit how easily contributors could submit well-time-annotated text unless we can easily/automatically run submitted audio+transcript through the auto-alignment software. @codykingham do you have any sense of how easy this is to do, its fault tolerance, and how well it fits into the 'production line' you're considering for processing Nena audio->word-transcript?->.nena format->time-annotated-format->text-fabric?

codykingham commented 5 years ago

I need to experiment. @hvlaardingerbroek did a test with a similar tool (not sure if it's the same as the linked one), and it produced fairly accurate results on NENA audio without any tweaking. But corrections would still be needed to the output. I'm leaning more and more towards the timestamp indicator in the line number. That is simpler and less dependent on the coding side.

codykingham commented 4 years ago

After some more thought, I propose we go with @jamespstrachan's proposal. In the plain text format, then, line numbers can optionally be composed of two parts, a number and a timestamp:

(1@0:02) Some line here
(2@0:05) some other line.
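A minimal sketch of how such lines might be read, assuming the timestamp is written as minutes and seconds separated by a colon and that a bare `(3)` marker is also valid because the timestamp is optional. The `Line` type, the `MARKER` pattern and `parse_line` are hypothetical names for illustration, not the parser used by the site.

```python
import re
from typing import NamedTuple, Optional

# "(12@3:45) text" or "(12) text"; the "@M:SS" part is optional.
MARKER = re.compile(r"^\((\d+)(?:@(\d+):([0-5]\d))?\)\s*(.*)$")


class Line(NamedTuple):
    number: int
    start: Optional[int]  # seconds from the start of the audio, None if absent
    text: str


def parse_line(raw: str) -> Line:
    """Split one numbered line into its number, optional start time, and text."""
    match = MARKER.match(raw.strip())
    if match is None:
        raise ValueError(f"not a numbered line: {raw!r}")
    number, minutes, seconds, text = match.groups()
    start = None if minutes is None else int(minutes) * 60 + int(seconds)
    return Line(int(number), start, text)
```

Storing the start time as whole seconds keeps later comparisons against the audio position simple; a line without a time code simply carries `None`.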

jamespstrachan commented 4 years ago

I've had a shot at implementing this - it's a little rough but it works. It's live now on the staging instance. I have tried to fill in some time codes for this one:

https://nena-staging.ames.cam.ac.uk/audio/30/

See what you think, and try adding some time codes yourself. Don't worry, this is on a separate database, so anything you mess up here will not affect production. (Conversely, doing lots of useful data entry here will be time wasted!)
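For readers wondering how line-level time codes can drive the display during playback, here is one possible approach, sketched under the assumption that each line's start time is available in seconds and sorted in ascending order. It is not the code running on the staging instance; `active_line` is a hypothetical helper.

```python
from bisect import bisect_right


def active_line(starts, position):
    """Return the index of the line being spoken at `position` seconds.

    `starts` holds each line's start time in ascending order. Returns None
    when `position` falls before the first time code.
    """
    index = bisect_right(starts, position) - 1
    return index if index >= 0 else None


if __name__ == "__main__":
    starts = [0, 5, 9]  # hypothetical start times for three lines
    print(active_line(starts, 6.2))
```

A binary search keeps the lookup cheap even for long recordings, so it could be run on every playback time update to decide which line to highlight.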

GeoffreyKhan commented 4 years ago

Thanks, James. This looks great.

codykingham commented 4 years ago

@jamespstrachan Great work! This is a very nice demo. I find it quite easy to follow along when it is set up this way. This is the simplest option, I think, with the fewest dependencies. Better than automated alignments.

jamespstrachan commented 4 years ago

Now in production