bugbakery / audapolis

an editor for spoken-word audio with automatic transcription
GNU Affero General Public License v3.0

Edit Transcript Text #359

Open charlesangus opened 2 years ago

charlesangus commented 2 years ago

Hey there, just testing Audapolis for the first time. Neat project!

I was surprised to find I can't edit the transcript in the text window - it would be much easier to clean up the text right there than to do it in another program, but I didn't seem to be able to type. I can delete words but not edit/change them? Is that the intended behaviour or am I doing something wrong?

I would want to be able to:

anuejn commented 2 years ago

Hey, thanks for reaching out. It is possible to edit the transcript but that functionality is currently not very discoverable. Currently you have to select the text you want to correct and press i on your keyboard. If you want to change the speaker of some portion of text, you have to make it a separate paragraph and then change the speaker by clicking on the speaker name.

Leaving this open because we obviously need to make that functionality more discoverable & intuitive to use

rugk commented 2 years ago

Currently you have to select the text you want to correct and press i on your keyboard.

Hehe, I see some vim friends over there.

So my two ux cents: best would be editing without such a thing IMHO. So what is preventing you from just enabling/allowing edits for everything?

pajowu commented 2 years ago

It is possible to edit every word by placing the cursor after it and pressing i (or placing the cursor in front of the word and pressing o). If you want to edit more than one word at a time, you need to select the word group you want to edit first, before pressing either i or o

rugk commented 2 years ago

I got that, yeah, but why do you need to press i? What is stopping you from skipping this step and just implementing it like any text editor, where you can type without pressing any special key? I assume there is a technical reason for that…

pajowu commented 2 years ago

Yes, there is. We need to keep the mapping between each word's text and its timing. Allowing one to edit the entire document as if it were normal text would make that difficult.
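To illustrate the constraint, here is a minimal sketch. It is not audapolis's actual data model; the names `Word` and `replace_selection` are hypothetical. It assumes each word carries its own start and end time, and that correcting a selection replaces it with a single item spanning the selection's time range.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the source media
    end: float    # seconds into the source media


def replace_selection(words, first, last, new_text):
    """Replace words[first..last] (inclusive) with one item that covers
    the same time span, so the text-to-timing mapping stays intact."""
    if not (0 <= first <= last < len(words)):
        raise IndexError("selection out of range")
    merged = Word(new_text, words[first].start, words[last].end)
    return words[:first] + [merged] + words[last + 1:]


transcript = [
    Word("hello", 0.0, 0.4),
    Word("word", 0.5, 0.9),
    Word("how", 1.0, 1.2),
]
# Correct the misrecognized second word; its timing is carried over.
corrected = replace_selection(transcript, 1, 1, "world")
```

With free typing anywhere in the document, there would be no explicit selection to say which words' time span the new text belongs to. That is the part a select-then-edit step makes unambiguous.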

pajowu commented 2 years ago

Also I don't think always editing the transcript is a good decision. While editing the transcript text is an important part, I think the main focus of this app is on the audio-/video-editing aspect of it.

rugk commented 2 years ago

Okay then, some ux ideas:

Also there should be some feedback to the user when:

charlesangus commented 2 years ago

Without an accurate transcript, I'm not really sure how useful this app is. Editing and cleaning up the transcript would seem to be a critical part of any software that offers a transcription.

Editing-by-transcript is an interesting idea, but any serious editing will be done in Premiere/Avid/etc. Paper edits based on timestamped transcripts are still an important workflow even if you can edit the video based on the transcript.

And transcription, with cleanup, can be fed into subtitle creation, which is important for accessibility.

My 2c. I think transcription, cleanup, timestamped text/EDL output and subtitle output seem more useful than direct video editing.

pajowu commented 2 years ago

Without an accurate transcript, I'm not really sure how useful this app is.

We agree, that's why we currently have word-/selection-based editing, which allows users to quickly edit parts of the document while still keeping enough timing information to allow for text-based editing. This is currently hidden behind obscure keyboard shortcuts, but we plan on making it easier to find.

but any serious editing will be done in Premiere/Avid/etc.

We agree, that's why we have an export option for these programs (although it barely works at the moment, but we're on it, #294, #371).

And transcription, with cleanup, can be fed into subtitle creation, which is important for accessibility.

This is an important use-case and one I use audapolis for quite often as well. However I don't want this to be the primary focus of audapolis. While I talked with a lot of people who voiced interest in using audapolis in that way, I personally think it might be better to separate that a bit from the current editing approach.

One way could be to add a special subtitling mode to audapolis, which allows for nicer text editing but not for video/audio editing. Another way would be to create an entirely separate app/service for this, which also has certain advantages (for example collaborative editing). I would be very interested in being part of developing this.


I also noticed one more feature-wish from your original post that we might already have:

  • change speaker for selected text

You can already change the speaker of a paragraph. This is possible by clicking (right or left) on the speaker name and choosing "reassign speaker".

Or did you mean that you want to change the speaker for only a part of a paragraph? This is also possible by moving this part to a separate paragraph (click in front of the first word, press Enter; then click after the last word and press Enter again). But if you have to do this often, please comment again so we can think about a better way to do this.

pajowu commented 2 years ago

Okay then, some ux ideas:

We had these ideas as well. Some are already on our wishlist but couldn't be implemented yet for lack of time; for others there are good reasons not to do it that way (at the moment).

  • limit the amount of characters you can select/edit/overwrite (the downside is this is kinda hard to communicate/explain)

There are legitimate use-cases for overwriting larger chunks of the document at a time. One example is subtitling, where you might not care about the exact timings of words/groups of words, but only about that of an entire paragraph.

  • introduce a squiggle line like for spell checking in editors, or spell checking in general. There you click or right click on it and can choose a different word → that's very intuitive because of the history of how spell checking in general was used and is known to users

This is something we want to do at some point, but weren't able to do yet because of missing time. Our current transcription library already has a mode to suggest multiple different possible transcripts. However (at least last time I checked), it only provided timing information for the "best one". So we would either (a) need to patch vosk to output timing information for all of them or (b) calculate the differences of the transcriptions against the one we get timing information for and then estimate the timing of the differing words.
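Option (b) could be sketched roughly as follows. This is an illustration only, not something audapolis or vosk provides: the function name `estimate_timings` and the tuple layout are assumptions. It uses Python's standard `difflib.SequenceMatcher` to diff an alternative transcript against the timed one. Words that match keep their timing, and a replaced run shares the time span of the words it replaces in equal parts.

```python
from difflib import SequenceMatcher


def estimate_timings(timed, alternative):
    """timed: list of (word, start, end) for the transcript that has timing.
    alternative: list of words from another candidate transcript.
    Returns a list of (word, start, end) for the alternative."""
    best_words = [word for word, _, _ in timed]
    matcher = SequenceMatcher(a=best_words, b=alternative, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical words: reuse the known timing.
            result.extend(timed[i1:i2])
        elif tag == "replace":
            # Differing words: split the replaced span evenly (a crude estimate).
            start, end = timed[i1][1], timed[i2 - 1][2]
            step = (end - start) / (j2 - j1)
            for k, word in enumerate(alternative[j1:j2]):
                result.append((word, start + k * step, start + (k + 1) * step))
        elif tag == "insert":
            # Extra words with no counterpart: zero-length at the boundary.
            if i1 < len(timed):
                t = timed[i1][1]
            else:
                t = timed[-1][2] if timed else 0.0
            for word in alternative[j1:j2]:
                result.append((word, t, t))
        # "delete": words only in the timed transcript, nothing to emit.
    return result


timed = [("i", 0.0, 0.25), ("scream", 0.25, 1.25), ("today", 1.25, 1.75)]
estimated = estimate_timings(timed, ["i", "ice", "cream", "today"])
```

Splitting a span evenly ignores word length and pauses, so the estimates are only as good as that assumption. Weighting by character count would be one possible refinement.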

  • show some hover menu on hover and/or right click (or some alternative way for touch users e.g.) or of course only shown when text is actually selected, it could have an entry "correct transcript"

This is already planned & implemented in #341

  • introduce some global mode (toggles with a toolbar button e.g.) where you can only edit

This again has the problem of keeping accurate enough timing-information

pajowu commented 2 years ago

Two more things:

  1. We recently added a special mode which highlights low-confidence words. As with most transcript correction features I talk about in this thread, you need Version 0.2.2 or higher, which is currently only available as a beta version.

  2. The whole topic of transcription correction would be much easier if we included a tool for "forced alignment". Forced alignment is the task of time-matching a transcript with its audio. There are a number of existing tools (https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner, https://github.com/readbeyond/aeneas, https://github.com/mozilla/DSAlign, https://github.com/maxrmorrison/pyfoal, https://github.com/r4victor/afaligner) to do that, but so far we haven't had time to evaluate them, select a good one and integrate it. This is especially tricky as we would want it to support as many of the currently supported languages as possible.
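As for point 1, highlighting low-confidence words amounts to thresholding a per-word confidence score. A minimal sketch, assuming the recognizer reports a confidence between 0 and 1 for each word. The `conf` field name, the threshold value and the function name are assumptions for illustration, not audapolis's implementation:

```python
def low_confidence_indices(words, threshold=0.8):
    """Return the positions of words whose confidence is below the threshold.

    words: list of dicts with at least the keys "word" and "conf"
    (hypothetical layout; adapt to whatever the recognizer returns).
    """
    return [i for i, w in enumerate(words) if w["conf"] < threshold]


words = [
    {"word": "hello", "conf": 0.98},
    {"word": "word", "conf": 0.41},
    {"word": "how", "conf": 0.93},
]
# Positions a UI could highlight for manual review.
to_review = low_confidence_indices(words)
```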

Sogolumbo commented 1 year ago

I think it would be great to have a mouse shortcut for editing text. I'd suggest double click to edit transcription text. Another option would be middle click but I don't think that's very intuitive. (in 2.2-pre4 the right click editing is not possible anymore)

Sogolumbo commented 1 year ago

Also it would be much easier to correct bigger mistakes if there was a shortcut to jump from one word / word-group to the next. possible options: