Forced-Alignment-and-Vowel-Extraction / fave-asr

Interface for automated transcription and time alignment of conversational interview data
https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
GNU General Public License v3.0

Workflow for fine-tuning language models #10

Closed by chrisbrickhouse 6 months ago

chrisbrickhouse commented 6 months ago

In offline testing, there seem to be two main problems:

The hypothesis is that the Wav2Vec2 models are trained on speeches or audiobook recordings, which have fewer disfluencies and less speech overlap than the conversational data seen in sociolinguistic interviews. The transcriptions and alignments from those base models are also likely less accurate than research requires. Providing a workflow for fine-tuning the model should hopefully mitigate these problems.

chrisbrickhouse commented 6 months ago

While tuning is mostly implemented by 4461024, it turns out not to resolve the problems in the OP, whereas #11 does, so I'm closing this as not planned. There are some commits in the models branch that might be useful (see e0d0f7b).