Closed by chrisbrickhouse 8 months ago
While tuning is mostly implemented by 4461024, it turns out not to resolve the problems in the OP, while #11 does, so I'm closing this as not planned. There are some commits in the models branch that might still be useful (see e0d0f7b).
In offline testing, there seem to be two main problems: poor handling of disfluencies and overlapping speech, and transcriptions and alignments that fall short of research accuracy.
The hypothesis is that the Wav2Vec2 models are trained on speeches or audiobook recordings, which contain fewer disfluencies and less speech overlap than the conversational data seen in sociolinguistic interviews. The transcriptions and alignments produced by those base models are also likely less accurate than research requires. Providing a workflow for fine-tuning the model on conversational data should hopefully mitigate these problems.
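For reference, here is a rough sketch of what such a fine-tuning workflow might look like using Hugging Face `transformers` and `datasets`. The checkpoint name, the CSV layout (`path`/`text` columns), and all hyperparameters are illustrative assumptions, not the actual workflow in the models branch:

```python
# Minimal sketch: fine-tune a Wav2Vec2 CTC model on conversational clips.
# Checkpoint, file names, and hyperparameters are illustrative assumptions.
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

CHECKPOINT = "facebook/wav2vec2-base-960h"  # assumed base model

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)
model.freeze_feature_encoder()  # keep the convolutional front end fixed

# Hypothetical CSV with columns "path" (audio file) and "text" (transcript,
# uppercase to match this checkpoint's character vocabulary).
dataset = load_dataset("csv", data_files={"train": "interviews.csv"})["train"]
dataset = dataset.cast_column("path", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["path"]  # decoded by the Audio feature into an array
    batch["input_values"] = processor(
        audio["array"], sampling_rate=16_000
    ).input_values[0]
    batch["labels"] = processor(text=batch["text"]).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset.column_names)

@dataclass
class DataCollatorCTCWithPadding:
    """Pad inputs and labels separately; mask label padding for CTC loss."""

    processor: Wav2Vec2Processor

    def __call__(self, features):
        inputs = [{"input_values": f["input_values"]} for f in features]
        labels = [{"input_ids": f["labels"]} for f in features]
        batch = self.processor.pad(inputs, padding=True, return_tensors="pt")
        label_batch = self.processor.pad(
            labels=labels, padding=True, return_tensors="pt"
        )
        # The CTC loss ignores label positions set to -100.
        batch["labels"] = label_batch["input_ids"].masked_fill(
            label_batch["attention_mask"].ne(1), -100
        )
        return batch

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="wav2vec2-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=1e-4,
        fp16=torch.cuda.is_available(),
    ),
    train_dataset=dataset,
    data_collator=DataCollatorCTCWithPadding(processor),
)
trainer.train()
```

Masking padded label positions with -100 keeps the CTC loss from training on padding; a real run would also want a WER evaluation loop and learning-rate warmup, along the lines of the commits referenced above.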