Open iceberg53 opened 7 months ago
In the file createTextFromAudioFile.ts, change this code:
for await (const data of wfReadable) {
  bytesRead += data.length;
  updateProgressBar(bytesRead);
  const endOfSpeech = recognizer.acceptWaveform(data);
  if (endOfSpeech) {
    const result = recognizer.result();
    results.push(result);
  }
}
to this:
for await (const data of wfReadable) {
  bytesRead += data.length;
  updateProgressBar(bytesRead);
  const endOfSpeech = recognizer.acceptWaveform(data);
  if (endOfSpeech) {
    const result = recognizer.result();
    results.push(result);
  }
  else {
    const partialResult = recognizer.partialResult();
    results.push(partialResult);
  }
}
// Flush the words still buffered in the recognizer once the stream ends.
const finalResult = recognizer.finalResult();
results.push(finalResult);
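Why the trailing flush matters can be shown with a stand-in recognizer. The sketch below is not vosk: `FakeRecognizer`, `transcribe` and the `"<pause>"` marker are hypothetical names that only mimic the behaviour the fix relies on, namely that words stay buffered until an end of speech is detected, so whatever is still buffered when the stream ends is lost unless it is flushed.

```typescript
// Hypothetical stand-in for a streaming recognizer (NOT the vosk API).
// Each chunk is one word; the chunk "<pause>" marks a detected end of speech.
type Utterance = { text: string };

class FakeRecognizer {
  private buffer: string[] = [];

  // Returns true when an end of speech is detected in this chunk.
  acceptWaveform(chunk: string): boolean {
    if (chunk === "<pause>") return true;
    this.buffer.push(chunk);
    return false;
  }

  // Returns the buffered utterance and clears the buffer.
  result(): Utterance {
    const text = this.buffer.join(" ");
    this.buffer = [];
    return { text };
  }

  // Flushes whatever is still buffered at the end of the stream.
  finalResult(): Utterance {
    return this.result();
  }
}

function transcribe(chunks: string[], flushAtEnd: boolean): string[] {
  const recognizer = new FakeRecognizer();
  const results: Utterance[] = [];
  for (const chunk of chunks) {
    if (recognizer.acceptWaveform(chunk)) {
      results.push(recognizer.result());
    }
  }
  if (flushAtEnd) {
    results.push(recognizer.finalResult());
  }
  // Drop empty utterances so only real text is returned.
  return results.map((r) => r.text).filter((t) => t.length > 0);
}

const chunks = ["hello", "world", "<pause>", "the", "end"];
console.log(transcribe(chunks, false)); // trailing words are missing
console.log(transcribe(chunks, true)); // trailing words are included
```

Without the flush, the words after the last detected pause never reach `results`, which matches the symptom of subtitles stopping before the end of the video.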
Thanks a lot. I already came up with a fix in this commit but I'm really happy to have your take on that issue.
@iceberg53 you are welcome 🙏
Recently veed-io removed automatic subtitle creation from its free plan, so I started using your product. I have wav files that I generate with elevenlabs-io, but I have a problem: the model doesn't reproduce the subtitle words exactly as in the original text. Do you know a way to train it?
I added a new parameter:
//IVANOV FIX
.option(
  '-g --origin [text]',
  "Origin text",
)
And added my own processing:
// IVANOV FIX:
if (originText) {
  /*console.log(`------------------------------`);
  console.log(`IVANOV origin text: [${originText}]`);
  console.log(`------------------------------`);
  console.log(`IVANOV RAW results FULL: [${JSON.stringify(results)}]`);
  console.log(`------------------------------`);*/
  // Flatten the per-utterance results into a single list of words.
  const ivanovResult: WordResult[] = [];
  results.forEach(({ result: words }) => {
    if (!words) return;
    words.forEach((word: any) => {
      ivanovResult.push(word);
    });
  });
  //console.log(`IVANOV FIXED results FULL: [${JSON.stringify(ivanovResult)}]`);
  //console.log(`------------------------------`);
  let count = 0;
  let startIndex = 0;
  let cueString = '';
  let endOfOriginText = originText;
  let stopReplacing = false;
  const MAX_CUE_STRING_LENGTH = 20;
  const ivanovResultLength = ivanovResult.length;
  for (let i = 0; i < ivanovResultLength; i++) {
    count++;
    //console.log(`IVANOV index: [${i}]`);
    //console.log(`IVANOV startIndex: [${startIndex}]`);
    //console.log(`IVANOV count: [${count}]`);
    const wordContent = ivanovResult[i];
    //console.log(`IVANOV wordContent${i}: ${JSON.stringify(wordContent)}`);
    // First remaining word of the origin text vs. the word the model decoded.
    const firstWord = endOfOriginText.replace(/ .*/,'');
    const decodedWord = wordContent.word;
    //console.log(`IVANOV firstWord: [${firstWord}]`);
    //console.log(`IVANOV decodedWord: [${decodedWord}]`);
    const firstWordCleaned = firstWord?.toLowerCase().replace(/'/g, '');
    const decodedWordCleaned = decodedWord?.toLowerCase().replace(/'/g, '');
    console.log(`IVANOV CLEANED firstWord: [${firstWordCleaned}]`);
    console.log(`IVANOV CLEANED decodedWord: [${decodedWordCleaned}]`);
    if (!stopReplacing && firstWordCleaned.includes(decodedWordCleaned)) {
      // Match: use the original spelling and punctuation, then advance.
      wordContent.word = firstWord;
      endOfOriginText = endOfOriginText.replace(firstWord + ' ','');
      //console.log(`IVANOV STANDARD endOfOriginText: [${endOfOriginText}]`);
    }
    else {
      // First mismatch: keep the decoded words from here on.
      stopReplacing = true;
    }
    const currentWord = wordContent.word;
    cueString += currentWord + ' ';
    let wordsLengthLimit = false;
    const nextWordIndex = i + 1;
    if (nextWordIndex < ivanovResultLength) {
      const nextWordContent = ivanovResult[nextWordIndex];
      const nextWord = nextWordContent?.word || '';
      wordsLengthLimit = (cueString + nextWord).length > MAX_CUE_STRING_LENGTH;
    }
    // Always close the cue on the last word so trailing words are not dropped.
    const isLastWord = nextWordIndex >= ivanovResultLength;
    const lastChar = currentWord.slice(-1);
    //console.log(`IVANOV lastChar: [${lastChar}]`);
    //console.log(`IVANOV cueString: [${cueString}]`);
    //console.log(`IVANOV cueString length: [${cueString.length}]`);
    if (lastChar == '.' || lastChar == '?' || wordsLengthLimit || isLastWord || (count >= WORDS_PER_LINE && currentWord != 'a' && currentWord != 'to')) {
      const end = Math.min(startIndex + count - 1, ivanovResult.length - 1);
      const cue = createCueFromWords(ivanovResult, startIndex, end);
      subtitles.push(cue);
      console.log(`IVANOV pushed cue: [${JSON.stringify(cue)}]`);
      startIndex = i + 1;
      count = 0;
      cueString = '';
    }
    console.log(`------------------------------`);
  }
}
else {
  results.forEach(({ result: words }) => {
    if (!words) return;
    for (let start = 0; start < words.length; start += WORDS_PER_LINE) {
      const end = Math.min(start + WORDS_PER_LINE - 1, words.length - 1);
      const cue = createCueFromWords(words, start, end);
      subtitles.push(cue);
    }
  });
}
For example, the model makes mistakes like this: instead of "its consequences" it produces "its consequences says". Do you know how to fix it?
I don't know how to get the model to correct it. The models used in gen-subs vary in accuracy, but the most accurate ones require more computing resources. If your hardware is powerful enough, try a better model by downloading it with gen-subs. Otherwise, a workaround is to generate the subtitles first and then edit them to remove the inaccuracies. There is also an alternative to vosk, the engine behind gen-subs: whisper, a more popular open source project from OpenAI, and one of its implementations, whisper.cpp, which may give you better results. Maybe give it a try and see what you get. My expertise in AI and related fields is limited, so there may be other ways to do it.
Thank you very much! But I had already downloaded the full model. Generating the subtitles and editing them before publishing is not an option, because I have a lot of videos that need subtitles. Whisper and the others are good, but I need an interface like your product's, so I'll accept it being less precise for a while 🙏
Ok, that's fine. In the end, what matters is that people understand the content of your videos.
Typically the model makes one of 3 mistakes: an incorrect word, 1 extra word, or 1 missing word. If there are more than 2 mistakes in a row, my correction will not help. But for now it works for me and I'm completely satisfied.
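The three cases above can be isolated in a small standalone function. This is only a sketch under assumptions: `normalize`, `same` and `alignToReference` are hypothetical names, it works on plain word arrays rather than on vosk results, it compares words by exact equality after ASCII-only normalisation rather than with a substring check, and it looks ahead by one word only.

```typescript
// Lowercase and strip everything except ASCII letters and digits, so that
// "It's" and "its", or "end." and "end", compare as equal.
function normalize(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9]/g, "");
}

function same(a: string | undefined, b: string | undefined): boolean {
  return a !== undefined && b !== undefined && normalize(a) === normalize(b);
}

// Returns one output slot per decoded word, so each slot can keep the
// timing of the decoded word it came from. A dropped word becomes "".
function alignToReference(decoded: string[], reference: string[]): string[] {
  const out: string[] = [];
  let r = 0; // index of the next unconsumed reference word
  let correcting = true;

  decoded.forEach((word, d) => {
    const ref = reference[r];
    if (!correcting || ref === undefined) {
      out.push(word); // correction stopped or reference exhausted
      return;
    }
    const refNext = reference[r + 1];
    const next = decoded[d + 1];

    if (same(word, ref)) {
      out.push(ref); // plain match: take the reference spelling
      r += 1;
    } else if (same(next, ref)) {
      out.push(""); // extra word: the next decoded word is the expected one
    } else if (same(next, refNext)) {
      out.push(ref); // incorrect word: both sides line up again afterwards
      r += 1;
    } else if (refNext !== undefined && same(word, refNext)) {
      out.push(ref + " " + refNext); // missing word: emit both reference words
      r += 2;
    } else {
      correcting = false; // more than one mistake in a row: give up
      out.push(word);
    }
  });
  return out;
}

const reference = "its consequences are real".split(" ");
console.log(alignToReference(["its", "consequences", "says", "are", "real"], reference));
console.log(alignToReference(["its", "consequences", "our", "real"], reference));
console.log(alignToReference(["its", "are", "real"], reference));
```

Keeping one output slot per decoded word means the corrected text can be written back onto the existing word entries without touching their timestamps.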
My code, if anyone is interested:
import { stringifySync } from "subtitle";
import { createCueFromWords } from "./createCueFromWords";
import { RecognitionResults, WordResult } from "vosk";
export async function createSrtFromRecognitionResults(results: RecognitionResults[], originText?: string) {
  //const WORDS_PER_LINE = 7;
  // IVANOV FIX:
  const WORDS_PER_LINE = 3;
  const subtitles: SubtitleCue[] = [];
  if (!results.length) {
    throw new Error("No words identified to create subtitles from.");
  }
  /*results.forEach(({ result: words }) => {
    if (!words) return;
    for (let start = 0; start < words.length; start += WORDS_PER_LINE) {
      const end = Math.min(start + WORDS_PER_LINE - 1, words.length - 1);
      const cue = createCueFromWords(words, start, end);
      subtitles.push(cue);
    }
  });*/
  // IVANOV FIX:
  if (originText) {
    /*console.log(`------------------------------`);
    console.log(`IVANOV origin text: [${originText}]`);
    console.log(`------------------------------`);
    console.log(`IVANOV RAW results FULL: [${JSON.stringify(results)}]`);
    console.log(`------------------------------`);*/
    // Flatten the per-utterance results into a single list of words.
    const ivanovResult: WordResult[] = [];
    results.forEach(({ result: words }) => {
      if (!words) return;
      words.forEach((word: any) => {
        ivanovResult.push(word);
      });
    });
    //console.log(`IVANOV FIXED results FULL: [${JSON.stringify(ivanovResult)}]`);
    //console.log(`------------------------------`);
    let count = 0;
    let startIndex = 0;
    let cueString = '';
    let endOfOriginText = originText;
    let stopReplacing = false;
    const MAX_CUE_STRING_LENGTH = 20;
    const ivanovResultLength = ivanovResult.length;
    for (let i = 0; i < ivanovResultLength; i++) {
      count++;
      //console.log(`IVANOV index: [${i}]`);
      //console.log(`IVANOV startIndex: [${startIndex}]`);
      //console.log(`IVANOV count: [${count}]`);
      const wordContent = ivanovResult[i];
      //console.log(`IVANOV wordContent${i}: ${JSON.stringify(wordContent)}`);
      // First remaining word of the origin text vs. the word the model decoded.
      const firstWord = endOfOriginText.replace(/ .*/,'');
      const decodedWord = wordContent.word;
      //console.log(`IVANOV firstWord: [${firstWord}]`);
      //console.log(`IVANOV decodedWord: [${decodedWord}]`);
      const firstWordCleaned = firstWord?.toLowerCase().replace(/'/g, '');
      const decodedWordCleaned = decodedWord?.toLowerCase().replace(/'/g, '');
      console.log(`IVANOV CLEANED firstWord: [${firstWordCleaned}]`);
      console.log(`IVANOV CLEANED decodedWord: [${decodedWordCleaned}]`);
      //console.log(`IVANOV STOP replacing: [${stopReplacing}]`);
      //console.log(`IVANOV INCLUDES: [${firstWordCleaned.includes(decodedWordCleaned)}]`);
      if (!stopReplacing) {
        if (firstWordCleaned.includes(decodedWordCleaned)) {
          // Match: use the original spelling and punctuation, then advance.
          wordContent.word = firstWord;
          //console.log(`IVANOV STANDARD endOfOriginText: [${endOfOriginText}]`);
          endOfOriginText = endOfOriginText.replace(firstWord + ' ','');
          console.log(`IVANOV STANDARD MATCH`);
        }
        else {
          // Mismatch: look one word ahead on both sides to classify the mistake.
          const nextWordIndex = i + 1;
          const tempEndOfOriginText = endOfOriginText.replace(firstWord + ' ','');
          const nextFirstWord = tempEndOfOriginText.replace(/ .*/,'');
          const nextFirstWordCleaned = nextFirstWord?.toLowerCase().replace(/'/g, '');
          if (nextWordIndex < ivanovResultLength && nextFirstWordCleaned) {
            const nextDecodedWordContent = ivanovResult[nextWordIndex];
            const nextDecodedWord = nextDecodedWordContent?.word || '';
            const nextDecodedWordCleaned = nextDecodedWord?.toLowerCase().replace(/'/g, '');
            console.log(`IVANOV CLEANED nextFirstWordCleaned: [${nextFirstWordCleaned}]`);
            console.log(`IVANOV CLEANED nextDecodedWord: [${nextDecodedWord}]`);
            if (firstWordCleaned.includes(nextDecodedWordCleaned)) {
              // Extra word: blank it out; the next decoded word is the expected one.
              wordContent.word = '';
              console.log(`IVANOV FIXED MODEL RESULT 111 [current decoded word is superfluous: current first = next decoded]`);
            }
            else if (nextFirstWordCleaned.includes(nextDecodedWordCleaned)) {
              // Incorrect word: replace it; both sides line up again afterwards.
              wordContent.word = firstWord;
              endOfOriginText = endOfOriginText.replace(firstWord + ' ','');
              console.log(`IVANOV FIXED MODEL RESULT 222 [current decoded word is false: next first = next decoded]`);
            }
            else if (nextFirstWordCleaned.includes(decodedWordCleaned)) {
              // Missing word: emit both origin words in the current slot.
              wordContent.word = firstWord + ' ' + nextFirstWord;
              endOfOriginText = endOfOriginText.replace(firstWord + ' ','');
              endOfOriginText = endOfOriginText.replace(nextFirstWord + ' ','');
              console.log(`IVANOV FIXED MODEL RESULT 333 [current first word was missed: next first = current decoded]`);
            }
            else {
              stopReplacing = true;
              console.log(`IVANOV FIXED MODEL RESULT 444 [stopReplacing after bad correction]`);
            }
          }
          else {
            console.log(`IVANOV FIXED MODEL RESULT 555 [stopReplacing no next first word]`);
            stopReplacing = true;
          }
        }
      }
      const currentWord = wordContent.word;
      cueString += currentWord + ' ';
      let wordsLengthLimit = false;
      const nextWordIndex = i + 1;
      if (nextWordIndex < ivanovResultLength) {
        const nextDecodedWordContent = ivanovResult[nextWordIndex];
        const nextDecodedWord = nextDecodedWordContent?.word || '';
        wordsLengthLimit = (cueString + nextDecodedWord).length > MAX_CUE_STRING_LENGTH;
      }
      // Always close the cue on the last word so trailing words are not dropped.
      const isLastWord = nextWordIndex >= ivanovResultLength;
      const lastChar = currentWord.slice(-1);
      //console.log(`IVANOV lastChar: [${lastChar}]`);
      //console.log(`IVANOV cueString: [${cueString}]`);
      //console.log(`IVANOV cueString length: [${cueString.length}]`);
      if (lastChar == '.' || lastChar == '?' || wordsLengthLimit || isLastWord || (count >= WORDS_PER_LINE && currentWord != 'a' && currentWord != 'to')) {
        const end = Math.min(startIndex + count - 1, ivanovResult.length - 1);
        const cue = createCueFromWords(ivanovResult, startIndex, end);
        subtitles.push(cue);
        console.log(`IVANOV pushed cue: [${JSON.stringify(cue)}]`);
        startIndex = i + 1;
        count = 0;
        cueString = '';
      }
      console.log(`------------------------------`);
    }
  }
  else {
    results.forEach(({ result: words }) => {
      if (!words) return;
      for (let start = 0; start < words.length; start += WORDS_PER_LINE) {
        const end = Math.min(start + WORDS_PER_LINE - 1, words.length - 1);
        const cue = createCueFromWords(words, start, end);
        subtitles.push(cue);
      }
    });
  }
  return stringifySync(subtitles, { format: "SRT" });
}
I noticed while using gen-subs that the generated subtitles do not cover the entire video.
For instance, the video referenced in [issue number 4](https://github.com/TejasQ/gen-subs/issues/4) suffers from the same problem. And this seems to occur only in the last part of videos.
In the video below from the issue I mentioned earlier, the subtitles stop appearing about 10s before the end.
https://private-user-images.githubusercontent.com/725120/284995522-27e4ef6d-6cf2-400f-8e0f-f0710e2534b4.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc3MDA2OTAsIm5iZiI6MTcwNzcwMDM5MCwicGF0aCI6Ii83MjUxMjAvMjg0OTk1NTIyLTI3ZTRlZjZkLTZjZjItNDAwZi04ZTBmLWYwNzEwZTI1MzRiNC5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwMjEyJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDIxMlQwMTEzMTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xY2I0YmVjMzlhYjAxZjliZDFjM2ZjZDE4YTE0ODg1Zjk2Y2IxMGZjNTU4ZTM0ODYzYjk1NzA3YzM0NTNiMTYwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.9Sm7WdT62MK5u0puC_z3nxUsaMEM82ZEMzCLtSyjvx0