ChetanXpro / nodejs-whisper

NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
https://npmjs.com/nodejs-whisper
MIT License

[Bug?] Audio being transcribed forever despite following standard #82

Open YozoraWolf opened 4 months ago

YozoraWolf commented 4 months ago

Hello, this may or may not be a bug, but I wanted to mention it because this package could work well in combination with node-mic.

import { Logger } from '@src/logger';
import path from 'path';
import { nodewhisper } from 'nodejs-whisper'
import fs from 'fs';
import NodeMic from 'node-mic';

const voicesPath = path.resolve(process.cwd(), 'src/voice');

export const initVoice = async (): Promise<void> => {
    const mic = new NodeMic({
        rate: 16000,
        encoding: 'signed-integer',
        bitwidth: 16,
        endian: 'little',
        channels: 2,
        threshold: 20,
        fileType: 'wav',
        debug: true,
    });

    const micInputStream = mic.getAudioStream();
    const outputFileStream = fs.createWriteStream(`${voicesPath}/output.wav`);

    micInputStream.pipe(outputFileStream);

    micInputStream.on('data', (data) => {
        // Do something with the data.
    });

    micInputStream.on('error', (err) => {
        console.log(`Error: ${err.message}`);
    });

    micInputStream.on('started', () => {
        console.log('Started');
    });

    micInputStream.on('stopped', () => {
        console.log('Stopped');
    });

    micInputStream.on('paused', () => {
        console.log('Paused');
    });

    micInputStream.on('unpaused', () => {
        console.log('Unpaused');
    });

    micInputStream.on('silence', async () => {
        console.log('Silence');
        mic.stop();
    });

    micInputStream.on('exit', async (code) => {
        console.log(`Exited with code: ${code}`);
        await transcribeWAV();
    });

    mic.start();

    Logger.DEBUG(`Voices path: ${voicesPath}`);

};

const transcribeWAV = async () => {
    try {
        Logger.DEBUG('Transcribing voice...');
        const transcript = await nodewhisper(`${voicesPath}/output.wav`, {
            modelName: "tiny",
            verbose: true
        });

        Logger.INFO(`Transcript: ${transcript}`); // output: [ {start,end,speech} ]
    } catch (error: any) {
        Logger.ERROR(`Error occurred while transcribing voice: ${error.message}`);
    }
};

This is my code.

Output:

Microphone stopped
Found silence block: 21 of 20
Recording has finished with code = 1
Exited with code: 1
[DEBUG] Transcribing voice...
[Nodejs-whisper] Checking file existence: /home/wolf/develop/nodejs/okuuai/src/voice/output.wav
[Nodejs-whisper] Converting file to WAV format: /home/wolf/develop/nodejs/okuuai/src/voice/output.wav
[Nodejs-whisper] Checking if the file is a valid WAV: /home/wolf/develop/nodejs/okuuai/src/voice/output.wav
[Nodejs-whisper] File is a valid WAV file.
[Nodejs-whisper] Constructing command for file: /home/wolf/develop/nodejs/okuuai/src/voice/output.wav
[Nodejs-whisper] Executing command: ./main   -l auto -m ./models/ggml-tiny.bin  -f /home/wolf/develop/nodejs/okuuai/src/voice/output.wav

output.wav appears to be at 16 kHz, using the same codec as a test recording I made with Audacity. The Audacity recording is transcribed properly, whereas the node-mic one is not, even though both have exactly the same stream info. [image]

Am I missing something?

Update

Somehow this came out after leaving it to transcribe for a while:

[09:15:25.440 --> 09:15:35.440]   [BLANK_AUDIO]
[09:15:35.440 --> 09:15:45.440]   [BLANK_AUDIO]
[09:15:45.440 --> 09:15:55.440]   [BLANK_AUDIO]
[09:15:55.440 --> 09:16:05.440]   [BLANK_AUDIO]
[09:16:05.440 --> 09:16:15.440]   [BLANK_AUDIO]
[09:16:15.440 --> 09:16:25.440]   [BLANK_AUDIO]
[09:16:25.440 --> 09:16:35.440]   [BLANK_AUDIO]
[09:16:35.440 --> 09:16:45.440]   [BLANK_AUDIO]
[09:16:45.440 --> 09:16:55.440]   [BLANK_AUDIO]
[09:16:55.440 --> 09:17:05.440]   [BLANK_AUDIO]
[09:17:05.440 --> 09:17:15.440]   [BLANK_AUDIO]
[09:17:15.440 --> 09:17:25.440]   [BLANK_AUDIO]
[09:17:25.440 --> 09:17:35.440]   [BLANK_AUDIO]
[09:17:35.440 --> 09:17:45.440]   [BLANK_AUDIO]
[09:17:45.440 --> 09:17:55.440]   [BLANK_AUDIO]
[09:17:55.440 --> 09:18:05.440]   [BLANK_AUDIO]
[09:18:05.440 --> 09:18:15.440]   [BLANK_AUDIO]
[09:18:15.440 --> 09:18:25.440]   [BLANK_AUDIO]
[09:18:25.440 --> 09:18:35.440]   [BLANK_AUDIO]
[09:18:35.440 --> 09:18:45.440]   [BLANK_AUDIO]
[09:18:45.440 --> 09:18:55.440]   [BLANK_AUDIO]
[09:18:55.440 --> 09:19:05.440]   [BLANK_AUDIO]
[09:19:05.440 --> 09:19:15.440]   [BLANK_AUDIO]

(Please take into consideration that this is a 6-second recording.)
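
One thing that may be worth checking, offered as a hypothesis rather than a confirmed diagnosis: the timestamps above run to roughly 09:19:15, which is about 33,555 seconds. Stereo 16-bit audio at 16 kHz is 64,000 bytes per second, and 2^31 bytes at that rate is about 33,554 seconds. That would be consistent with the WAV header of the streamed recording declaring a placeholder data length of around 2 GiB instead of the real one, which can happen when a recorder writes to a pipe and cannot go back to patch the header. The sketch below compares the declared and actual audio sizes and builds a copy with corrected size fields. The names `inspectWav`, `inspectWavFile`, `durations`, and `withCorrectedSizes` are hypothetical helpers, not part of nodejs-whisper or node-mic, and the code assumes a plain PCM file whose "fmt " chunk precedes a final "data" chunk.

```typescript
import { readFileSync } from 'fs';

interface WavInfo {
    channels: number;
    sampleRate: number;
    bitsPerSample: number;
    dataHeaderOffset: number;  // byte offset of the "data" chunk header
    declaredDataBytes: number; // size field stored in the "data" chunk header
    actualDataBytes: number;   // bytes really present after that header
}

// Walk the RIFF chunks of a WAV buffer and report declared vs. actual audio size.
export function inspectWav(buf: Buffer): WavInfo {
    if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
        throw new Error('Not a RIFF/WAVE file');
    }
    let channels = 0;
    let sampleRate = 0;
    let bitsPerSample = 0;
    let offset = 12; // first chunk starts right after "RIFF<size>WAVE"
    while (offset + 8 <= buf.length) {
        const id = buf.toString('ascii', offset, offset + 4);
        const size = buf.readUInt32LE(offset + 4);
        const body = offset + 8;
        if (id === 'fmt ') {
            channels = buf.readUInt16LE(body + 2);
            sampleRate = buf.readUInt32LE(body + 4);
            bitsPerSample = buf.readUInt16LE(body + 14);
        } else if (id === 'data') {
            return {
                channels,
                sampleRate,
                bitsPerSample,
                dataHeaderOffset: offset,
                declaredDataBytes: size,
                actualDataBytes: buf.length - body,
            };
        }
        offset = body + size + (size % 2); // chunks are padded to an even length
    }
    throw new Error('No "data" chunk found');
}

// Convenience wrapper that reads the whole file into memory first.
export function inspectWavFile(filePath: string): WavInfo {
    return inspectWav(readFileSync(filePath));
}

// Convert both sizes to seconds so they can be compared with the transcript timestamps.
export function durations(info: WavInfo): { declared: number; actual: number } {
    const bytesPerSecond = info.sampleRate * info.channels * (info.bitsPerSample / 8);
    return {
        declared: info.declaredDataBytes / bytesPerSecond,
        actual: info.actualDataBytes / bytesPerSecond,
    };
}

// Return a copy whose RIFF and "data" size fields match the real length.
// Assumption: the "data" chunk is the last chunk in the file.
export function withCorrectedSizes(buf: Buffer): Buffer {
    const info = inspectWav(buf);
    const fixed = Buffer.from(buf); // copy, leave the input untouched
    fixed.writeUInt32LE(buf.length - 8, 4);
    fixed.writeUInt32LE(info.actualDataBytes, info.dataHeaderOffset + 4);
    return fixed;
}
```

Running `durations(inspectWavFile('src/voice/output.wav'))` on both the node-mic file and the Audacity file would show whether the declared duration differs between them. If it does, writing the result of `withCorrectedSizes` to a new file before transcribing is one way to test the hypothesis; whether that resolves the behaviour reported here is not confirmed in this thread.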

ChetanXpro commented 4 months ago

Ahh, interesting. I will test this with node-mic. It could also be an issue in whisper.cpp.

YozoraWolf commented 4 months ago

> Ahh, interesting. I will test this with node-mic. It could also be an issue in whisper.cpp.

Indeed, the script itself works just fine; the issue only appears when transcribing, so it could well be whisper.cpp.

Do let me know what you can find! :)