nodejs-whisper

Node.js bindings for OpenAI's Whisper model.

Features

Automatically converts audio to 16000 Hz WAV, the format the Whisper model requires
Outputs transcripts as .txt, .srt, .vtt, .json, .wts, .lrc, or .csv files
Optimized for CPU (including Apple Silicon ARM)
Word-level timestamp precision
Split on word rather than on token (optional)
Translate from the source language to English (optional)
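
The WAV conversion mentioned above can be pictured as an ffmpeg call. The following is a minimal sketch, assuming ffmpeg is installed. The helper name `buildWavConversionArgs` and the file names are hypothetical, and the mono and 16-bit PCM settings follow the whisper.cpp input requirements rather than anything stated in this README. The library's own conversion step may differ.

```typescript
// Hypothetical helper (not part of nodejs-whisper): builds ffmpeg arguments
// that resample any input to 16 kHz, mono, 16-bit PCM WAV.
function buildWavConversionArgs(inputPath: string, outputPath: string): string[] {
    return [
        '-y',                // overwrite the output file if it already exists
        '-i', inputPath,     // source audio in any format ffmpeg can decode
        '-ar', '16000',      // resample to 16000 Hz
        '-ac', '1',          // downmix to a single (mono) channel
        '-c:a', 'pcm_s16le', // encode as 16-bit little-endian PCM
        outputPath,
    ]
}

// Example: print the full command for converting an MP3 before transcription.
const wavArgs = buildWavConversionArgs('interview.mp3', 'interview.wav')
console.log(['ffmpeg', ...wavArgs].join(' '))
```

The argument array can be passed to `child_process.spawn('ffmpeg', wavArgs)` if you want to run the conversion yourself.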

Installation

Install build tools (make), for example on Debian/Ubuntu

  sudo apt update
  sudo apt install build-essential

Install nodejs-whisper with npm

  npm i nodejs-whisper

Download a Whisper model

  npx nodejs-whisper download

NOTE: You may need to install make first if it is not already available on your system.

Usage/Examples

import path from 'path'
import { nodewhisper } from 'nodejs-whisper'

// Provide the exact path to your audio file.
const filePath = path.resolve(__dirname, 'YourAudioFileName')

await nodewhisper(filePath, {
    modelName: 'base.en', // name of a downloaded model
    autoDownloadModelName: 'base.en', // (optional) auto-download this model if it is not present
    verbose: false, // (optional) output more debugging information
    removeWavFileAfterTranscription: false, // (optional) remove the wav file once transcribed
    withCuda: false, // (optional) use CUDA for faster processing
    whisperOptions: {
        outputInCsv: false, // get output result in csv file
        outputInJson: false, // get output result in json file
        outputInJsonFull: false, // get output result in json file including more information
        outputInLrc: false, // get output result in lrc file
        outputInSrt: true, // get output result in srt file
        outputInText: false, // get output result in txt file
        outputInVtt: false, // get output result in vtt file
        outputInWords: false, // get output result in wts file for karaoke
        translateToEnglish: false, // translate from source language to english
        wordTimestamps: false, // word-level timestamps
        timestamps_length: 20, // amount of dialogue per timestamp pair
        splitOnWord: true, // split on word rather than on token
    },
})

// Model list
const MODELS_LIST = [
    'tiny',
    'tiny.en',
    'base',
    'base.en',
    'small',
    'small.en',
    'medium',
    'medium.en',
    'large-v1',
    'large',
]
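
Because `modelName` is typed as a plain string, a misspelled model name is only caught at run time. The following is a minimal sketch of an early check, assuming the list above is complete for your installed version. `KNOWN_MODELS`, `isKnownModel`, and `assertKnownModel` are hypothetical names and are not exported by nodejs-whisper.

```typescript
// Model names copied from the list above.
const KNOWN_MODELS = [
    'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en',
    'medium', 'medium.en', 'large-v1', 'large',
] as const

type KnownModel = (typeof KNOWN_MODELS)[number]

// Hypothetical type guard: narrows an arbitrary string to a known model name.
function isKnownModel(candidate: string): candidate is KnownModel {
    return (KNOWN_MODELS as readonly string[]).indexOf(candidate) !== -1
}

// Hypothetical helper: fails fast with a readable message for unknown names.
function assertKnownModel(candidate: string): KnownModel {
    if (!isKnownModel(candidate)) {
        throw new Error(
            `Unknown model "${candidate}". Expected one of: ${KNOWN_MODELS.join(', ')}`
        )
    }
    return candidate
}

// Example: validate user input before passing it as modelName.
console.log(assertKnownModel('base.en'))
```

Calling `assertKnownModel` before `nodewhisper` keeps a typo such as `'base-en'` from surfacing later as a missing-model error.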

Types

 interface IOptions {
    modelName: string
    verbose?: boolean
    removeWavFileAfterTranscription?: boolean
    withCuda?: boolean
    autoDownloadModelName?: string
    whisperOptions?: WhisperOptions
}

 interface WhisperOptions {
    outputInCsv?: boolean
    outputInJson?: boolean
    outputInJsonFull?: boolean
    outputInLrc?: boolean
    outputInSrt?: boolean
    outputInText?: boolean
    outputInVtt?: boolean
    outputInWords?: boolean
    translateToEnglish?: boolean
    timestamps_length?: number
    wordTimestamps?: boolean
    splitOnWord?: boolean
}
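
To show how the output flags relate to file types, here is a minimal sketch that lists the extensions a given option set should produce. The flag-to-extension mapping is taken from the comments in the usage example. `OutputFlags`, `FLAG_EXTENSIONS`, and `expectedExtensions` are hypothetical names, and how the library names its output files is not covered here.

```typescript
// Subset of WhisperOptions from the Types section: only the output flags.
interface OutputFlags {
    outputInCsv?: boolean
    outputInJson?: boolean
    outputInJsonFull?: boolean
    outputInLrc?: boolean
    outputInSrt?: boolean
    outputInText?: boolean
    outputInVtt?: boolean
    outputInWords?: boolean
}

// Flag-to-extension pairs, following the comments in the usage example.
const FLAG_EXTENSIONS: Array<[keyof OutputFlags, string]> = [
    ['outputInCsv', '.csv'],
    ['outputInJson', '.json'],
    ['outputInJsonFull', '.json'],
    ['outputInLrc', '.lrc'],
    ['outputInSrt', '.srt'],
    ['outputInText', '.txt'],
    ['outputInVtt', '.vtt'],
    ['outputInWords', '.wts'],
]

// Hypothetical helper: returns each expected extension once, in table order.
function expectedExtensions(flags: OutputFlags): string[] {
    const extensions: string[] = []
    for (const [flag, extension] of FLAG_EXTENSIONS) {
        if (flags[flag] && extensions.indexOf(extension) === -1) {
            extensions.push(extension)
        }
    }
    return extensions
}

// Example: subtitles plus a plain-text transcript.
console.log(expectedExtensions({ outputInSrt: true, outputInText: true }))
```

A check like this can help a caller confirm that at least one output flag is enabled before starting a long transcription.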

Run locally

Clone the project

  git clone https://github.com/ChetanXpro/nodejs-whisper

Go to the project directory

  cd nodejs-whisper

Install dependencies

  npm install

Run in development mode

  npm run dev

Build project

  npm run build

Made with

OpenAI Whisper (using whisper.cpp, the C++ port by ggerganov)

Feedback

If you have any feedback, please reach out to us at chetanbaliyan10@gmail.com

Authors

@chetanXpro
