ChetanXpro / nodejs-whisper

Node.js bindings for Whisper: the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
https://npmjs.com/nodejs-whisper
MIT License

Model not Found #26

Closed: dylan0356 closed this issue 4 months ago

dylan0356 commented 1 year ago

I am on a Mac and trying to use this in a Next.js project.

Code:

```js
import path from 'path';
import os from 'os';
import { nodewhisper } from 'nodejs-whisper';

// out.wav lives in the OS temp dir (matches the /var/folders/.../T/ path in the log below)
const tempDir = os.tmpdir();
const filePath = path.join(tempDir, 'out.wav');
console.log(filePath);

// generate the transcript with whisper
const transcript = await nodewhisper(filePath, {
    modelName: 'base.en', // name of the downloaded model
    autoDownloadModelName: 'base.en', // (optional) auto-download the model if it is not present
    whisperOptions: {
        outputInText: true, // get output result in a txt file
        outputInVtt: false, // get output result in a vtt file
        outputInSrt: false, // get output result in an srt file
        outputInCsv: false, // get output result in a csv file
        translateToEnglish: false, // translate from source language to English
        wordTimestamps: false, // word-level timestamps
        timestamps_length: 20, // amount of dialogue per timestamp pair
        splitOnWord: true, // split on word rather than on token
    },
});
```

Error:


cd: no such file or directory: /Users/dylanb/Documents/Github/StudyMan/studyapp/.next/server/cpp/whisper.cpp/models
[Nodejs-whisper] Autodownload Model: base

chmod: File not found: /Users/dylanb/Documents/Github/StudyMan/download-ggml-model.sh
node:internal/modules/cjs/loader:1078
  throw err;
  ^

Error: Cannot find module '/Users/dylanb/Documents/Github/StudyMan/studyapp/.next/server/vendor-chunks/exec-child.js'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1075:15)
    at Module._load (node:internal/modules/cjs/loader:920:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:23:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

Node.js v18.16.0
[Nodejs-whisper] Attempting to compile model...

node:internal/modules/cjs/loader:1078
  throw err;
  ^

Error: Cannot find module '/Users/dylanb/Documents/Github/StudyMan/studyapp/.next/server/vendor-chunks/exec-child.js'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1075:15)
    at Module._load (node:internal/modules/cjs/loader:920:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:23:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

Node.js v18.16.0
[Nodejs-whisper]  Transcribing file: /var/folders/wm/6p8gkm_x6b17rlvy4178hql00000gn/T/out.wav

[Nodejs-whisper] Error: Models do not exist. Please Select a downloaded model.

Error: [Nodejs-whisper] Error: Model not found
    at constructCommand (webpack-internal:///(rsc)/./node_modules/nodejs-whisper/dist/WhisperHelper.js:33:15)
    at eval (webpack-internal:///(rsc)/./node_modules/nodejs-whisper/dist/index.js:53:62)
    at Generator.next (<anonymous>)
    at fulfilled (webpack-internal:///(rsc)/./node_modules/nodejs-whisper/dist/index.js:11:32)
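
The failing paths all sit under `.next/server/`, which suggests the Next.js build bundled the package and broke its relative lookups for `whisper.cpp/models`. A minimal diagnostic sketch to confirm where the package actually resolves at runtime (the `models` layout below is an assumption inferred from the error output, not a documented API):

```js
// diagnostic.js: run with `node diagnostic.js` from the project root.
const path = require('path');
const fs = require('fs');

// Where Node itself resolves the package (should be under node_modules).
// Inside a Next.js server bundle, the same lookup lands under .next/server,
// where the cpp/whisper.cpp/models directory does not exist.
const pkgEntry = require.resolve('nodejs-whisper');
console.log('nodejs-whisper entry:', pkgEntry);

// Assumed layout, inferred from the ".next/server/cpp/whisper.cpp/models" error path:
const modelsDir = path.join(path.dirname(pkgEntry), '..', 'cpp', 'whisper.cpp', 'models');
console.log('expected models dir:', modelsDir);
console.log('models dir exists:', fs.existsSync(modelsDir));
```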

Here is my log of running the model download command:

(base) Dylans-MacBook-Air:studyapp dylanb$ npx nodejs-whisper download
[Nodejs-whisper] Models do not exist. Please Select a model to download.

| Model     | Disk   | RAM     |
|-----------|--------|---------|
| tiny      |  75 MB | ~390 MB |
| tiny.en   |  75 MB | ~390 MB |
| base      | 142 MB | ~500 MB |
| base.en   | 142 MB | ~500 MB |
| small     | 466 MB | ~1.0 GB |
| small.en  | 466 MB | ~1.0 GB |
| medium    | 1.5 GB | ~2.6 GB |
| medium.en | 1.5 GB | ~2.6 GB |
| large-v1  | 2.9 GB | ~4.7 GB |
| large     | 2.9 GB | ~4.7 GB |

[Nodejs-whisper] Enter model name (e.g. 'tiny.en') or 'cancel' to exit
(ENTER for tiny.en): base.en
Downloading ggml model base.en from 'https://huggingface.co/ggerganov/whisper.cpp' ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1204  100  1204    0     0   3111      0 --:--:-- --:--:-- --:--:--  3119
100  141M  100  141M    0     0  8405k      0  0:00:17  0:00:17 --:--:-- 9733k
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:

  $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

[Nodejs-whisper] Attempting to compile model...

sysctl: unknown oid 'hw.optional.arm64'
I whisper.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  i386
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_DARWIN_C_SOURCE -pthread -mf16c -mfma -mavx -mavx2 -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 12.0.5 (clang-1205.0.22.11)
I CXX:      Apple clang version 12.0.5 (clang-1205.0.22.11)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_DARWIN_C_SOURCE -pthread -mf16c -mfma -mavx -mavx2 -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml.o whisper.o -o main  -framework Accelerate
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -su,       --speed-up          [false  ] speed up audio by x2 (reduced accuracy)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference

c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread examples/bench/bench.cpp ggml.o whisper.o -o bench  -framework Accelerate
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml.o whisper.o -o quantize  -framework Accelerate
ChetanXpro commented 11 months ago

I need to test this in Next.js. I don't think this package will work there as-is; it might need some changes.
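
One workaround worth trying here (a sketch, not verified in this thread) is to keep `nodejs-whisper` out of the server bundle so it keeps running from `node_modules`, where its relative paths to `whisper.cpp` and the models directory stay intact. Assuming Next.js 13+:

```js
// next.config.js (sketch): exclude nodejs-whisper from the server bundle.
// Packages listed here are require()'d from node_modules at runtime
// instead of being compiled into .next/server.
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    serverComponentsExternalPackages: ['nodejs-whisper'],
  },
};

module.exports = nextConfig;
```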

dylan0356 commented 11 months ago

What is the issue with it running in Next.js?