Whisper speech to text - Githubissues

nalbion commented 10 months ago

I've implemented support for various Whisper implementations - https://github.com/OpenASR/idiolect/tree/feature/navigation-with-whisper/src/main/java/org/openasr/idiolect/asr/whisper.

whisper-server

The JNA bindings for Whisper.cpp are nearly ready - https://github.com/ggerganov/whisper.cpp/issues/1246.

Alternatively, there is also a JNI wrapper: https://github.com/GiviMAD/whisper-jni

nalbion commented 10 months ago

@breandan are you able to get any of these working on Mac?

breandan commented 10 months ago

I was able to run the Whisper.cpp stream demo successfully, but I am unable to build the JAR on my machine. At first I encountered Execution failed for task ':javadoc' (the full stacktrace I received is here, and here are the contents of the file javadoc.options after running ./gradlew build). I then ran ./gradlew build -x javadoc to skip the Javadoc task, but got java.lang.UnsatisfiedLinkError: Unable to load library 'whisper', so I copied the file whisper.cpp/libwhisper.dylib to whisper.cpp/bindings/java/libwhisper.dylib. Then I got the error:

whisper_init_from_file_no_state: loading model from '../../models/ggml-tiny.en.bin'
whisper_init_from_file_no_state: failed to open '../../models/ggml-tiny.en.bin'
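Both failures up to this point (the UnsatisfiedLinkError and the "failed to open" message) come down to a file not being where the process looks for it, and relative paths such as '../../models/ggml-tiny.en.bin' are resolved against the JVM's working directory. A minimal sketch of a pre-flight check that could be run before any native call is shown here. WhisperPreflight and its methods are hypothetical names, not part of whisper.cpp or idiolect, and the default paths are assumptions based on the directories quoted in this thread. It uses only the Java standard library; the one JNA-specific detail is the jna.library.path system property, which JNA consults when searching for a native library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of whisper.cpp or idiolect): checks paths before any native call. */
final class WhisperPreflight {

    /** Resolves a possibly relative path against the JVM's working directory. */
    static Path resolve(String path) {
        return Paths.get(path).toAbsolutePath().normalize();
    }

    /** Platform-specific file name for a native library, e.g. "libwhisper.dylib" on macOS. */
    static String libraryFileName(String name) {
        return System.mapLibraryName(name);
    }

    /** Returns human-readable problems; an empty list means both files were found. */
    static List<String> check(String libraryDir, String modelPath) {
        List<String> problems = new ArrayList<>();
        Path lib = resolve(libraryDir).resolve(libraryFileName("whisper"));
        if (!Files.isRegularFile(lib)) {
            problems.add("native library not found: " + lib);
        }
        Path model = resolve(modelPath);
        if (!Files.isRegularFile(model)) {
            problems.add("model file not found: " + model);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Defaults are assumptions: run from whisper.cpp/bindings/java, library in the repo root.
        String libraryDir = args.length > 0 ? args[0] : "../..";
        String modelPath = args.length > 1 ? args[1] : "../../models/ggml-tiny.en.bin";
        List<String> problems = check(libraryDir, modelPath);
        if (problems.isEmpty()) {
            // Must be set before the library is first loaded through JNA to have any effect.
            System.setProperty("jna.library.path", resolve(libraryDir).toString());
            System.out.println("OK: library and model found");
        } else {
            problems.forEach(System.err::println);
        }
    }
}
```

Printing the absolute, normalized path makes it obvious which directory the JVM is actually running from, which is often the real cause when a test launched by Gradle cannot open a model that exists on disk. Pointing jna.library.path at the build directory is an alternative to copying libwhisper.dylib next to the bindings.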

So then I tried building tiny.en using the same instructions as for base.en, but got another error:

I whisper.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

bash ./models/download-ggml-model.sh tiny.en
Downloading ggml model tiny.en from 'https://huggingface.co/ggerganov/whisper.cpp' ...
Model tiny.en already exists. Skipping download.

===============================================
Running tiny.en on all samples in ./samples ...
===============================================

----------------------------------------------
[+] Running tiny.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)
----------------------------------------------

whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: mem required  =  201.00 MB (+    3.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: loading Core ML model from 'models/ggml-tiny.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: failed to load Core ML model from 'models/ggml-tiny.en-encoder.mlmodelc'
error: failed to initialize whisper context

So instead of using tiny.en, I changed this line to String modelName = "../../models/ggml-base.en.bin"; and finally tried to build the JAR via ./gradlew build -x javadoc, but encountered the following error:

Starting a Gradle Daemon, 2 incompatible Daemons could not be reused, use --status for details

> Task :test
whisper_init_from_file_no_state: loading model from '../../models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: loading Core ML model from '../../models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000018393be64, pid=36696, tid=8451
#
# JRE version: OpenJDK Runtime Environment JBR-17.0.1.12-164.8-jcef (17.0.1+12) (build 17.0.1+12-b164.8)
# Java VM: OpenJDK 64-Bit Server VM JBR-17.0.1.12-164.8-jcef (17.0.1+12-b164.8, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [libsystem_platform.dylib+0xe64]  _platform_strlen+0x4
#
# Core dump will be written. Default location: /cores/core.36696
#
# An error report file with more information is saved as:
# /Users/breandan/IdeaProjects/whisper.cpp/bindings/java/hs_err_pid36696.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

> Task :test FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test'.
> Process 'Gradle Test Executor 1' finished with non-zero exit value 134
  This problem might be caused by incorrect test process configuration.
  Please refer to the test execution section in the User Manual at https://docs.gradle.org/8.1/userguide/java_testing.html#sec:test_execution

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 11s
6 actionable tasks: 5 executed, 1 up-to-date

Here are the contents of the file hs_err_pid36696.log. Possibly related to ggerganov/whisper.cpp#963.

nalbion commented 10 months ago

@breandan I've finally got an official whisper.cpp deployed to Maven Central

OpenASR / idiolect

Whisper speech to text #71