GiviMAD / whisper-jni

A JNI wrapper for whisper.cpp that allows you to transcribe speech to text in Java.
Apache License 2.0

whisper.init() - UnsatisfiedLinkError #8

Closed: LosokosG closed this issue 11 months ago

LosokosG commented 11 months ago

For some reason, when I try to create a context, the whole thread stops with:

Exception in thread "main" java.lang.UnsatisfiedLinkError: 'int io.github.givimad.whisperjni.WhisperJNI.init(java.lang.String)'
    at io.github.givimad.whisperjni.WhisperJNI.init(Native Method)
    at io.github.givimad.whisperjni.WhisperJNI.init(WhisperJNI.java:51)
    at org.echoai.Main.main(Main.java:49) 

Here is my current code:

public class Main {
    public static WhisperContext whisperContext;
    public static File tempModel;
    public static WhisperJNI whisper;
    public static WhisperFullParams whisperFullParams;
    public static void main(String[] args) throws InterruptedException {
        loadWhisperLibrary();
        whisper = new WhisperJNI();

        try {
            // Load the model as a stream
            InputStream modelStream = Main.class.getClassLoader().getResourceAsStream("ggml-base.en.bin");
            if (modelStream == null) {
                throw new IllegalArgumentException("Model file not found!");
            }
            // Create a temporary file to copy the model into
            tempModel = File.createTempFile("whisper-model", ".bin");
            tempModel.deleteOnExit(); // Ensure the file is deleted when the program exits
            try (OutputStream out = new FileOutputStream(tempModel)) {
                // Copy the model from the resource stream to the temporary file
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = modelStream.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }
            log.info(tempModel.getAbsolutePath());
            whisperContext = whisper.init(Path.of(tempModel.getAbsolutePath()));
            whisperFullParams = new WhisperFullParams();

        } catch (IOException e) {
            throw new RuntimeException("Failed to load Whisper model", e);
        }

    }
}

This would mean that JNI is unable to find the native method implementations in the .dll file, but I do not see any issue with it. I have the .dll and the .jar from here: https://github.com/ggerganov/whisper.cpp/actions/runs/6703965523 (specifically win32-x86-64_whisper.dll and whispercpp.jar).

The .dll is placed at src/main/resources/whisper.dll, and the .jar files are at C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0.jar, C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0-javadoc.jar, and C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0-sources.jar.

I have used this method to load whisper.dll:

public static void loadWhisperLibrary() {
        try {
            // Get the .dll as an input stream
            InputStream in = LibraryLoader.class.getResourceAsStream("/whisper.dll");
            if (in == null) {
                throw new FileNotFoundException("whisper.dll not found in resources.");
            }

            // Create a temporary file to copy the .dll
            File tempDll = File.createTempFile("whisper", ".dll");

            // Ensure the file is deleted on exit
            tempDll.deleteOnExit();

            // Copy the .dll to the temporary file
            try (OutputStream out = new FileOutputStream(tempDll)) {
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }

            // Load the .dll from the temporary file
            System.load(tempDll.getAbsolutePath());
            log.info(".dll loaded {}", tempDll.exists());
        } catch (IOException e) {
            throw new RuntimeException("Failed to load the native library", e);
        }
    }

and included the .jar files in Gradle with implementation fileTree(dir: 'libs', include: ['*.jar']).

I have also added the .jar files under Project Structure -> Libraries in IntelliJ IDEA.

Please help me out, I am literally dying to finally get this working. Thanks!

GiviMAD commented 11 months ago

Hello,

In case you are mixing things up: whisper.cpp has its own Java wrapper in its repo, which didn't work for me, so I built this one, which includes prebuilt binaries for basic execution on some platforms. Those binaries are already packaged in the Maven distribution.

You can't use the whisper.dll directly with this project, because it uses a single binary file which has to export a JNI interface compatible with the WhisperJNI class. You can find how I build it for Windows in the build_win.ps1 file.
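
Roughly speaking, the JVM binds each native method to an exported symbol derived from the fully qualified class name, and a stock whisper.dll does not export those symbols. A sketch of the standard JNI naming (illustration only, not code from this repo):

package io.github.givimad.whisperjni;

public class WhisperJNI {
    // For this declaration the JVM searches the libraries loaded via
    // System.load/loadLibrary for an exported symbol named
    // Java_io_github_givimad_whisperjni_WhisperJNI_init.
    // A whisper.dll built from plain whisper.cpp only exports the C API
    // (whisper_init_from_file, whisper_full, ...), so the lookup fails with
    // the UnsatisfiedLinkError you are seeing.
    private native int init(String model);
}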

I'm currently working on a PR that changes the way the library is loaded, so you can use a different whisper shared library with this project, but I still have to add the Windows part, so it's not ready yet.

GiviMAD commented 11 months ago

To use it on Windows x64, it should work if you install it from Maven https://central.sonatype.com/artifact/io.github.givimad/whisper-jni and use the example code.
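
The example boils down to something like this (a minimal sketch based on the calls used elsewhere in this thread; the package names are taken from the stack trace above, and the model path and audio loading are placeholders):

import io.github.givimad.whisperjni.WhisperContext;
import io.github.givimad.whisperjni.WhisperFullParams;
import io.github.givimad.whisperjni.WhisperJNI;

import java.io.IOException;
import java.nio.file.Path;

public class WhisperExample {
    public static void main(String[] args) throws IOException {
        WhisperJNI.loadLibrary();                     // loads the bundled native binary
        WhisperJNI whisper = new WhisperJNI();
        WhisperContext ctx = whisper.init(Path.of("ggml-base.en.bin")); // placeholder path
        WhisperFullParams params = new WhisperFullParams();
        float[] samples = readSamples();              // 16 kHz mono samples in [-1, 1]
        int result = whisper.full(ctx, params, samples, samples.length);
        if (result != 0) {
            throw new RuntimeException("Transcription failed with code " + result);
        }
        for (int i = 0; i < whisper.fullNSegments(ctx); i++) {
            System.out.println(whisper.fullGetSegmentText(ctx, i));
        }
        ctx.close();
    }

    private static float[] readSamples() {
        // Placeholder: load and convert your own audio here.
        return new float[0];
    }
}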

Please reopen the issue if you find that's not the case. Best regards.

LosokosG commented 11 months ago

> To use it on Windows x64, it should work if you install it from Maven https://central.sonatype.com/artifact/io.github.givimad/whisper-jni and use the example code.
>
> Please reopen the issue if you find that's not the case. Best regards.

(I can't reopen the issue, by the way.) Okay, so first of all, thanks for the reply! Second, I forgot to mention that I already have the Maven dependency included. Also, do I understand correctly that I should not use the whispercpp jar / whisper.dll at all?

Also, I am running the test with your example code now, using my own .wav file, and the output is weird: it transcribed the audio as "[sound of fire]" (which is obviously not true), and it also seemingly did so before the model had even loaded.

Here is the output:

14:59:36.027 [Test worker] INFO   - Running example test.
14:59:38.215 [Test worker] INFO   - C:\Users\Losokos\AppData\Local\Temp\whisper-model13474856047131154935.bin
15:01:04.351 [Test worker] INFO   - Segment text:  [sound of a fire]
> Task :test
whisper_init_from_file_no_state: loading model from 'C:\Users\Losokos\AppData\Local\Temp\whisper-model13474856047131154935.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_full_with_state: progress =   5%
whisper_full_with_state: progress =  10%
whisper_full_with_state: progress =  15%
whisper_full_with_state: progress =  20%
whisper_full_with_state: progress =  25%
whisper_full_with_state: progress =  30%
whisper_full_with_state: progress =  35%
whisper_full_with_state: progress =  40%
whisper_full_with_state: progress =  45%
whisper_full_with_state: progress =  50%
whisper_full_with_state: progress =  55%
whisper_full_with_state: progress =  60%
whisper_full_with_state: progress =  65%
whisper_full_with_state: progress =  70%
whisper_full_with_state: progress =  75%
whisper_full_with_state: progress =  80%
whisper_full_with_state: progress =  85%
whisper_full_with_state: progress =  90%
whisper_full_with_state: progress =  95%
BUILD SUCCESSFUL in 1m 41s

If you could provide the file you are using for your test so I can actually run the test properly, that would be great!

Here is the code:

    public void ExampleTest() throws IOException {

        log.info("Running example test.");

        // Load the .wav file
        try (AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new File("src/harvard.wav"))) {
            AudioFormat format = audioInputStream.getFormat();
            long audioFileLength = audioInputStream.getFrameLength();
            int frameSize = format.getFrameSize();

            // Calculate the length in bytes of the audio data
            long dataLength = audioFileLength * frameSize;

            byte[] audioBytes = new byte[(int) dataLength];
            audioInputStream.read(audioBytes);

            // Convert the byte array to floats for Whisper AI
            float[] samples = convertAudioBytesToFloats(audioBytes);

            // Load the model as a stream
            InputStream modelStream = Main.class.getClassLoader().getResourceAsStream("ggml-base.en.bin");
            if (modelStream == null) {
                throw new IllegalArgumentException("Model file not found!");
            }
            // Create a temporary file to copy the model into
            tempModel = File.createTempFile("whisper-model", ".bin");
            tempModel.deleteOnExit(); // Ensure the file is deleted when the program exits
            try (OutputStream out = new FileOutputStream(tempModel)) {
                // Copy the model from the resource stream to the temporary file
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = modelStream.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }
            log.info(tempModel.getAbsolutePath());

            var whisper = new WhisperJNI();
            WhisperJNI.loadLibrary();
            var ctx = whisper.init(Path.of(tempModel.getAbsolutePath()));
            var params = new WhisperFullParams();
            int result = whisper.full(ctx, params, samples, samples.length);
            if (result != 0) {
                throw new RuntimeException("Transcription failed with code " + result);
            }
            int numSegments = whisper.fullNSegments(ctx);

           // assertEquals(1, numSegments);
            String text = whisper.fullGetSegmentText(ctx, 0);
            log.info("Segment text: {}", text);
            //assertEquals(" And so my fellow Americans ask not what your country can do for you ask what you can do for your country.", text);
            ctx.close();
        } catch (UnsupportedAudioFileException e) {
            throw new RuntimeException(e);
        }

    }

GiviMAD commented 11 months ago

> Okay, so first of all, thanks for the reply!

You are welcome.

> Second, I forgot to mention that I already have the Maven dependency included. Also, do I understand correctly that I should not use the whispercpp jar / whisper.dll at all?

No, you don't need them at all, because the shipped binary already contains the whisper.cpp code (currently built from tag 1.4.2).

> Also, I am running the test with your example code now, using my own .wav file, and the output is weird: it transcribed the audio as "[sound of fire]" (which is obviously not true), and it also seemingly did so before the model had even loaded.

The test is only prepared to process the WAV format of the example file (16 kHz, signed integer samples, mono), so maybe that is the problem. You can verify the results against whisper.cpp tag 1.4.2 using the same model.
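
If you want to keep using your own file, one option is to ask AudioSystem for that format up front. A sketch (whether the conversion is available depends on your JVM's audio providers; re-exporting the file at 16 kHz with an audio tool is the safer route):

import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class AudioConvertSketch {
    // Ask AudioSystem for 16 kHz, 16-bit signed, mono, little-endian PCM.
    // Throws IllegalArgumentException if no installed converter supports the
    // requested sample-rate conversion.
    public static AudioInputStream openAs16kMono(File wav)
            throws IOException, UnsupportedAudioFileException {
        AudioInputStream source = AudioSystem.getAudioInputStream(wav);
        AudioFormat target = new AudioFormat(16000f, 16, 1, true, false);
        return AudioSystem.getAudioInputStream(target, source);
    }
}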

> If you could provide the file you are using for your test so I can actually run the test properly, that would be great!

The file used in the test can be found in the whisper.cpp repo, inside the samples folder. The whisper.cpp repo is included as a submodule in this one.

LosokosG commented 11 months ago

I've tried running the model on my harvard.wav and I guess you were right:

read_wav: WAV file 'samples/harvard.wav' must be 16 kHz
error: failed to read WAV file 'samples/harvard.wav'

I have found my main problem... I've been using BIG_ENDIAN byte order in the convertAudioBytesToFloats() method instead of LITTLE_ENDIAN, and that caused the output to be corrupted. Thanks for the help, everything works great now!
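
For anyone who runs into the same thing, this is roughly what the fixed conversion looks like (a sketch assuming 16-bit signed little-endian mono PCM, which is what the example expects):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

public class AudioSamples {
    // Interpret the raw WAV data as 16-bit signed little-endian PCM and
    // normalize each sample to the [-1, 1] range whisper works with.
    public static float[] convertAudioBytesToFloats(byte[] audioBytes) {
        ShortBuffer shorts = ByteBuffer.wrap(audioBytes)
                .order(ByteOrder.LITTLE_ENDIAN)   // WAV PCM data is little-endian
                .asShortBuffer();
        float[] samples = new float[shorts.remaining()];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = shorts.get(i) / 32768.0f;
        }
        return samples;
    }
}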