LosokosG closed this issue 11 months ago.
Hello,
In case you are mixing things up: whisper.cpp has its own Java wrapper in its repo, which didn't work for me, so I built this one, which includes prebuilt binaries for basic execution on some platforms. Those binaries are already packaged in the Maven distribution.
You can't use whisper.dll directly with this project, because it relies on a single binary file that must export a compatible JNI interface for the WhisperJNI class. You can see how I built it for Windows in the build_win.ps1 file.
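To illustrate what "a compatible JNI interface" means: the JVM binds each native method of a Java class to an exported symbol whose name is derived from the fully qualified class name and the method name, using the escaping rules of the JNI specification. A library that only exports whisper.cpp's plain C functions has none of these symbols. The sketch below computes the short-form symbol name; the class and helper names are hypothetical, the method name "init" is only an example (not taken from the WhisperJNI sources), and the longer form used for overloaded natives is not handled.

```java
public class JniSymbolName {

    // Escapes a class or method name following the JNI specification's mangling rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');            // package separators become '_'
            } else if (c == '_') {
                sb.append("_1");           // a literal underscore is escaped as _1
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);              // ASCII letters and digits are kept as-is
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other chars: _0xxxx in lowercase hex
            }
        }
        return sb.toString();
    }

    // Short-form symbol the JVM looks up for a non-overloaded native method.
    static String jniSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // "init" is an illustrative method name only
        System.out.println(jniSymbol("io.github.givimad.whisperjni.WhisperJNI", "init"));
    }
}
```

Running main prints Java_io_github_givimad_whisperjni_WhisperJNI_init, a name that a DLL built from stock whisper.cpp sources would not export.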
I'm currently working on a PR that changes the way the library is loaded, so that you can use a different whisper shared library with this project, but I still have to add the Windows part, so it's not ready yet.
On Windows x64 it should work if you install it from Maven (https://central.sonatype.com/artifact/io.github.givimad/whisper-jni) and use the example code.
Please reopen the issue if you find that's not the case. Best regards.
(I can't reopen the issue, by the way.) Okay, first of all, thanks for the reply! Second, I forgot to mention that I already have the Maven dependency included. Also, do I understand correctly that I should not use the whispercpp jar / whisper.dll at all?
Also, I'm now running the test with your example code on my own .wav file, and the output is weird: it transcribed the audio as "[sound of a fire]" (which is obviously wrong), and it seemingly did so before the model had even loaded.
Here is the output:
14:59:36.027 [Test worker] INFO - Running example test.
14:59:38.215 [Test worker] INFO - C:\Users\Losokos\AppData\Local\Temp\whisper-model13474856047131154935.bin
15:01:04.351 [Test worker] INFO - Segment text: [sound of a fire]
> Task :test
whisper_init_from_file_no_state: loading model from 'C:\Users\Losokos\AppData\Local\Temp\whisper-model13474856047131154935.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2
whisper_model_load: mem required = 310.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
whisper_full_with_state: progress = 5%
whisper_full_with_state: progress = 10%
whisper_full_with_state: progress = 15%
whisper_full_with_state: progress = 20%
whisper_full_with_state: progress = 25%
whisper_full_with_state: progress = 30%
whisper_full_with_state: progress = 35%
whisper_full_with_state: progress = 40%
whisper_full_with_state: progress = 45%
whisper_full_with_state: progress = 50%
whisper_full_with_state: progress = 55%
whisper_full_with_state: progress = 60%
whisper_full_with_state: progress = 65%
whisper_full_with_state: progress = 70%
whisper_full_with_state: progress = 75%
whisper_full_with_state: progress = 80%
whisper_full_with_state: progress = 85%
whisper_full_with_state: progress = 90%
whisper_full_with_state: progress = 95%
BUILD SUCCESSFUL in 1m 41s
If you could provide the file you are using for your test, so I can actually run the test properly, that would be great!
Here is the code:
import io.github.givimad.whisperjni.WhisperFullParams;
import io.github.givimad.whisperjni.WhisperJNI;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public void ExampleTest() throws IOException {
    log.info("Running example test.");
    // Load the .wav file (the example expects 16 kHz, 16-bit signed PCM, mono)
    try (AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new File("src/harvard.wav"))) {
        // readAllBytes() reads to the end of the stream; a single read() call may return fewer bytes
        byte[] audioBytes = audioInputStream.readAllBytes();
        // Convert the byte array to floats for Whisper
        float[] samples = convertAudioBytesToFloats(audioBytes);
        // Copy the bundled model to a temporary file, because whisper.init() takes a file path
        File tempModel;
        try (InputStream modelStream = Main.class.getClassLoader().getResourceAsStream("ggml-base.en.bin")) {
            if (modelStream == null) {
                throw new IllegalArgumentException("Model file not found!");
            }
            tempModel = File.createTempFile("whisper-model", ".bin");
            tempModel.deleteOnExit(); // Delete the file when the JVM exits
            try (OutputStream out = new FileOutputStream(tempModel)) {
                modelStream.transferTo(out); // Copy the model from the resource stream to the temporary file
            }
        }
        log.info(tempModel.getAbsolutePath());
        // Load the native binary before calling any WhisperJNI method
        WhisperJNI.loadLibrary();
        var whisper = new WhisperJNI();
        var ctx = whisper.init(tempModel.toPath());
        try {
            var params = new WhisperFullParams();
            int result = whisper.full(ctx, params, samples, samples.length);
            if (result != 0) {
                throw new RuntimeException("Transcription failed with code " + result);
            }
            // Log every segment instead of only the first one
            int numSegments = whisper.fullNSegments(ctx);
            for (int i = 0; i < numSegments; i++) {
                log.info("Segment text: {}", whisper.fullGetSegmentText(ctx, i));
            }
        } finally {
            ctx.close(); // Free the native context even if transcription fails
        }
    } catch (UnsupportedAudioFileException e) {
        throw new RuntimeException(e);
    }
}
Okay, first of all, thanks for the reply!
You are welcome.
Second, I forgot to mention that I already have the Maven dependency included. Also, do I understand correctly that I should not use the whispercpp jar / whisper.dll at all?
No, you don't need them, because the shipped binary already contains the whisper.cpp code (currently from tag 1.4.2).
Also, I'm now running the test with your example code on my own .wav file, and the output is weird: it transcribed the audio as "[sound of a fire]" (which is obviously wrong), and it seemingly did so before the model had even loaded.
The test can only process WAV files in the same format as the example file (16 kHz, signed 16-bit integer PCM, mono), so maybe that is the problem. You can verify the results against whisper.cpp tag 1.4.2 using the same model.
If you could provide the file you are using for your test, so I can actually run the test properly, that would be great!
The file used in the test can be found in the whisper.cpp repo, inside the samples folder. The whisper.cpp repo is included as a submodule in this one.
I tried running the model on my harvard.wav, and I guess you were right:
read_wav: WAV file 'samples/harvard.wav' must be 16 kHz
error: failed to read WAV file 'samples/harvard.wav'
I found my main problem: I had been using BIG_ENDIAN byte order instead of LITTLE_ENDIAN in the convertAudioBytesToFloats() method, which corrupted the samples and therefore the output. Thanks for the help, everything works great now!
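For reference, WAV files store 16-bit PCM samples in little-endian order, so each pair of bytes has to be read low byte first. The following is a hypothetical version of convertAudioBytesToFloats() (its actual body is not shown in this thread), assuming 16-bit signed mono PCM and dividing by 32768 to map samples into [-1, 1).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmConverter {

    // Hypothetical stand-in for convertAudioBytesToFloats():
    // 16-bit signed little-endian mono PCM -> floats in [-1, 1).
    static float[] convertAudioBytesToFloats(byte[] audioBytes) {
        // WAV PCM data is little-endian; BIG_ENDIAN here silently scrambles every sample
        ByteBuffer buffer = ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] samples = new float[audioBytes.length / 2]; // a trailing odd byte is ignored
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buffer.getShort() / 32768.0f; // scale the short range to [-1, 1)
        }
        return samples;
    }

    public static void main(String[] args) {
        // 0x4000 (16384) and 0xC000 (-16384), each stored low byte first
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0xC0};
        float[] out = convertAudioBytesToFloats(pcm);
        System.out.println(out[0] + " " + out[1]); // prints "0.5 -0.5"
    }
}
```

With BIG_ENDIAN the same first two bytes would decode as 0x0040 (64), which is why the transcription came out as noise-like output.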
For some reason, when I try to create a context, the whole thread stops with:
Here is my current code:
This would suggest that JNI is unable to find the native method implementations in the .dll file, but I don't see any issue with it. I got the .dll and the .jar from here: https://github.com/ggerganov/whisper.cpp/actions/runs/6703965523 (specifically win32-x86-64_whisper.dll and whispercpp.jar).
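If the error is indeed an UnsatisfiedLinkError, note that successfully loading a DLL does not prove it provides the expected bindings: the JVM resolves a native method lazily, on its first call, by looking up its JNI symbol in the libraries loaded so far. This self-contained sketch reproduces that behaviour with a hypothetical native method that no library implements; nothing in it is taken from WhisperJNI or whispercpp.jar.

```java
public class MissingNativeDemo {

    // Hypothetical native method: no loaded library exports a matching JNI symbol.
    private static native long initContext(String modelPath);

    // Returns true if the native call could be linked, false on UnsatisfiedLinkError.
    static boolean nativeIsLinked() {
        try {
            initContext("model.bin");
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Raised at the first call, not when the class is loaded
            System.out.println("UnsatisfiedLinkError: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("linked: " + nativeIsLinked());
    }
}
```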
The .dll is placed at src/main/resources/whisper.dll, and the .jar files are here:
C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0.jar
C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0-javadoc.jar
C:\Users\Losokos\IdeaProjects\EchoAI\libs\whispercpp-1.4.0-sources.jar
I have used this method to load the whisper.dll
and included the .jars in Gradle:
implementation fileTree(dir: 'libs', include: ['*.jar'])
I have also added the .jars under Project Structure -> Libraries in IntelliJ IDEA.
Please help me out, I am literally dying to finally get this working. Thank you!
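Since whisper.dll sits in src/main/resources, it ends up inside the jar at runtime, while System.load only accepts an absolute path to a file on disk, so a bundled library has to be extracted first. A generic sketch of that pattern follows; the helper names are hypothetical, and for whisper-jni itself WhisperJNI.loadLibrary() already loads its bundled binary. Loading a DLL this way does not fix a binding mismatch: the library still has to export the functions the Java side expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NativeLoader {

    // Copies a native library from a stream (e.g. a classpath resource) to a temp file.
    static Path extractToTempFile(InputStream in, String prefix, String suffix) throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        tmp.toFile().deleteOnExit(); // best effort; Windows may keep a loaded DLL locked
        // REPLACE_EXISTING is needed because createTempFile already created the file
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Hypothetical loader for a whisper.dll placed at the classpath root (src/main/resources).
    static void loadBundledLibrary() throws IOException {
        try (InputStream in = NativeLoader.class.getResourceAsStream("/whisper.dll")) {
            if (in == null) {
                throw new IOException("whisper.dll not found on the classpath");
            }
            Path dll = extractToTempFile(in, "whisper", ".dll");
            System.load(dll.toAbsolutePath().toString()); // System.load requires an absolute path
        }
    }
}
```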