tamnguyenvan opened this issue 2 years ago
Hmm, am I correct to assume that you want to upload an audio file and get back an array of 16-bit values? If so, you may need to edit the package to do it. For example, on Android the code below returns spliced arrays of 16-bit values, which are then fed into the model. If you just want the array, change this::startRecognition
to your own function.
public void preprocess(byte[] byteData) {
    Log.d(LOG_TAG, "Preprocessing audio file..");
    audioFile = new AudioFile(byteData, audioLength);
    audioFile.getObservable()
        .doOnComplete(() -> {
            stopStream();
            clearPreprocessing();
        })
        .subscribe(this::startRecognition); // EDIT THIS CODE HERE
    audioFile.splice();
}
If you want the package to take care of the recognition as well, all you need to do is invoke the code below from the TfliteAudio package. This should have the same effect as librosa.load(audio_file, sr=16000)
recognitionStream = TfliteAudio.startFileRecognition(
    sampleRate: 44100,
    audioDirectory: "assets/sampleAudio.wav",
);
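One detail worth keeping in mind when comparing against librosa: librosa.load returns float32 samples normalized to [-1.0, 1.0], while raw PCM extracted from a WAV file arrives as signed 16-bit integers. A minimal sketch of the conversion (assuming NumPy is available; the sample values here are made up for illustration):

```python
import numpy as np

# librosa.load returns float32 samples normalized to [-1.0, 1.0].
# Raw PCM arrives as signed 16-bit integers, so to mimic librosa's
# output you divide by 32768 (the int16 full-scale value).
int16_samples = np.array([0, 16384, -32768, 32767], dtype=np.int16)
float_samples = int16_samples.astype(np.float32) / 32768.0
print(float_samples)  # values now lie in [-1.0, 1.0)
```

If a model was trained on librosa-style floats but is fed raw int16 values, its predictions will be wrong even when the audio itself is extracted correctly.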
Hi @Caldarie,
signal, sample_rate = librosa.load(file_path)
My model takes a signal of fixed 1-second duration at a sample rate of 22050 as input.
I tested locally in Python and it gives the correct output, but when I try it in Flutter using flutter_tflite_audio, the output is incorrect.
Can you please guide me on where and what I should change in the ::startRecognition code above?
Thanks,
Hi @SanaSizmic
Am I correct to assume that you want to load the audio file, and then output an array of float values?
In that case, you can simply modify the code below to:
subscribe(value -> print(value));
You might want to double-check the syntax; it's been a while since I've touched Java.
Hi @Caldarie ,
When I tested locally using Python, my model gives [0.07594858, 0.9240514] as the
predicted output, which is the correct prediction. For the same audio.wav file, flutter_tflite_audio gives [0.27258825, 0.72741175],
which is incorrect. Can you please suggest what I should change in the flutter_tflite_audio package?
Thanks,
Hi @SanaSizmic
I see what you mean. I suspect that the float values are distorted during extraction, or that the audio file is not spliced correctly.
If possible, can you compare the values from librosa.load with those from subscribe(value -> print(value)) and tell me whether they are similar to each other?
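A quick way to do such a comparison is to scale the plugin's raw int16 samples to floats and check them against the librosa output with a loose tolerance. A sketch with made-up sample values standing in for the real data:

```python
import numpy as np

# Hypothetical comparison: `from_librosa` stands in for the first few
# samples of librosa.load(...); `from_plugin_int16` stands in for the
# values printed by subscribe(value -> print(value)).
from_librosa = np.array([0.012, -0.034, 0.051], dtype=np.float32)
from_plugin_int16 = np.array([393, -1114, 1671], dtype=np.int16)

# Scale int16 -> librosa-style float before comparing.
from_plugin = from_plugin_int16.astype(np.float32) / 32768.0

# A loose tolerance is enough to tell "same audio" from "wrong splice".
print(np.allclose(from_librosa, from_plugin, atol=1e-3))
```

If the arrays diverge even at the start of the file, the problem is in extraction or scaling rather than splicing.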
Hi @Caldarie ,
No, they're not similar to each other. When I print them, I get sets of different arrays, and every array generates a different output. I also suspect the audio file is not being spliced correctly.
Instead of splicing the audio file, can I feed the whole file to the model? Can you please guide me on how to fix this?
Thanks,
Instead of splicing the audio file, can I feed the whole file to the model? Can you please guide me on how to fix this?
That really depends on your model. If the audio file already has the correct number of samples per second, there's no need to splice it.
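The check is simple arithmetic: a model that expects a fixed duration at a fixed sample rate needs exactly duration × rate samples per inference. A sketch using the parameters from the earlier messages (22050 Hz, 1 second):

```python
# A model expecting 1 second of audio at 22050 Hz needs exactly
# 22050 samples per inference (parameters taken from the thread above).
sample_rate = 22050
duration_s = 1
expected_samples = sample_rate * duration_s
print(expected_samples)  # 22050

# If len(signal) == expected_samples, no splicing is needed;
# shorter clips must be padded, longer ones truncated or spliced.
```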
No, they're not similar to each other. When I print them, I get sets of different arrays, and every array generates a different output. I also suspect the audio file is not being spliced correctly.
Take a look at the following code; you can test it to find errors.
In TfliteAudioPlugin.java, data extraction starts here:
private byte[] extractRawData(AssetFileDescriptor fileDescriptor, long startOffset, long declaredLength) {
    Log.d(LOG_TAG, "Extracting byte data from audio file");
    MediaDecoder decoder = new MediaDecoder(fileDescriptor, startOffset, declaredLength);
    AudioProcessing audioData = new AudioProcessing();
    byte[] byteData = {};
    byte[] readData;
    while ((readData = decoder.readByteData()) != null) {
        byteData = audioData.appendByteData(readData, byteData);
        Log.d(LOG_TAG, "data chunk length: " + readData.length);
    }
    Log.d(LOG_TAG, "byte data length: " + byteData.length);
    return byteData;
}
In AudioFile.java, the conversion from byte to short starts here:
shortBuffer = ByteBuffer.wrap(byteData).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
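For anyone who wants to verify that conversion outside of Java, a minimal Python sketch of the same little-endian byte-pair decoding (the byte values here are arbitrary test data):

```python
import struct

# Little-endian byte pairs -> signed 16-bit samples, mirroring
# ByteBuffer.wrap(byteData).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().
byte_data = bytes([0x00, 0x00, 0x00, 0x40, 0xFF, 0x7F, 0x00, 0x80])
count = len(byte_data) // 2
samples = struct.unpack("<%dh" % count, byte_data)
print(samples)  # (0, 16384, 32767, -32768)
```

If the WAV file is actually big-endian, or the bytes include a header that isn't stripped, the decoded samples will look like noise, which is one possible source of the "sets of different arrays" symptom.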
For splicing, take a look at the code below. I have written some unit tests, found here, to test this algorithm; feel free to check it yourself for any problems.
public void splice() {
    isSplicing = true;
    for (int i = 0; i < shortBuffer.limit(); i++) {
        short dataPoint = shortBuffer.get(i);
        if (!isSplicing) {
            subject.onComplete();
            break;
        }
        switch (audioData.getState(i)) {
            case "append":
                audioData
                    .append(dataPoint);
                break;
            case "recognise":
                Log.d(LOG_TAG, "Recognising");
                audioData
                    .append(dataPoint)
                    .displayInference()
                    .emit(data -> subject.onNext(data))
                    .reset();
                break;
            case "finalise":
                Log.d(LOG_TAG, "Finalising");
                audioData
                    .append(dataPoint)
                    .displayInference()
                    .emit(data -> subject.onNext(data));
                stop();
                break;
            case "padAndFinalise":
                Log.d(LOG_TAG, "Padding and finalising");
                audioData
                    .append(dataPoint)
                    .padSilence(i)
                    .displayInference()
                    .emit(data -> subject.onNext(data));
                stop();
                break;
            default:
                throw new AssertionError("Incorrect state when preprocessing");
        }
    }
}
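Conceptually, that state machine cuts the sample buffer into fixed-length windows, emits each full window for recognition, and pads the last partial window with silence. A rough Python analogue (the window length is a stand-in, not a value from the plugin):

```python
def splice(samples, window):
    """Cut `samples` into fixed-length windows; zero-pad the last one.

    Rough Python analogue of the Java state machine above: "recognise"
    emits a full window, "padAndFinalise" pads the remainder with silence.
    """
    out = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        if len(chunk) < window:
            chunk += [0] * (window - len(chunk))  # pad with silence
        out.append(chunk)
    return out

windows = splice([1, 2, 3, 4, 5], window=2)
print(windows)  # [[1, 2], [3, 4], [5, 0]]
```

If the window length doesn't match the model's expected input size, every emitted array will produce a different (and wrong) prediction, which matches the symptom described above.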
Hi @Caldarie,
SAMPLES_TO_CONSIDER = 22050
signal, sample_rate = librosa.load(file_path)
if len(signal) >= SAMPLES_TO_CONSIDER:
    # ensure consistency of the length of the signal
    signal = signal[:SAMPLES_TO_CONSIDER]
else:
    signal = fix_length(signal, size=int(1*sample_rate), mode='edge')
# predictions = self.model.predict(signal)
Can I do this using flutter_tflite_audio: read the raw audio data, convert it to a fixed sample-rate length, and predict? Thanks,
@SanaSizmic sorry for the late reply.
Yeah, you can absolutely do something similar by editing the code in this plugin.
Hi @Caldarie, can you please explain how the plugin works now? I mean the structure: first it takes the raw input signal array, then splices it to what length? Or, to do what I shared in the code above, which files do I have to edit? If you can guide me, that would be highly appreciated. Thanks
As mentioned above, all you need to do is tweak the code below. The value returned is an array of samples, which you can use to implement the code you provided:
subscribe(value -> print(value));
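To connect the two worlds: once you have the array of samples from subscribe, the Python preprocessing quoted earlier (truncate or edge-pad to a fixed length, with librosa-style float scaling) can be mirrored like this. A sketch only; the function name is hypothetical and the constants are taken from the thread above:

```python
import numpy as np

SAMPLES_TO_CONSIDER = 22050  # 1 second at 22050 Hz, as in the Python snippet above

def prepare(int16_samples):
    # Scale raw int16 samples to librosa-style floats in [-1, 1].
    signal = np.asarray(int16_samples, dtype=np.float32) / 32768.0
    if len(signal) >= SAMPLES_TO_CONSIDER:
        signal = signal[:SAMPLES_TO_CONSIDER]            # truncate
    else:
        pad = SAMPLES_TO_CONSIDER - len(signal)
        # Repeat the edge value, like fix_length(..., mode='edge').
        signal = np.pad(signal, (0, pad), mode='edge')
    return signal

fixed = prepare(np.zeros(30000, dtype=np.int16))
print(fixed.shape)  # (22050,)
```

The same trim-or-pad logic would need to be ported into the plugin's Java (or Dart) side if the preprocessing has to happen on-device.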
Hi, I have a dumb question. My model receives the output of
librosa.load(audio_file, sr=16000)
as input. How can I reproduce that with your code? Thank you.
Hi @tamnguyenvan, did you manage to figure this out?
Hi @Caldarie,
As mentioned above, all you need to do is tweak the code below. The value returned is an array of samples, which you can use to implement the code you provided:
subscribe(value -> print(value));
Do you mean this code in the TfliteAudioPlugin.java file?
public void preprocess(byte[] byteData) {
    Log.d(LOG_TAG, "Preprocessing audio file..");
    audioFile = new AudioFile(byteData, audioLength);
    audioFile.getObservable()
        .doOnComplete(() -> {
            stopStream();
            clearPreprocessing();
        })
        .subscribe(this::startRecognition);
    audioFile.splice();
}
This output.txt is my output file; can you please check and let me know if anything is missing?
Hmm, everything seems to be in order. The question is whether it's producing accurate results.