tamnguyenvan opened this issue 2 years ago
Hmm, am I correct to assume that you want to upload an audio file and get back an array of 16-bit values? If so, you may need to edit the package to do it. For example, on Android the code below returns spliced arrays of 16-bit values, which are then fed into the model. If you just want the array, change this::startRecognition
to your own function.
public void preprocess(byte[] byteData) {
    Log.d(LOG_TAG, "Preprocessing audio file..");
    audioFile = new AudioFile(byteData, audioLength);
    audioFile.getObservable()
        .doOnComplete(() -> {
            stopStream();
            clearPreprocessing();
        })
        .subscribe(this::startRecognition); // EDIT THIS CODE HERE
    audioFile.splice();
}
If you want the package to take care of the recognition as well, all you need to do is invoke the code below from the TfliteAudio package. This should have the same effect as librosa.load(audio_file, sr=16000)
recognitionStream = TfliteAudio.startFileRecognition(
    sampleRate: 44100,
    audioDirectory: "assets/sampleAudio.wav",
);
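One detail worth keeping in mind when comparing against librosa: librosa.load returns float32 samples normalized to [-1.0, 1.0], while raw PCM extracted from a WAV file arrives as signed 16-bit integers. A minimal sketch of the conversion (assuming NumPy is available; the sample values here are made up for illustration):

```python
import numpy as np

# librosa.load returns float32 samples normalized to [-1.0, 1.0].
# Raw PCM arrives as signed 16-bit integers, so to mimic librosa's
# output you divide by 32768 (the int16 full-scale value).
int16_samples = np.array([0, 16384, -32768, 32767], dtype=np.int16)
float_samples = int16_samples.astype(np.float32) / 32768.0
print(float_samples)  # values now lie in [-1.0, 1.0)
```

If a model was trained on librosa-style floats but is fed raw int16 values, its predictions will be wrong even when the audio itself is extracted correctly.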
Hi @Caldarie,
signal, sample_rate = librosa.load(file_path)
My model takes a signal of fixed 1-second duration at a sample rate of 22050 as input.
I tested locally in Python and it gives the correct output, but when I try it in Flutter using flutter_tflite_audio, the output is incorrect.
Can you please guide me on where and what I should change in the ::startRecognition code above?
Thanks,
Hi @SanaSizmic
Am I correct to assume that you want to load the audio file, and then output an array of float values?
In that case, you can simply modify the code below to:
subscribe(value -> print(value));
You might want to double-check the syntax; it's been a while since I've touched Java.
Hi @Caldarie ,
When I tested locally using Python, my model gives [0.07594858, 0.9240514] as the
predicted output, which is the correct prediction. For the same audio.wav file, flutter_tflite_audio gives [0.27258825, 0.72741175],
which is incorrect. Can you please suggest what I should change in the flutter_tflite_audio package?
Thanks,
Hi @SanaSizmic
I see what you mean. I suspect that the float values are distorted during extraction, or that the audio file is not spliced correctly.
If possible, can you compare the values from librosa.load with those from subscribe(value -> print(value)) and tell me whether they are similar to each other?
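A quick way to do such a comparison is to scale the plugin's raw int16 samples to floats and check them against the librosa output with a loose tolerance. A sketch with made-up sample values standing in for the real data:

```python
import numpy as np

# Hypothetical comparison: `from_librosa` stands in for the first few
# samples of librosa.load(...); `from_plugin_int16` stands in for the
# values printed by subscribe(value -> print(value)).
from_librosa = np.array([0.012, -0.034, 0.051], dtype=np.float32)
from_plugin_int16 = np.array([393, -1114, 1671], dtype=np.int16)

# Scale int16 -> librosa-style float before comparing.
from_plugin = from_plugin_int16.astype(np.float32) / 32768.0

# A loose tolerance is enough to tell "same audio" from "wrong splice".
print(np.allclose(from_librosa, from_plugin, atol=1e-3))
```

If the arrays diverge even at the start of the file, the problem is in extraction or scaling rather than splicing.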
Hi @Caldarie ,
No, they're not similar to each other. When I print them, I get sets of different arrays, and every array generates a different output. I also suspect the audio file is not being spliced correctly.
Instead of splicing the audio file, can I feed the whole file to the model? Can you please guide me on how to fix this?
Thanks,
Instead of splicing the audio file, can I feed the whole file to the model? Can you please guide me on how to fix this?
That really depends on your model. If the audio file already has the correct number of samples per second, there's no need to splice it.
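The check is simple arithmetic: a model that expects a fixed duration at a fixed sample rate needs exactly duration × rate samples per inference. A sketch using the parameters from the earlier messages (22050 Hz, 1 second):

```python
# A model expecting 1 second of audio at 22050 Hz needs exactly
# 22050 samples per inference (parameters taken from the thread above).
sample_rate = 22050
duration_s = 1
expected_samples = sample_rate * duration_s
print(expected_samples)  # 22050

# If len(signal) == expected_samples, no splicing is needed;
# shorter clips must be padded, longer ones truncated or spliced.
```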
No, they're not similar to each other. When I print them, I get sets of different arrays, and every array generates a different output. I also suspect the audio file is not being spliced correctly.
Take a look at the following code; you can test it to find errors.
In TfliteAudioPlugin.java, data extraction starts here:
private byte[] extractRawData(AssetFileDescriptor fileDescriptor, long startOffset, long declaredLength) {
    Log.d(LOG_TAG, "Extracting byte data from audio file");
    MediaDecoder decoder = new MediaDecoder(fileDescriptor, startOffset, declaredLength);
    AudioProcessing audioData = new AudioProcessing();
    byte[] byteData = {};
    byte[] readData;
    while ((readData = decoder.readByteData()) != null) {
        byteData = audioData.appendByteData(readData, byteData);
        Log.d(LOG_TAG, "data chunk length: " + readData.length);
    }
    Log.d(LOG_TAG, "byte data length: " + byteData.length);
    return byteData;
}
In AudioFile.java, the conversion from byte to short starts here:
shortBuffer = ByteBuffer.wrap(byteData).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
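For anyone who wants to verify that conversion outside of Java, a minimal Python sketch of the same little-endian byte-pair decoding (the byte values here are arbitrary test data):

```python
import struct

# Little-endian byte pairs -> signed 16-bit samples, mirroring
# ByteBuffer.wrap(byteData).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().
byte_data = bytes([0x00, 0x00, 0x00, 0x40, 0xFF, 0x7F, 0x00, 0x80])
count = len(byte_data) // 2
samples = struct.unpack("<%dh" % count, byte_data)
print(samples)  # (0, 16384, 32767, -32768)
```

If the WAV file is actually big-endian, or the bytes include a header that isn't stripped, the decoded samples will look like noise, which is one possible source of the "sets of different arrays" symptom.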
For splicing, take a look at the code below. I have written some unit tests, found here, to test this algorithm; feel free to check it yourself for any problems.
public void splice() {
    isSplicing = true;
    for (int i = 0; i < shortBuffer.limit(); i++) {
        short dataPoint = shortBuffer.get(i);
        if (!isSplicing) {
            subject.onComplete();
            break;
        }
        switch (audioData.getState(i)) {
            case "append":
                audioData
                    .append(dataPoint);
                break;
            case "recognise":
                Log.d(LOG_TAG, "Recognising");
                audioData
                    .append(dataPoint)
                    .displayInference()
                    .emit(data -> subject.onNext(data))
                    .reset();
                break;
            case "finalise":
                Log.d(LOG_TAG, "Finalising");
                audioData
                    .append(dataPoint)
                    .displayInference()
                    .emit(data -> subject.onNext(data));
                stop();
                break;
            case "padAndFinalise":
                Log.d(LOG_TAG, "Padding and finalising");
                audioData
                    .append(dataPoint)
                    .padSilence(i)
                    .displayInference()
                    .emit(data -> subject.onNext(data));
                stop();
                break;
            default:
                throw new AssertionError("Incorrect state when preprocessing");
        }
    }
}
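Conceptually, that state machine cuts the sample buffer into fixed-length windows, emits each full window for recognition, and pads the last partial window with silence. A rough Python analogue (the window length is a stand-in, not a value from the plugin):

```python
def splice(samples, window):
    """Cut `samples` into fixed-length windows; zero-pad the last one.

    Rough Python analogue of the Java state machine above: "recognise"
    emits a full window, "padAndFinalise" pads the remainder with silence.
    """
    out = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        if len(chunk) < window:
            chunk += [0] * (window - len(chunk))  # pad with silence
        out.append(chunk)
    return out

windows = splice([1, 2, 3, 4, 5], window=2)
print(windows)  # [[1, 2], [3, 4], [5, 0]]
```

If the window length doesn't match the model's expected input size, every emitted array will produce a different (and wrong) prediction, which matches the symptom described above.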
Hi @Caldarie,
SAMPLES_TO_CONSIDER = 22050
signal, sample_rate = librosa.load(file_path)
if len(signal) >= SAMPLES_TO_CONSIDER:
    # ensure consistency of the length of the signal
    signal = signal[:SAMPLES_TO_CONSIDER]
else:
    signal = fix_length(signal, size=int(1*sample_rate), mode='edge')
# predictions = self.model.predict(signal)
Can I do this using flutter_tflite_audio: read the raw audio data, convert it to a fixed sample-rate length, and predict? Thanks,
@SanaSizmic sorry for the late reply.
Yeah, you can absolutely do something similar by editing the code in this plugin.
Hi @Caldarie, can you please explain how the plugin works now? I mean the structure: first it takes the raw input signal array, then splices it to what length? Or, to do what I shared in the code above, which files do I have to edit? If you can guide me, that would be highly appreciated. Thanks
As mentioned above, all you need to do is tweak the code below. The value returned is an array of samples, which you can use to implement the code you provided:
subscribe(value -> print(value));
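To connect the two worlds: once you have the array of samples from subscribe, the Python preprocessing quoted earlier (truncate or edge-pad to a fixed length, with librosa-style float scaling) can be mirrored like this. A sketch only; the function name is hypothetical and the constants are taken from the thread above:

```python
import numpy as np

SAMPLES_TO_CONSIDER = 22050  # 1 second at 22050 Hz, as in the Python snippet above

def prepare(int16_samples):
    # Scale raw int16 samples to librosa-style floats in [-1, 1].
    signal = np.asarray(int16_samples, dtype=np.float32) / 32768.0
    if len(signal) >= SAMPLES_TO_CONSIDER:
        signal = signal[:SAMPLES_TO_CONSIDER]            # truncate
    else:
        pad = SAMPLES_TO_CONSIDER - len(signal)
        # Repeat the edge value, like fix_length(..., mode='edge').
        signal = np.pad(signal, (0, pad), mode='edge')
    return signal

fixed = prepare(np.zeros(30000, dtype=np.int16))
print(fixed.shape)  # (22050,)
```

The same trim-or-pad logic would need to be ported into the plugin's Java (or Dart) side if the preprocessing has to happen on-device.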
Hi, I have a dumb question. My model receives the output of
librosa.load(audio_file, sr=16000)
as input. How can I reproduce that with your code? Thank you.
Hi @tamnguyenvan, did you manage to figure this out?
Hi @Caldarie,
As mentioned above, all you need to do is tweak the code below. The value returned is an array of samples, which you can use to implement the code you provided:
subscribe(value -> print(value));
Do you mean this code in the TfliteAudioPlugin.java file?
public void preprocess(byte[] byteData) {
    Log.d(LOG_TAG, "Preprocessing audio file..");
    audioFile = new AudioFile(byteData, audioLength);
    audioFile.getObservable()
        .doOnComplete(() -> {
            stopStream();
            clearPreprocessing();
        })
        .subscribe(this::startRecognition);
    audioFile.splice();
}
This output.txt is my output file; can you please check and let me know if anything is missing?
Hmm, everything seems to be in order. The question is whether it's producing accurate results.