Audio format of alize - Githubissues

ALIZE-Speaker-Recognition / android-alize

ALIZE for the Android platform.

GNU Lesser General Public License v3.0

Audio format of alize #2

Closed: viju2008 closed this issue 6 years ago

viju2008 commented 6 years ago

Please tell me which audio format ALIZE accepts: little-endian or big-endian, 16-bit PCM signed or unsigned, 8000 Hz or 16000 Hz?

In android-alize, when a WAV file is converted to a byte array, should the bytes be signed or unsigned?

Is there any sample code to convert a WAV file to a byte array? I do have some sample code that converts WAV to bytes; I will check it and contribute it if it works.

But please help me out with the specs.

ra2637 commented 6 years ago

I am wondering whether we can just pass the WAV audio to the system like this:

AssetFileDescriptor audio = getApplicationContext().getAssets().openFd("test.WAV");
alizeSystem.addAudio(audio.createInputStream());
alizeSystem.createSpeakerModel("test");

This is just an experiment; I put test.wav in the assets folder. Since I am stuck at createSpeakerModel, I cannot tell whether this is correct.

ra2637 commented 6 years ago

I also have a question about the spec. I got the following error message:

I/System.out: Creating speaker model...
W/System.err: AlizeSpkRec.AlizeException: [ InvalidDataException 0x73732f4c80 ]
W/System.err:   message   = "Wrong number of data"
W/System.err:   source file = /Wokspace/alize/android-alize/alize/src/main/cpp/alize-core/src/FeatureFileReaderRaw.cpp
W/System.err:   line number = 98
W/System.err:   fileName =  /data/user/0/com.example.android.voicecamera/files/data/prm/171111_230045.prm
W/System.err:     at AlizeSpkRec.SimpleSpkDetSystem.createSpeakerModel(Native Method)

Can you let us know what kind of data we should provide? Thanks.

tevamerlin commented 6 years ago

Hi, the default audio format when using the class SimpleSpkDetSystem is linear PCM represented as 16-bit signed integers. The sampling frequency is specified in the configuration file, using SPRO_sampleRate. The endianness is assumed to be the native endianness of the current platform, but can be inverted by adding the parameter SPRO_lswap to the configuration file. The signal is assumed to be monophonic.
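For anyone holding the audio as a byte array, here is a minimal sketch of turning it into 16-bit signed samples in plain Java. The class and method names (PcmConverter, toShorts) are hypothetical and not part of android-alize; the sketch assumes the bytes are headerless linear PCM and that you know their byte order. Note that Java's byte type is always signed, so the signed/unsigned question only matters for the 16-bit samples themselves.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the ALIZE API.
final class PcmConverter {

    /**
     * Interprets raw bytes as 16-bit signed PCM samples.
     * The byte order must match how the audio was written
     * (standard WAV files store samples as little-endian).
     */
    public static short[] toShorts(byte[] raw, ByteOrder order) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[raw.length / 2];
        // Each pair of bytes becomes one sample, combined in the given order.
        ByteBuffer.wrap(raw).order(order).asShortBuffer().get(samples);
        return samples;
    }
}
```

The resulting array could then be handed to the addAudio(short[] linearPCMSamples) method mentioned in this thread. Once the samples are Java short values, the byte order of the original file no longer plays a role.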

The README has been updated with this information, and a method has been added to make it easier to pass audio data when you already have it as an array of short.

tevamerlin commented 6 years ago

@ra2637 Yes, audio can be passed this way, using an InputStream. It is the easiest way for audio files provided in the assets. Just make sure the file contains only raw data in the right format (including endianness), with no header. Otherwise, you may try SPro's support for standard WAV files, by using the right setting for SPRO_format in your configuration file.
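If the asset is a regular WAV file, one way to obtain headerless data is to strip the RIFF container yourself before passing the bytes on. The following is only a sketch under stated assumptions: WavData and extractPcm are hypothetical names, not part of android-alize, and the code does not verify that the payload is mono 16-bit PCM at the configured sample rate. It walks the RIFF chunks and returns the contents of the "data" chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the ALIZE API.
final class WavData {

    /** Returns the bytes of the "data" chunk of a RIFF/WAVE file held in memory. */
    public static byte[] extractPcm(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("Not a RIFF/WAVE file");
        }
        // Chunk sizes in a RIFF file are little-endian 32-bit integers.
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts after "RIFF", the file size, and "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4);
            int start = pos + 8;
            if (size < 0) {
                throw new IllegalArgumentException("Chunk too large for this sketch");
            }
            int available = wav.length - start;
            if (id.equals("data")) {
                // Clamp in case the header declares more data than the file holds.
                return Arrays.copyOfRange(wav, start, start + Math.min(size, available));
            }
            if (size > available) {
                break; // truncated file
            }
            pos = start + size + (size & 1); // chunks are padded to even sizes
        }
        throw new IllegalArgumentException("No data chunk found");
    }

    private static String tag(byte[] bytes, int offset) {
        return new String(bytes, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

The returned bytes can be wrapped in a java.io.ByteArrayInputStream and passed to addAudio as in the snippet above. WAV sample data is little-endian, which matches the native byte order of most Android devices.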

However, for audio recorded from the microphone, you are more likely to pass it to the system using addAudio(short[] linearPCMSamples).