Doubt : format of the output

amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool

MIT License

Doubt : format of the output #42

Closed Pked01 closed 2 years ago

Pked01 commented 2 years ago

Hello. This is not an issue but a question. I am using auditok for audio tokenization, but I need the data in librosa/soundfile format. When I check librosa/soundfile, the data values are floating-point numbers, while in auditok they are large numbers. Both documentations say that the output is a time series. Can you please help me convert one output format to the other, or at least explain the format of the output in auditok?

amsehili commented 2 years ago

Hello, Librosa converts audio data to a numpy array of floats. In auditok you can use region.samples to get audio data as a numpy array (if numpy is installed) or as a standard python array (if numpy is not installed).

Pked01 commented 2 years ago

> Hello, Librosa converts audio data to a numpy array of floats. In auditok you can use region.samples to get audio data as a numpy array (if numpy is installed) or as a standard python array (if numpy is not installed).

Actually, I tested it.

When loading the same file, librosa gives floating-point numbers, while auditok gives 2 separate arrays of channels and samples. Is there a way to combine the output of auditok and get an array similar to librosa's?

amsehili commented 2 years ago

Hello,

librosa.load has an argument called mono which is True by default, meaning that librosa will compute the average of the available audio channels and return the result as a 1D array. You can get a similar result with auditok like this:

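import auditok
import numpy as np

audio = auditok.load("file.wav")
if audio.channels > 1:
    # average the channels to get a 1D (mono) array, like librosa's mono=True
    data = np.mean(audio.samples, axis=0)
else:
    data = audio.samples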
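
To illustrate what the averaging does, here is a small sketch on a synthetic array. It assumes the (channels, samples) layout that the snippet above implies for audio.samples. The to_mono helper and the sample values are made up for illustration and are not part of auditok.

```python
import numpy as np

def to_mono(samples):
    """Average the channels of a (channels, n_samples) array.

    1-D input (already mono) is returned unchanged.
    """
    samples = np.asarray(samples)
    if samples.ndim == 1:
        return samples
    return samples.mean(axis=0)

# Hypothetical stereo signal: 2 channels, 4 samples each
stereo = np.array([[100, 200, -300, 400],
                   [300,   0, -100, 400]])
mono = to_mono(stereo)  # one value per sample position
```

Averaging over axis 0 collapses the channel dimension and keeps one value per sample, which is the 1D shape librosa returns with mono=True.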

Pked01 commented 2 years ago

Hey, thanks for responding. This is working for me. The only thing is a slight change to your code:

import auditok
import numpy as np

audio = auditok.load("file.wav")
# dividing by 32767 scales 16-bit samples to roughly [-1, 1]
if audio.channels > 1:
    data = np.mean(audio.samples, axis=0) / 32767
else:
    data = audio.samples / 32767
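
The constant 32767 only fits 16-bit audio. A slightly more general sketch, assuming signed integer PCM and a known sample width in bytes, divides by 2^(bits-1) instead. The pcm_to_float helper below is hypothetical, not an auditok function. As far as I know auditok exposes the width as audio.sample_width, but please check that against your installed version. For 16-bit audio the divisor is 32768, which I believe is the scale soundfile uses, so the result differs from the /32767 version only marginally.

```python
import numpy as np

def pcm_to_float(samples, sample_width=2):
    """Scale signed integer PCM samples to floats in [-1.0, 1.0).

    sample_width is the size of one sample in bytes (2 for 16-bit audio).
    """
    full_scale = float(2 ** (8 * sample_width - 1))  # 32768.0 for 16-bit
    return np.asarray(samples, dtype=np.float64) / full_scale

# Hypothetical 16-bit sample values, including both extremes
pcm = np.array([0, 16384, -32768, 32767])
scaled = pcm_to_float(pcm, sample_width=2)
```

This would be applied after the channel averaging, for example pcm_to_float(data, 2) for a 16-bit file. It does not cover 8-bit WAV data, which is typically stored unsigned.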