audeering / audinterface

Generic interfaces for signal processing
https://audeering.github.io/audinterface/

process_signal() and process_file() can return different results #123

Closed: hagenw closed this issue 1 year ago

hagenw commented 1 year ago

First we get some data and define an interface whose process_func sums all samples of the processed segment:

import audb
import audinterface
import audiofile

files = audb.load_media('emodb', 'wav/13b09La.wav', version='1.4.1')
signal, sampling_rate = audiofile.read(files[0])
interface = audinterface.Process(process_func=lambda x, sr: x.sum())

Then we select a segment of the audio for which both methods return the same result:

import pandas as pd

start = pd.Timedelta('0 days 00:00:00.120000')
end = pd.Timedelta('0 days 00:00:02.440000')

Then we get:

>>> interface.process_file(files[0], start=start, end=end).tolist()
[-21.31195068359375]

>>> interface.process_signal(signal, sampling_rate, start=start, end=end).tolist()
[-21.31195068359375]

Next we use the start and end values from https://github.com/audeering/audinterface/issues/122, for which different results were reported there:

start = pd.Timedelta('0 days 00:00:01.140000')
end = pd.Timedelta('0 days 00:00:01.560000')

This time the two methods return different results:

>>> interface.process_file(files[0], start=start, end=end).tolist()
[1.95721435546875]

>>> interface.process_signal(signal, sampling_rate, start=start, end=end).tolist()
[2.218841552734375]
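
To judge which of the two results matches the requested segment, it can help to slice the signal by hand and use that as a reference. The following is only a sketch and not part of audinterface: `slice_by_time` is a hypothetical helper, and it assumes that `start` and `end` are rounded to the nearest sample, which may differ from what either method does internally. It accepts any object with `total_seconds()`, so `pd.Timedelta` values work as well as `datetime.timedelta`.

```python
import datetime

import numpy as np


def slice_by_time(signal, sampling_rate, start, end):
    """Hypothetical helper: cut [start, end) from a (channels, samples) array.

    Assumption: times are rounded to the nearest sample index.
    audinterface may convert times to samples differently.
    """
    start_idx = round(start.total_seconds() * sampling_rate)
    end_idx = round(end.total_seconds() * sampling_rate)
    return signal[:, start_idx:end_idx]


# Toy signal whose values equal their own sample index,
# so the selected range can be read directly from the result
sampling_rate = 16000
signal = np.arange(3 * sampling_rate, dtype=np.float64).reshape(1, -1)
start = datetime.timedelta(seconds=1, milliseconds=140)
end = datetime.timedelta(seconds=1, milliseconds=560)

segment = slice_by_time(signal, sampling_rate, start, end)
print(segment.shape, segment[0, 0], segment[0, -1])
```

With the real signal, `slice_by_time(signal, sampling_rate, start, end).sum()` could then be compared against both values above.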

hagenw commented 1 year ago

As a next step, we look at what causes the difference by using the identity function as process_func, so that each method returns the selected samples unchanged.

import audinterface
import audiofile
import numpy as np
import pandas as pd

np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(loc=0.0, scale=0.3, size=(1, 3 * sampling_rate))
audiofile.write('test.wav', signal, sampling_rate)
signal, sampling_rate = audiofile.read('test.wav')
interface = audinterface.Process()

Again, for the first pair of start and end, both methods return the same result:

>>> start = pd.Timedelta('0 days 00:00:00.120000')

>>> end = pd.Timedelta('0 days 00:00:02.440000')

>>> interface.process_file('test.wav', start=start, end=end)[0]
array([[-0.28231812, -0.16455078,  0.22692871, ...,  0.10430908,
         0.10797119,  0.09988403]], dtype=float32)

>>> interface.process_signal(signal, sampling_rate, start=start, end=end)[0]
array([[-0.28231812, -0.16455078,  0.22692871, ...,  0.10430908,
         0.10797119,  0.09988403]], dtype=float32)

With the second pair of start and end, the two results are offset by one sample: the segment returned by process_file() starts and ends one sample later than the one returned by process_signal():

>>> start = pd.Timedelta('0 days 00:00:01.140000')

>>> end = pd.Timedelta('0 days 00:00:01.560000')

>>> interface.process_file('test.wav', start=start, end=end)[0]
array([[-0.05056763,  0.48214722,  0.25619507, ...,  0.9999695 ,
        -0.26235962,  0.27453613]], dtype=float32)

>>> interface.process_signal(signal, sampling_rate, start=start, end=end)[0]
array([[-0.03106689, -0.05056763,  0.48214722, ...,  0.02084351,
         0.9999695 , -0.26235962]], dtype=float32)
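
The offset can also be measured instead of read off the printed arrays. The sketch below uses a hypothetical helper, `find_offset`, which is not part of audinterface. It assumes that each returned segment is a bit-exact copy of a range of the full signal, which should hold here because the identity function is used and both paths read the same file.

```python
import numpy as np


def find_offset(signal, segment):
    """Hypothetical helper: first sample index at which segment occurs in signal.

    Both inputs are flattened, so (1, samples) arrays work.
    Returns -1 if the segment is not found.
    """
    signal = np.asarray(signal).ravel()
    segment = np.asarray(segment).ravel()
    n = segment.size
    if n == 0 or n > signal.size:
        return -1
    # Candidate positions: wherever the first sample of the segment occurs
    candidates = np.flatnonzero(signal[: signal.size - n + 1] == segment[0])
    for idx in candidates:
        # Confirm the candidate by comparing the whole segment
        if np.array_equal(signal[idx : idx + n], segment):
            return int(idx)
    return -1


# Self-contained demo with two slices that are one sample apart
rng = np.random.default_rng(1)
full = rng.normal(loc=0.0, scale=0.3, size=(1, 3 * 16000)).astype(np.float32)
first = full[:, 18240:24960]
second = full[:, 18241:24961]
print(find_offset(full, second) - find_offset(full, first))
```

Applied to the outputs above, the difference between `find_offset(signal, ...)` for the process_file() result and for the process_signal() result would be expected to be 1, given the printed values.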