bastibe / python-soundfile

SoundFile is an audio library based on libsndfile, CFFI, and NumPy
BSD 3-Clause "New" or "Revised" License

Soundfile read/write wav is not symmetric with default arguments #413

Open jon-petter opened 8 months ago

jon-petter commented 8 months ago

I came across some unexpected behavior in soundfile (version 0.12.1) read/write.

If you have the following float array:

import numpy as np
data = np.array([0, -1, 0, 1, 0, -1, 0, 1, 0, -1, 0], dtype=np.float32)
print((data*(2**15 - 1)))

[     0. -32767.      0.  32767.      0. -32767.      0.  32767.      0. -32767.      0.]

If you now write and read this data to a wav file using soundfile write and read (with default arguments), you get:

import soundfile as sf
sf.write('test.wav', data, 44100)
data, sample_rate = sf.read('test.wav')
print((data*(2**15 - 1)))

[     0.         -32767.              0.          32766.00003052
      0.         -32767.              0.          32766.00003052
      0.         -32767.              0.        ]

So my pure, max-amplitude sine wave has now been reduced in amplitude, and a tiny DC offset has been introduced.

I understand that, when writing to PCM16, there would be quantization artifacts, but I was not expecting the positive and negative sides of the signal to be scaled differently (to this extent).

Is this scaling applied in soundfile code, or in some of the libs it builds upon?

My main question is why this asymmetric scaling is not reversed when using soundfile.read() with dtype="float64"?

bastibe commented 8 months ago

This is the unfortunate reality of integer numbers. The lowest possible 16-bit number is -2^15, but the highest possible is 2^15-1. When dealing with float inputs, you have to apply some scaling, and there is no correct answer.

In practice, though, the difference between the possible scalings is imperceptible.
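The asymmetry of the 16-bit range can be checked directly with NumPy. The two scale factors below are only illustrative candidates for mapping a float in [-1, 1] to int16; neither is claimed to be what libsndfile uses.

```python
import numpy as np

# Range of a signed 16-bit integer: one more negative value than positive.
info = np.iinfo(np.int16)
print(info.min, info.max)

full_scale = np.array([-1.0, 1.0])

# Candidate A (illustrative): scale by 2**15 - 1.
# Symmetric, but the value -32768 is never produced.
scaled_a = full_scale * (2**15 - 1)

# Candidate B (illustrative): scale by 2**15.
# -1.0 lands exactly on -32768, but +1.0 lands on 32768, which does not fit.
scaled_b = full_scale * 2**15

print(scaled_a)
print(scaled_b, "overflows int16:", scaled_b[1] > info.max)
```

Either choice gives something up: candidate A leaves one code unused, candidate B cannot represent +1.0 without clipping.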

Soundfile does not implement this, but merely passes the data on to libsndfile, which does the transformation.

If you need a perfect float representation, you could always use a native float format, such as MAT5, or (IIRC) Flac or WAV with the FLOAT subtype.

jon-petter commented 8 months ago

Yes. I understand that. I was mostly wondering why the scaling is different on write and read, but it is a problem with libsndfile then?

At least, this is the behavior I observe:

Anyhow, I understand that I'm complaining about a 1/2**15 max quantization error vs. a 1/2**16 max error, and these differences, as you say, are probably imperceptible.

bastibe commented 8 months ago

The problem is not that read and write are different, but that +1 is not representable. If you use values <1, it should be symmetric.
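The point about +1 can be illustrated with a small NumPy model. The conversion below (multiply by 2**15, round, clip to the int16 range, divide by 2**15 on read) is an assumption chosen because it reproduces the numbers reported in this thread. It is not taken from the libsndfile source, and `write_pcm16` / `read_pcm16` are hypothetical helper names.

```python
import numpy as np

def write_pcm16(x):
    """Hypothetical float -> int16 conversion: scale by 2**15, round, clip."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * 2**15)
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)

def read_pcm16(q):
    """Hypothetical int16 -> float conversion: divide by 2**15."""
    return np.asarray(q, dtype=np.float64) / 2**15

# Full scale: -1.0 fits (-32768), +1.0 is clipped to 32767.
full_scale = read_pcm16(write_pcm16([-1.0, 1.0]))
print(full_scale * (2**15 - 1))

# One step below full scale: both signs are representable, so the
# round trip is symmetric.
below = 1.0 - 2.0**-15
below_full_scale = read_pcm16(write_pcm16([-below, below]))
print(below_full_scale)
```

Under this model the same scale factor is used in both directions; the only asymmetry comes from clipping the single value +1.0.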