Writing integer values through the non-numpy interface results in a wav of all 0s

mcclure commented 9 months ago

Consider this simple use of soundfile:

import soundfile as sf
SAMPLE_RATE = 48000
SHRT_MAX = 32767
with sf.SoundFile("testfile2.wav", mode = 'w', samplerate=SAMPLE_RATE, channels=2, subtype='PCM_16') as outfile:
    frames = []
    for i in range(SAMPLE_RATE):
        frames.append([i%SHRT_MAX, SHRT_MAX-(i%SHRT_MAX)])
    outfile.write(frames)

I expect this to create a wav containing a very slow sawtooth wave (it will not be audible but the waveform will be visible in an editor such as audacity) with a DC offset. Instead, it creates a wav in which all values are zero.

If I pass the values in as numpy rather than as python lists, it works and creates a usable, correct wav file:

import numpy as np
import soundfile as sf
SAMPLE_RATE = 48000
SHRT_MAX = 32767
with sf.SoundFile("testfile2.wav", mode = 'w', samplerate=SAMPLE_RATE, channels=2, subtype='PCM_16') as outfile:
    frames = []
    for i in range(SAMPLE_RATE):
        frames.append([i%SHRT_MAX, SHRT_MAX-(i%SHRT_MAX)])
    array = np.array(frames, dtype=np.int16)
    outfile.write(array)
    print(array)

It is possible that when using non-numpy input, Soundfile is expecting the values in some format (floats?) other than ints. If so, this is not documented. I assume PCM_16 values because I have specified PCM_16 subtype.

I have a non-numpy app in which I adopted python-soundfile because it had the ability to append to wavs. I understand python-soundfile is mostly intended for use with numpy, but it does advertise a non-numpy interface. Using without numpy, I encountered several inconvences, but this was the biggest problem because I could not resolve it by consulting the documentation. I ultimately had to just add numpy to my program.

bastibe commented 9 months ago

In general, soundfile expects normalized floating point data between -1 and 1. In integer formats this is clearly not possible, so it scales between the appropriate minimum and maximum.

This is documented in soundfile.read, but not mentioned as clearly in soundfile.write. Would you like to contribute an improved documentation as a pull request?

mcclure commented 9 months ago

I am open to contributing a PR, but if I am going to write docs I need to be sure I understand the behavior.

When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?

This description surprises me because in my failed test above I was giving values outside the range -1…1 and they were being snapped to 0 rather than scaled or clipped to the -1…1 range. Is there special case handling for values outside -1…1?

bastibe commented 9 months ago

Integer values scale between the minimum and maximum allowable integer value. That's -2^15 ... (2^15 - 1) for int16, and -2^31 ... (2^31 - 1) for int32. Ignore the asymmetry between positive and negative values; IIRC, the scaling factor is actually always symmetrically 2^15 or 2^31, with the odd 2^15 clipped to 2^15-1.

Thus integer-1 scales 1/2^15, which is very close to zero.

But this only applies to numpy arrays with dtype int16 or int32. Python lists are converted to a float numpy array, which must be between -1 and 1.

mgeier commented 9 months ago

What @bastibe mentioned above is mostly correct, but Python lists are not necessarily converted to float arrays.

The code uses np.asarray(), which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype is chosen to be int64 by default (and we don't specify a dtype explicitly):

>>> import numpy as np
>>> np.asarray([32767]).dtype
dtype('int64')

This means that in your example, you are essentially writing 16-bit data to the least significant bytes of a 64-bit array. Later, this 64-bit array is scaled down to PCM_16, losing all those insignificant bytes and leaving you with only zeros.

For comparison, in your NumPy example, you are writing your data to a 16-bit array, which is then not scaled down, leading to your expected result.

Using without numpy, I encountered several inconvences

I think the misunderstanding is that you think you are using it without NumPy, but as soon as you use the .write() method, you are indirectly using NumPy, even if you don't import NumPy in your own code.

I think this has been clarified already in #407: If you don't want to use NumPy, you should use the .buffer_*() methods.

mgeier commented 9 months ago

When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?

To get a clearer idea about the scaling and clipping behavior, you can have a look at the tests:

https://github.com/bastibe/python-soundfile/blob/0a8071d1b04c8e45b2465d05866994ec2df8b1f8/tests/test_soundfile.py#L256-L271

It also happens in the other direction, but that is a very exotic use case:

https://github.com/bastibe/python-soundfile/blob/0a8071d1b04c8e45b2465d05866994ec2df8b1f8/tests/test_soundfile.py#L952-L974

bastibe commented 9 months ago

Thank you for the clarification, @mgeier. I wasn't aware that numpy.ndarray converts an all-int array to int64. That's a surprising gotcha for our use case, actually.

So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.

mcclure commented 9 months ago

So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.

Hm, yeah, from a user perspective this is an unusual enough range that if it cannot be changed to something less surprising, it might actually be better for it to error on integer lists.

bastibe commented 9 months ago

I suppose we could work around this issue by forcing any non-numpy-array iterables to float64 in soundfile.py:1019. This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.

Also, this is the first time I have heard of this issue, so I'd assume it is a rare occurrence. Nevertheless, it should be documented at the very least.

mcclure commented 9 months ago

I am going to attempt to send you one or more docs PRs today.

mcclure commented 9 months ago

I need clarification on something before I continue with #410.

The code uses np.asarray(), which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype is chosen to be int64 by default (and we don't specify a dtype explicitly)

This is not exactly what I see in testing.

Consider this sample program:

import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)-1])
print(s)

Result: Saves a wav file with 4 samples, first 2 are 0, third is minimum intensity, final is maximum intensity

import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
print(s)

Result:

Traceback (most recent call last):
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1328, in _check_dtype
    return _ffi_types[dtype]
           ~~~~~~~~~~^^^^^^^
KeyError: 'int64'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Andi\work\g\other\python-soundfile\64TEST.py", line 3, in <module>
    s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1020, in write
    written = self._array_io('write', data, len(data))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1342, in _array_io
    ctype = self._check_dtype(array.dtype.name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1330, in _check_dtype
    raise ValueError("dtype must be one of {0!r} and not {1!r}".format(
ValueError: dtype must be one of ['float32', 'float64', 'int16', 'int32'] and not 'int64'

In other words, from this and other tests I conclude the behavior for python int inputs is:

If all values are in the range of 32-bit INT_MIN to INT_MAX inclusive, the values are scaled FROM int32s TO the output format
If at least one value is outside the 32-bit signed integer range, there is an error

This behavior is non-ideal, but it seems to at least be consistent (I did not test on non-win10 systems).

I will change my documentation in #410 to match this behavior, but before I do, does this sound right? Am I missing anything?

mgeier commented 9 months ago

What does this return?

>>> import numpy as np
>>> np.asarray([1]).dtype

mcclure commented 9 months ago

Okay. I have some very bad news. In my Windows 10 venv:

(VENV) C:\Users\Andi\work\g\other\python-soundfile>python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int32')
>>> np.version.version
'1.26.0'

In my WSL (linux VM) venv on the same machine:

(WSLVENV) Andi@Alita:/mnt/c/Users/Andi/work/g/other/python-soundfile$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int64')
>>> np.version.version
'1.24.4'

I can't explain this. What does it do on your machine?

If the scale is potentially different (32 or 64 bit) depending on system/configuration, this suggests to me the int form simply should not be used and the documentation should say something like "if the input is a python int array the behavior is undefined".

Alternately, before @bastibe said he didn't want to change the behavior saying

This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.

But, if it was actually broken before (per the tests I pasted above, even if np manages to assign the array an int64, python-soundfile rejects it) maybe you don't need to worry about backward compatibility. If it never worked it seems like logically there must not be existing code to break?

bastibe commented 9 months ago

Oh, this is indeed terrible news. Thank you very much for the analysis!

So essentially, there is no consistent option for handling lists of integers across platforms. We could force any non-numpy type to float, but this would break lists of integers entirely. I don't really see a good way forward beyond documenting that "any non-numpy data will be converted to numpy using np.asarray", and noting that using non-numpy integer data is a very bad idea indeed.

In fact, we should consider issuing a warning whenever np.asarray returns an integer type (and it wasn't already a numpy array beforehand). What are your thoughts on that?

bastibe / python-soundfile

Writing integer values through the non-numpy interface results in a wav of all 0s #405