Open mcclure opened 1 year ago
In general, soundfile expects normalized floating point data between -1 and 1. In integer formats this is clearly not possible, so it scales between the appropriate minimum and maximum.
This is documented in soundfile.read
, but not mentioned as clearly in soundfile.write
. Would you like to contribute an improved documentation as a pull request?
I am open to contributing a PR, but if I am going to write docs I need to be sure I understand the behavior.
When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?
This description surprises me because in my failed test above I was giving values outside the range -1…1 and they were being snapped to 0 rather than scaled or clipped to the -1…1 range. Is there special case handling for values outside -1…1?
Integer values scale between the minimum and maximum allowable integer value. That's -2^15 ... (2^15 - 1) for int16, and -2^31 ... (2^31 - 1) for int32. Ignore the asymmetry between positive and negative values; IIRC, the scaling factor is actually always symmetrically 2^15 or 2^31, with the odd 2^15 clipped to 2^15-1.
Thus integer-1 scales 1/2^15, which is very close to zero.
But this only applies to numpy arrays with dtype int16 or int32. Python lists are converted to a float numpy array, which must be between -1 and 1.
What @bastibe mentioned above is mostly correct, but Python lists are not necessarily converted to float arrays.
The code uses np.asarray()
, which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype
is chosen to be int64
by default (and we don't specify a dtype
explicitly):
>>> import numpy as np
>>> np.asarray([32767]).dtype
dtype('int64')
This means that in your example, you are essentially writing 16-bit data to the least significant bytes of a 64-bit array.
Later, this 64-bit array is scaled down to PCM_16
, losing all those insignificant bytes and leaving you with only zeros.
For comparison, in your NumPy example, you are writing your data to a 16-bit array, which is then not scaled down, leading to your expected result.
Using without numpy, I encountered several inconvences
I think the misunderstanding is that you think you are using it without NumPy, but as soon as you use the .write()
method, you are indirectly using NumPy, even if you don't import NumPy in your own code.
I think this has been clarified already in #407: If you don't want to use NumPy, you should use the .buffer_*()
methods.
When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?
To get a clearer idea about the scaling and clipping behavior, you can have a look at the tests:
It also happens in the other direction, but that is a very exotic use case:
Thank you for the clarification, @mgeier. I wasn't aware that numpy.ndarray
converts an all-int array to int64
. That's a surprising gotcha for our use case, actually.
So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.
So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.
Hm, yeah, from a user perspective this is an unusual enough range that if it cannot be changed to something less surprising, it might actually be better for it to error on integer lists.
I suppose we could work around this issue by forcing any non-numpy-array iterables to float64 in soundfile.py:1019. This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.
Also, this is the first time I have heard of this issue, so I'd assume it is a rare occurrence. Nevertheless, it should be documented at the very least.
I am going to attempt to send you one or more docs PRs today.
I need clarification on something before I continue with #410.
The code uses np.asarray(), which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype is chosen to be int64 by default (and we don't specify a dtype explicitly)
This is not exactly what I see in testing.
Consider this sample program:
import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)-1])
print(s)
Result: Saves a wav file with 4 samples, first 2 are 0, third is minimum intensity, final is maximum intensity
import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
print(s)
Result:
Traceback (most recent call last):
File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1328, in _check_dtype
return _ffi_types[dtype]
~~~~~~~~~~^^^^^^^
KeyError: 'int64'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Andi\work\g\other\python-soundfile\64TEST.py", line 3, in <module>
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1020, in write
written = self._array_io('write', data, len(data))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1342, in _array_io
ctype = self._check_dtype(array.dtype.name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1330, in _check_dtype
raise ValueError("dtype must be one of {0!r} and not {1!r}".format(
ValueError: dtype must be one of ['float32', 'float64', 'int16', 'int32'] and not 'int64'
In other words, from this and other tests I conclude the behavior for python int inputs is:
This behavior is non-ideal, but it seems to at least be consistent (I did not test on non-win10 systems).
I will change my documentation in #410 to match this behavior, but before I do, does this sound right? Am I missing anything?
What does this return?
>>> import numpy as np
>>> np.asarray([1]).dtype
Okay. I have some very bad news. In my Windows 10 venv:
(VENV) C:\Users\Andi\work\g\other\python-soundfile>python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int32')
>>> np.version.version
'1.26.0'
In my WSL (linux VM) venv on the same machine:
(WSLVENV) Andi@Alita:/mnt/c/Users/Andi/work/g/other/python-soundfile$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int64')
>>> np.version.version
'1.24.4'
I can't explain this. What does it do on your machine?
If the scale is potentially different (32 or 64 bit) depending on system/configuration, this suggests to me the int form simply should not be used and the documentation should say something like "if the input is a python int array the behavior is undefined".
Alternately, before @bastibe said he didn't want to change the behavior saying
This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.
But, if it was actually broken before (per the tests I pasted above, even if np manages to assign the array an int64, python-soundfile rejects it) maybe you don't need to worry about backward compatibility. If it never worked it seems like logically there must not be existing code to break?
Oh, this is indeed terrible news. Thank you very much for the analysis!
So essentially, there is no consistent option for handling lists of integers across platforms. We could force any non-numpy type to float, but this would break lists of integers entirely. I don't really see a good way forward beyond documenting that "any non-numpy data will be converted to numpy using np.asarray", and noting that using non-numpy integer data is a very bad idea indeed.
In fact, we should consider issuing a warning whenever np.asarray returns an integer type (and it wasn't already a numpy array beforehand). What are your thoughts on that?
Consider this simple use of soundfile:
I expect this to create a wav containing a very slow sawtooth wave (it will not be audible but the waveform will be visible in an editor such as audacity) with a DC offset. Instead, it creates a wav in which all values are zero.
If I pass the values in as numpy rather than as python lists, it works and creates a usable, correct wav file:
It is possible that when using non-numpy input, Soundfile is expecting the values in some format (floats?) other than ints. If so, this is not documented. I assume PCM_16 values because I have specified PCM_16 subtype.
I have a non-numpy app in which I adopted python-soundfile because it had the ability to append to wavs. I understand python-soundfile is mostly intended for use with numpy, but it does advertise a non-numpy interface. Using without numpy, I encountered several inconvences, but this was the biggest problem because I could not resolve it by consulting the documentation. I ultimately had to just add numpy to my program.