Closed jayfeldmann closed 4 years ago
@jayfeldmann thanks for the report and the test file.
wavio.read returns the data exactly as found in the file. As noted in the docstring:
wavio.read() does not scale or normalize the data. The data in the array wav.data is the data that was in the file.
For your file, the values range from -8388608 to 8388607 inclusive (i.e. from -2**23 to 2**23-1). It is up to you to scale or normalize these as you see fit. For example, if the values are known to be scaled relative to the maximum positive integer representable with a signed 24-bit integer, you could generate an array of floating point values with
wav = wavio.read('test.wav')
normalized_data = wav.data / 2**23
With your file, the values in normalized_data range from -1.0 to 0.9999998807907104.
To get the normalized range to be exactly -1.0 to 1.0, the transformation would be
normalized_data = wav.data / (2**23 - 0.5) + 1/(2**24 - 1)
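To make the difference between the two transformations concrete, here is a minimal sketch that applies both to the extreme 24-bit codes. It assumes NumPy and uses a small hand-built array as a stand-in for wav.data rather than reading a real file.

```python
import numpy as np

# Stand-in for wav.data: the two extreme signed 24-bit codes, plus zero.
data = np.array([-2**23, 0, 2**23 - 1], dtype=np.int32)

# Plain division by 2**23: the negative end reaches -1.0, the positive end falls short of 1.0.
simple = data / 2**23

# Affine map that sends -2**23 to -1.0 and 2**23 - 1 to 1.0 (up to floating point rounding).
symmetric = data / (2**23 - 0.5) + 1 / (2**24 - 1)

print(simple)
print(symmetric)
```

One side effect of the symmetric map is that an integer sample of 0 no longer maps to exactly 0.0; it lands at 1/(2**24 - 1), a very small DC offset.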
Note that "full-scale" is officially defined "leaving the negative maximum code unused" in AES17 and IEC 61606, and libsndfile has some notes on how they normalize to float: http://www.mega-nerd.com/libsndfile/FAQ.html#Q010
@endolith, thanks. In that case, if we know the WAV file doesn't contain the unused code, the conversion to the interval [-1, 1] would be to divide by 2**23 - 1.
Does that "official definition" mean that, technically, it is a bug for software to generate 24-bit files that contain the sample -2**23? Should a WAV reader that also provided normalization clip such values to -2**23 + 1? (Details like this are one reason I'm hesitant to add an option for normalization to wavio.read.)
I don't know.
The WAV format spec does say that the negative maximum code is legal:
So I would probably say that it makes the most sense (for WAVs of 9-bit or higher) to divide by 2**(bits-1)-1, so that -1 and +1 correspond to full-scale, and calculations like dBFS = 20*log10(x) work out correctly, and if the negative maximum code is present in the WAV file, the float should just be allowed to exceed -1? (And clip to [-2**(bits-1), 2**(bits-1)-1] when converting back to WAV, to maintain bit transparency.)
AES17:
full-scale amplitude: amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused. NOTE In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.
IEC 61606:
full-scale amplitude FS: amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused
(I think a normalization option/function would be good to add. Someone just emailed me a few days ago asking why scipy.io.wavfile was giving them an RMS level around 11 and I had to explain that it's just the raw integer data and not normalized first.)
((And I guess this means I don't agree with libsndfile, since they try to restrict the converted signal to the range [-1.0, 1.0], slightly attenuating it to fit.))
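The convention proposed in this comment (divide by 2**(bits-1)-1, let the negative maximum code fall slightly below -1, and clip only on the way back to integers) could be sketched roughly as follows. The function names to_float and to_int are made up for illustration and are not part of wavio or scipy; the sketch assumes NumPy and signed samples of 9 to 32 bits.

```python
import numpy as np

def to_float(samples, bits):
    """Scale signed integers so that +/-(2**(bits-1) - 1) maps to +/-1.0."""
    return np.asarray(samples, dtype=np.float64) / (2**(bits - 1) - 1)

def to_int(x, bits):
    """Invert to_float, clipping to the full two's-complement range."""
    full_scale = 2**(bits - 1) - 1
    scaled = np.rint(np.asarray(x, dtype=np.float64) * full_scale)
    return np.clip(scaled, -2**(bits - 1), 2**(bits - 1) - 1).astype(np.int32)

samples = np.array([-2**23, -2**23 + 1, 0, 2**23 - 1], dtype=np.int32)
x = to_float(samples, 24)

# Peak level of the positive full-scale code, in dBFS.
peak_dbfs = 20 * np.log10(abs(x[3]))

# Converting back recovers the original codes, including -2**23.
round_trip = to_int(x, 24)
print(x, peak_dbfs, round_trip)
```

With this scaling the positive full-scale code sits at exactly 0 dBFS, and the negative maximum code survives the round trip because nothing is attenuated to force it into [-1.0, 1.0].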
I wrote up a description with examples: https://gist.github.com/endolith/e8597a58bcd11a6462f33fa8eb75c43d
Thanks for the in-depth answer, but you should know this non-issue was just a Saturday brain lag on my side. For some reason I was convinced that you have the range of 24 bits in each direction (+ and -). Felt a little stupid, not gonna lie, because I should've known that :D
Warren Weckesser notifications@github.com wrote on Sat., May 2, 2020, 15:50:
@jayfeldmann https://github.com/jayfeldmann thanks for the report and the test file.
wavio.read returns the data exactly as found in the file. As noted in the docstring:
wavio.read() does not scale or normalize the data. The data in the array wav.data is the data that was in the file.
For your file, the values range from -8388608 to 8388607 inclusive (i.e. from -2**23 to 2**23-1). It is up to you to scale or normalize these as you see fit. For example, if the values are known to be scaled relative to the maximum positive integer representable with a signed 24-bit integer, you could generate an array of floating point values with
wav = wavio.read('test.wav')
normalized_data = wav.data / 2**23
With your file, the values in normalized_data range from -1.0 to 0.9999998807907104.
To get the normalized range to be exactly -1.0 to 1.0, the transformation would be
normalized_data = wav.data / (2**23 - 0.5) + 1/(2**24 - 1)
Suggestion: Make it an option to scale the data using the -1.0 to 1.0 normalization suggested by @WarrenWeckesser.
For example
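def read(file, normalize=False):
    ....
    array = _wav2array(nchannels, sampwidth, data)
    if normalize:
        # Map 24-bit samples onto [-1.0, 1.0]. A new float array is built
        # because the offset is added after the division, and an in-place
        # "/=" would fail if array has an integer dtype.
        array = array / (2**23 - 0.5) + 1/(2**24 - 1)
    w = Wav(data=array, rate=rate, sampwidth=sampwidth)
    return w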
(It appears that Matlab's audioread (always) normalizes the data this way.)
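The constants in the suggestion above are specific to 24-bit data. A rough generalization to other signed sample widths might look like the following; normalize_signed is a hypothetical helper, not an existing wavio function, and the sketch assumes NumPy and excludes 8-bit data, which WAV files conventionally store as unsigned.

```python
import numpy as np

def normalize_signed(data, sampwidth):
    """Hypothetical helper: map signed PCM of `sampwidth` bytes onto [-1.0, 1.0]."""
    if sampwidth not in (2, 3, 4):
        raise ValueError("expected signed PCM with sampwidth 2, 3 or 4")
    bits = 8 * sampwidth
    # Same affine map as the 24-bit formula, with the bit depth as a parameter:
    # -2**(bits-1) goes to -1.0 and 2**(bits-1) - 1 goes to 1.0.
    return np.asarray(data, dtype=np.float64) / (2**(bits - 1) - 0.5) + 1 / (2**bits - 1)

# Extreme codes of 16-bit data.
ends16 = normalize_signed(np.array([-2**15, 2**15 - 1], dtype=np.int16), 2)
print(ends16)
```

Returning a new float array, rather than modifying the input, keeps the raw integer data available to callers who want a different scaling.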
During some testing with an independently generated sine wave that peaks at 0 dB, or +-1f (approx.), I noticed that when reading this WAV with wavio the value gets halved, so the array tops out around 0.49f (or 8374698 as an int). I attached the WAV file I tested with. It was generated and analyzed in Reaper. testWav.zip