WarrenWeckesser / wavio

A Python module for reading and writing WAV files using numpy arrays.

Wav samples are not read correctly 24bit/48khz #11

Closed jayfeldmann closed 4 years ago

jayfeldmann commented 4 years ago

While testing with an independently generated sine wave that peaks at 0 dB (approximately ±1.0 as a float), I found that reading the WAV file with wavio halves this value: the array tops out around 0.49 (8374698 as an integer). I attached the WAV file I tested with. It was generated and analyzed in Reaper. testWav.zip

WarrenWeckesser commented 4 years ago

@jayfeldmann thanks for the report and the test file.

wavio.read returns the data exactly as found in the file. As noted in the docstring:

wavio.read() does not scale or normalize the data. The data in the array wav.data is the data that was in the file.

For your file, the values range from -8388608 to 8388607 inclusive (i.e. from -2**23 to 2**23-1). It is up to you to scale or normalize these as you see fit. For example, if the values are known to be scaled relative to the maximum positive integer representable with a signed 24-bit integer, you could generate an array of floating point values with

wav = wavio.read('test.wav')
normalized_data = wav.data / 2**23

With your file, the values in normalized_data range from -1.0 to 0.9999998807907104.
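The scaling above can be checked without the original file. This is a minimal sketch that assumes wavio hands back 24-bit samples as an int32 array; the `data` array is a hypothetical stand-in for `wav.data`, holding the two extreme codes plus the peak value from the report.

```python
import numpy as np

# Hypothetical stand-in for wav.data from a 24-bit file (assumed int32):
# the most negative code, the peak reported in the issue, the most
# positive code.
data = np.array([-2**23, 8374698, 2**23 - 1], dtype=np.int32)

# True division by 2**23 produces a new float64 array.
normalized = data / 2**23

print(normalized.min())       # -1.0
print(normalized.max())       # 0.9999998807907104
print(round(normalized[1], 3))  # peak from the report, relative to 2**23
print(round(data[1] / 2**24, 3))  # same peak, relative to 2**24
```

Relative to 2**23 the reported peak sits just below 1.0, while dividing the same integer by 2**24 lands near 0.499, which would be consistent with the "halved" reading in the report.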

To get the normalized range to be exactly -1.0 to 1.0, the transformation would be

normalized_data = wav.data / (2**23 - 0.5) + 1/(2**24 - 1)

endolith commented 4 years ago

Note that "full-scale" is officially defined "leaving the negative maximum code unused" in AES17 and IEC 61606, and libsndfile has some notes on how they normalize to float: http://www.mega-nerd.com/libsndfile/FAQ.html#Q010

WarrenWeckesser commented 4 years ago

@endolith, thanks. In that case, if we know the WAV file doesn't contain the unused code, the conversion to the interval [-1, 1] would be to divide by 2**23 - 1.
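To compare the conventions discussed so far, here is a rough sketch that evaluates each one at a few 24-bit codes. The variable names are made up for illustration; the three formulas are the ones quoted in this thread.

```python
import numpy as np

# Most negative code, one above it, zero, most positive code.
codes = np.array([-2**23, -2**23 + 1, 0, 2**23 - 1], dtype=np.int64)

# 1) Divide by 2**23: the range is [-1.0, 1.0), zero stays zero.
by_power = codes / 2**23

# 2) Affine map onto [-1.0, 1.0]; note that code 0 picks up a
#    small positive offset.
affine = codes / (2**23 - 0.5) + 1 / (2**24 - 1)

# 3) Divide by 2**23 - 1: symmetric full scale, but the most negative
#    code lands slightly below -1.0.
by_full_scale = codes / (2**23 - 1)

for name, values in [("/ 2**23", by_power),
                     ("affine", affine),
                     ("/ (2**23 - 1)", by_full_scale)]:
    print(f"{name:>14}: {values}")
```

Only the third variant maps both -2**23 + 1 and 2**23 - 1 to magnitudes of exactly 1.0 while keeping 0 at 0.0, at the cost of letting the code -2**23 fall outside [-1, 1].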

Does that "official definition" mean that, technically, it is a bug for software to generate 24-bit files that contain the sample -2**23? Should a WAV reader that also provided normalization clip such values to -2**23 + 1? (Details like this are one reason I'm hesitant to add an option for normalization to wavio.read.)

endolith commented 4 years ago

I don't know.

The WAV format spec does say that the negative maximum code is legal:

[image: excerpt from the WAV format spec]

So I would probably say that it makes the most sense (for WAVs of 9-bit or higher) to divide by 2**(bits-1)-1, so that -1 and +1 correspond to full-scale, and calculations like dBFS = 20*log10(x) work out correctly, and if the negative maximum code is present in the WAV file, the float should just be allowed to exceed -1? (And clip to [-2**(bits-1), 2**(bits-1)-1] when converting back to WAV, to maintain bit transparency.)
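A minimal sketch of that proposal, assuming signed integer samples. `to_float` and `to_codes` are hypothetical helper names, not part of wavio or any other library.

```python
import numpy as np

def to_float(codes, bits):
    """Scale signed integer codes so that +/-1.0 corresponds to full scale."""
    return codes / (2**(bits - 1) - 1)

def to_codes(x, bits):
    """Scale floats back to integer codes, clipping to the legal code range."""
    full_scale = 2**(bits - 1) - 1
    scaled = np.rint(x * full_scale)
    return np.clip(scaled, -2**(bits - 1), full_scale).astype(np.int32)

# Most negative code, negative full scale, a small value, positive full scale.
codes = np.array([-2**23, -2**23 + 1, 1000, 2**23 - 1], dtype=np.int32)

x = to_float(codes, 24)
dbfs = 20 * np.log10(np.abs(x))   # no zero samples here, so log10 is safe
round_trip = to_codes(x, 24)

print(x)
print(dbfs)
print(np.array_equal(round_trip, codes))
```

With this scaling, both full-scale codes sit at 0 dBFS, the negative maximum code reads marginally above 0 dBFS, and the clip only takes effect for floats that would fall outside the integer range.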

AES17:

full-scale amplitude: amplitude of a 997-Hz sine wave whose positive peak value reaches the positive digital full scale, leaving the negative maximum code unused. NOTE: In 2's-complement representation, the negative peak is 1 LSB away from the negative maximum code.

IEC 61606:

full-scale amplitude (FS): amplitude of a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale (in 2’s-complement a binary value of 0111…1111 to make up the word length) and whose peak negative sample just reaches a value one away from negative digital full-scale (1000…0001 to make up the word length) leaving the maximum negative code (1000…0000) unused

(I think a normalization option/function would be good to add. Someone just emailed me a few days ago asking why scipy.io.wavfile was giving them an RMS level around 11 and I had to explain that it's just the raw integer data and not normalized first.)

((And I guess this means I don't agree with libsndfile, since they try to restrict the converted signal to the range [-1.0, 1.0], slightly attenuating it to fit.))

endolith commented 4 years ago

I wrote up a description with examples: https://gist.github.com/endolith/e8597a58bcd11a6462f33fa8eb75c43d

jayfeldmann commented 4 years ago

Thanks for the in-depth answer, but you should know this non-issue was just a Saturday brain lag on my side. For some reason I was convinced that you get the full 24-bit range in each direction (+ and -). Felt a little stupid, not gonna lie, because I should've known that :D

On Sat., 2 May 2020, 15:50, Warren Weckesser (notifications@github.com) wrote:

@jayfeldmann https://github.com/jayfeldmann thanks for the report and the test file.

wavio.read returns the data exactly as found in the file. As noted in the docstring:

wavio.read() does not scale or normalize the data. The data in the array wav.data is the data that was in the file.

For your file, the values range from -8388608 to 8388607 inclusive (i.e. from -2**23 to 2**23-1). It is up to you to scale or normalize these as you see fit. For example, if the values are known to be scaled relative to the maximum positive integer representable with a signed 24-bit integer, you could generate an array of floating point values with

wav = wavio.read('test.wav')
normalized_data = wav.data / 2**23

With your file, the values in normalized_data range from -1.0 to 0.9999998807907104.

To get the normalized range to be exactly -1.0 to 1.0, the transformation would be

normalized_data = wav.data / (2**23 - 0.5) + 1/(2**24 - 1)


jcbsv commented 4 years ago

Suggestion: Make it an option to scale the data using the -1.0 to 1.0 normalization suggested by @WarrenWeckesser.

For example

def read(file, normalize=False):

    ....

    array = _wav2array(nchannels, sampwidth, data)
    if normalize:
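        # Divide, then add the offset; a new float array is required
        # because in-place true division of an integer array fails.
        array = array / (2**23 - 0.5) + 1/(2**24 - 1)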
    w = Wav(data=array, rate=rate, sampwidth=sampwidth)
    return w 

(It appears that Matlab's audioread always normalizes the data this way.)