WarrenWeckesser / wavio

A Python module for reading and writing WAV files using numpy arrays.
133 stars 19 forks source link

README needs warning about default write() scaling behavior #12

Closed BartMassey closed 1 year ago

BartMassey commented 4 years ago

I've just watched a half-dozen students out of a dozen who tried it use wavio.write() incorrectly: its default scaling to max peak gain was not obvious to them from any documentation. It would be great if a note was added to the README pointing out that unless scale="none" the output will be scaled.

WarrenWeckesser commented 4 years ago

@BartMassey, thanks for the report, and sorry for the confusion!

Can you tell me more about the data your students were saving? Were the samples integer or floating point? How was the data generated, and what was a typical range of the values (i.e. max - min)? What was it about their use of wavio.write that turned out to be incorrect?

I can add some more information to the README file, but the full user documentation for wavio.write is in its docstring. The description of the scale parameter in the docstring explains what it does in gory detail, and the "Examples" section includes examples of using scale='none' and scale='dtype-limits'. I suspect your students, like most folks (myself included!), will use a function like wavio.write with the assumption that it will just do the right thing with the given data. The problem I had with implementing it was to make it flexible (hence the mind-numbingly detailed description of scale), and yet also have it do the right thing by default for the common use-cases. For example, if the input is floating point, the function must convert the samples to integers, and for a simple example like the sum of a few sine waves computed with floating point, scaling to the full range of the output data type seems like the right thing to do.

BartMassey commented 4 years ago

I will start as I should have before, by saying thanks much for wavio, which looks like it represents a great deal of work.

The students (fourth and fifth year CS students mostly without much prior knowledge of audio) were asked to compute a sine wave with overall amplitude of -0.25…0.25 of full scale and then a clipped sine waveform clipped at half sine amplitude but with the same overall amplitude. Most of them chose to work in floating point, which I encouraged, so they literally generated samples in this range. They were then asked to save their waveforms as standard 16-bit integer WAV files.

The students were given their choice of programming language and wave file library. Many of them were unfamiliar with Python and chose this language only because I was using it for examples. I suspect most of them that chose wavio didn't know about Python's docstring functionality, and looked for and failed to find documentation for wavio on the web other than examples here and elsewhere. Those that did find the docstring got as far as "When given a floating point array, this function converts the values to integers": interpreting that as "integers adopted from the 'obvious' floating point range -1…1" seems to me not unreasonable, but some students tried to scale the numbers to the range -32767.0…32767.0 as floating point, which also seems to me not unreasonable.

Of course, those that got this far got an error message about "sampwidth must be specified" when trying to write and added sampwidth=2, at which point the write succeeded. Now these students were all in the same boat: they had a WAV file that "sounded right" and "looked about right", but whose amplitude was actually 4× what had been asked for. I know most of this because I worked with several students to correct their code before the assignment was due. I believe confusion between scale="none" and scale=None was probably also a factor for some students (as it was for me initially; by the by, the docstring mentions this, but says scaling rather than scale in the note, confusing things a bit further.)

wavio is the only one of the many Python, C, Rust and Haskell WAV file writing libraries I've used and a couple I have written myself that I can recall having an AGC functionality by default. This is fine, and maybe better in many applications, but I think it's unusual enough to merit a mention in the README along the lines of "floating point samples saved as integers will be automatically scaled so that the waveform has positive and negative peak amplitude at the limits of the integer range by default (when samples differ)."

Again, I don't mean to be negative, but I did think you might want to know about this experience. tl;dr: students using scipy.io.wavfile.write, pysndfile.sndio.write, Python's builtin wave.Wave_write.writeframes and WAV writing libraries in other languages seemed to have little difficulty writing the required WAV files; students using wavio.write had a high failure rate, due to incomplete web documentation and scaling issues.

endolith commented 4 years ago

students using scipy.io.wavfile.write ... seemed to have little difficulty writing the required WAV files;

Isn't this a fork of scipy.io.wavfile, so they both have the same scaling issues?

WarrenWeckesser commented 4 years ago

@BartMassey, thanks, that's helpful.

I'll do something with the README file and the docstring to try to avoid confusion.

I believe confusion between scale="none" and scale=None was probably also a factor for some students

It's worse than that...

For the record, to use wavio.write with floating point data that is already normalized to the range [-1, 1], the correct argument to give is scale=[-1, 1].

The argument scale='none' will not do the right thing with such data, as it will convert the floaing point values to integers without first scaling them. Combined with sampwidth=2, it would produce a 16-bit file that contains almost entirely 0, with maybe an occasional -1 or 1. (I suspect playing the file would make it clear that something was wrong.)

by the by, the docstring mentions this, but says scaling rather than scale in the note, confusing things a bit further.

Thanks for pointing that out. Actually, that is already fixed in the source code on github, but I haven't released a new version since fixing it.

WarrenWeckesser commented 4 years ago

@endolith, no this is not a fork of scipy.io.wavfile. This is a wrapper of the Python wave module. The SciPy code doesn't handle WAV files with 24 bit integer samples, but wave does, so I wrote this small wrapper of wave to make it easier to read and write WAV files with NumPy arrays instead of Python lists. It started out as an answer to a question on StackOverflow, but after getting some requests, I wrapped it up as a Python package and put it on PyPI.

The first version of wavio.write didn't have an option for scaling the output, and it accepted only integer arrays as input. But then I wanted code like this

fs = 44100
T = 3.0
t = np.arange(0, int(fs*T)) / fs
y = np.sin(2*np.pi*440*t) + 0.5*np.cos(2*np.pi*720*t)
wavio.write('sound.wav', y, fs, sampwidth=3)

to "just work". The wave module doesn't handle floating point, so floating point input must be converted to integers. So I added the scale argument, with the default behavior for floating point input being to scale the full range of the input to the full range of the integer output with no clipping.

endolith commented 4 years ago

@WarrenWeckesser Ah, I misunderstood your comment on https://github.com/scipy/scipy/issues/1930#issuecomment-28460402

BartMassey commented 4 years ago

Thanks much!

Handy84 commented 3 years ago

I also struggle with the scale parameter the output is always on max amplitude I am unable to control the amplitude.

`# -- coding: utf-8 -- """ A script that produces morse code from a given string. By Han Wechgelaer. """

import wavio, sys import numpy as np

audio_raw = np.array([])

morse = { "A" : ".-", "B" : "-...", "C" : "-.-.", "D" : "-..", "E" : ".", "F" : "..-.", "G" : "--.", "H" : "....", "I" : "..", "J" : ".---", "K" : "-.-", "L" : ".-..", "M" : "--", "N" : "-.", "O" : "---", "P" : ".--.", "Q" : "--.-", "R" : ".-.", "S" : "...", "T" : "-", "U" : "..-", "V" : "...-", "W" : ".--", "X" : "-..-", "Y" : "-.--", "Z" : "--..", "." : ".-.-.-", "," : "--..--", "?" : "..--..", "/" : "-..-.", "@" : "...-.-", "1" : ".----", "2" : "..---", "3" : "...--", "4" : "....-", "5" : ".....", "6" : "-....", "7" : "--...", "8" : "---..", "9" : "----.", "0" : "-----", "SOS" : "...---...", chr(32) : chr(32) }

if (len(sys.argv)==2): s = sys.argv[1] else: s = "hello world" s=s.upper() print(s) f_sample = 44100.0 symbol_space = 0.070 char_space = symbol_space 3 word_space = symbol_space 7 short_pulse = symbol_space long_pulse = short_pulse * 3

def append_wave(duration, frequency): t = np.linspace(0, duration, int(duration f_sample), endpoint=False) x = np.sin(2np.pi frequency t) x = x * 1000 global audio_raw audio_raw = np.append(audio_raw,x,axis=None)

for tp,i in enumerate(s): p=morse[i] for pos, j in enumerate(p): if (j==chr(32)): append_wave(word_space,0) print(" / ",end="") else: print(j,end="") if (j=="-"): append_wave(long_pulse, 1000) if (j=="."): append_wave(short_pulse, 1000) if (pos!=(len(p)-1)):
append_wave(symbol_space, 0)

if (p!=chr(32) and tp!=(len(s)-1)): print(" ", end="") append_wave(char_space, 0)

audio_raw = audio_raw.astype("int32") print(max(audio_raw)) wavio.write("testfile.wav", audio_raw, f_sample, scale=None, sampwidth=2)`

Handy84 commented 3 years ago

In order to use the scaling I had to use scale=[-10000,10000] and if my integer wave data has values round 5000 I get a output from 50%....

WarrenWeckesser commented 2 years ago

Hey all,

After a long hiatus from hacking on wavio, I have rewritten--and simplified!--the scaling behavior. Version 0.0.4 had an API that only its creator could love.

The new behavior in 0.0.5, as explained in the updated docstring, is:

For example, to "compute a sine wave with overall amplitude of -0.25…0.25 of full scale", one could do (assuming a frequency of 440 Hz, sample rate of 22050 Hz, duration 3 seconds and 24 bit output):

rate = 22050  # samples per second
T = 3         # sample duration (seconds)
f = 440.0     # sound frequency (Hz)
t = np.linspace(0, T, T*rate, endpoint=False)
x = 0.25 * np.sin(2*np.pi * f * t)  # Amplitude 0.25
wavio.write("sine.wav", x, rate, sampwidth=3, scale=1.0)

I haven't made a release yet, but I should get to that within a week.

Important! This is a backwards incompatible change! If you are still using wavio, once I have released 0.0.5, be sure to either update your code to use the new scaling behavior, or pin the dependency to version 0.0.4.

BartMassey commented 2 years ago

Thanks for looking at this again!

I would urge you to consider making the default scale for floating point 2 (corresponding to a sample range of -1..1) rather than something data-dependent. This is the norm for pretty much all other software out there, and thus will avoid a lot of confusion.

Other than that, the changes look great.

WarrenWeckesser commented 2 years ago

@BartMassey, thanks. I have been flip-flopping on exactly that issue--the default scale. Probably my bias towards the auto-scale to full-width is that I tend to work on one-off sound files where I add up a bunch of oscillations and I don't really care what the min and max end up being. But I was already this close to making the default scale=1. Give me another day to ponder, and I'll probably make that change.

Note that scale determines the maximum positive amplitude or, to use a phrase from the wikipedia page on sine waves, the peak deviation from 0.0. So to say that the input varies between -1 and 1, the argument would be scale=1. The corresponding peak-to-peak value would be 2, but giving the maximum positive value seems more natural to me. (If I were still trying to handle translating the values up or down and fitting the full range to the output, then the peak-to-peak value might make sense--to me, anyway.)

BartMassey commented 2 years ago

Yes, scale=1. I misread this sentence, apologies. Having it be a literal scale factor makes way more sense.

For example, if the values in data range from -1.5 to 1.5, then by setting scale=3, the data in the output will use half the available range of the output type.

I think providing autoscale as an option makes good sense. If you know it's there, you can ask for it when you need it. It's just a really surprising default.

endolith commented 2 years ago

Note that scale determines the maximum positive amplitude or, to use a phrase from the wikipedia page on sine waves, the peak deviation from 0.0. So to say that the input varies between -1 and 1

Note that there are two interpretations of the asymmetry of twos-complement numbers. Either the maximum positive value is considered full-scale or the maximum negative value is: https://gist.github.com/endolith/e8597a58bcd11a6462f33fa8eb75c43d?permalink_comment_id=3437077#gistcomment-3437077

BartMassey commented 2 years ago

As your link says, though, maximum-positive is the right answer for wave files.

A third possibility is to offset the interpretation by a half-LSB, which makes zero unrepresentable but is fully symmetric. I don't know of any standard that does this, but also don't see any reason why it wouldn't work ok in practice. Zero would presumably be represented by toggling the low bit at Nyquist, which would hardly cause a problem.

WarrenWeckesser commented 2 years ago

@endolith, thanks. It looks like what I implemented matches what you describe in your .md file. In the new implementation, when a floating point input is scaled, input values in the interval [-scale, scale] are mapped to the interval [min_int+1, max_int]. For sampwidth=1, that would be [1, 255] (with midpoint 128, as per an old Microsoft document about the RIFF/WAV file format), for sampwidth=2 that would be [-2**15+1, 2**15-1] (with midpoint 0), and sampwidth 3 and 4 follow the behavior of sampwidth=2.

The asymmetry is handled by allowing floating point values slightly less then -scale to be mapped to the minimum integer. This means, for a given scale, the actual acceptable floating point range is [-(1 + 1/c)*scale, scale], where c = 2**(8*sampwidth - 1) - 0.5. Floating point values outside that range result in clipping.

For example, in the following, the value -1.005 is mapped to 0 in the 8 bit WAV file.

In [32]: x = np.array([-1.005, -1, -0.9925, -0.5, 0, 0.5, 0.9925, 1])

In [33]: wavio.write('foo.wav', x, rate=1, scale=1, sampwidth=1)

In [34]: wavio.read('foo.wav').data
Out[34]: 
array([[  0],
       [  1],
       [  1],
       [ 64],
       [128],
       [192],
       [255],
       [255]], dtype=uint8)

I don't know if it is worthwhile adding an option to disable this "feature".

WarrenWeckesser commented 1 year ago

I pushed a new release to PyPI, 0.0.6. (There is also a 0.0.5. It was promptly followed by 0.0.6 because I forgot to update README.rst before releasing 0.0.5.)

If anyone requires the old behavior of write(), they can pin their dependency on wavio to version 0.0.4.