Closed: bmcfee closed this issue 8 years ago.
Some other points to work out here:
We can have `synthesize` resample each signal to the target rate, but this might be wasteful. Maybe there's something more clever we could do?
I think this isn't too difficult, but I might be missing something.
Stereo samples for a mono target get downmixed; mono samples for a stereo output get duplicated. If a user wants to direct a mono sample to the left- or right-channel, this can be done from within the sample generator.
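A minimal sketch of the channel-matching rule described above (the helper name `match_channels` and the `(channels, frames)` array layout are my assumptions for illustration, not part of the actual API):

```python
import numpy as np

def match_channels(sample, n_out):
    """Match a sample's channel count to the output stream.

    `sample` is a (n_channels, n_frames) array. Stereo -> mono
    downmixes by averaging; mono -> stereo duplicates the channel.
    """
    n_in = sample.shape[0]
    if n_in == n_out:
        return sample
    if n_in == 2 and n_out == 1:
        # downmix by averaging the two channels
        return sample.mean(axis=0, keepdims=True)
    if n_in == 1 and n_out == 2:
        # duplicate the mono channel into both outputs
        return np.repeat(sample, 2, axis=0)
    raise ValueError('unsupported channel layout: %d -> %d' % (n_in, n_out))
```

Directing a mono sample to only the left or right channel would then stay the sample generator's job, as noted above.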
Stereo to stereo is interesting in that we'd have to do channel-wise zero-crossing alignment. The logic I have in mind here is to increase the start- and decrease the end-time to match the closest zc's internal to the sample boundaries, and then add them at the specified output position. I think this shouldn't introduce any discontinuities.
We'll have to be careful to maintain relative sample alignment between the left and right channels after zero-crossing alignment, though.
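One way the single-channel trimming step could look (a hypothetical helper; for stereo, the same trimmed boundaries would be applied to both channels to preserve relative alignment):

```python
import numpy as np

def trim_to_zero_crossings(sample):
    """Trim a mono sample inward to its outermost internal zero crossings.

    Returns (trimmed, start_offset) so the caller can shift the
    placement time forward by start_offset frames. A zero crossing
    is a sign change between consecutive frames.
    """
    crossings = np.flatnonzero(np.diff(np.sign(sample)) != 0) + 1
    if len(crossings) < 2:
        # not enough internal crossings to align both ends; leave as-is
        return sample, 0
    start, end = crossings[0], crossings[-1]
    return sample[start:end], start
```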
We could dynamically infer the output duration, but there are some subtleties to doing this efficiently. The way I'm picturing it, we could pre-allocate an enormous sparse matrix to contain the synthesized audio, and keep track of the maximum sample number touched during synthesis. When the iterator is exhausted, we can trim down to the inferred duration, convert to dense, and call it a day. The only trick here is making sure that the initial pre-allocation is big enough to contain the audio, but that can be user-configurable. A default of 20 minutes ought to be fine for 99% of cases, if I had to SWAG.
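A sketch of the preallocate-and-trim idea with `scipy.sparse` (the signature and the `(sample, start_time)` event format are assumptions for illustration, not the actual interface):

```python
import numpy as np
import scipy.sparse

def synthesize(events, sr, max_duration=20 * 60):
    """Mix (sample, start_time) events into one mono signal.

    Preallocates a sparse buffer of max_duration seconds, tracks the
    highest frame touched during synthesis, then trims down to the
    inferred duration and converts to dense.
    """
    # LIL format supports efficient incremental writes
    out = scipy.sparse.lil_matrix((1, int(max_duration * sr)))
    max_frame = 0
    for sample, start_time in events:
        start = int(start_time * sr)
        end = start + len(sample)
        # additive mixing: overlapping events simply sum
        current = out[0, start:end].toarray().ravel()
        out[0, start:end] = current + sample
        max_frame = max(max_frame, end)
    # trim to the inferred duration and densify
    return np.asarray(out[:, :max_frame].todense()).ravel()
```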
@bmcfee can you elaborate on how resampling each signal to the target rate could be wasteful? I imagine that resampling would happen at the "last minute", in `synthesize`, so we could resample only the portions of each input signal that are actually getting included in the output.
Pre-allocating a large matrix sounds reasonable. I suppose another alternative would be to pre-allocate a smaller matrix (or a really small one) and extend it as needed during synthesis, but I don't see any compelling advantages to that approach.
> @bmcfee can you elaborate on how resampling each signal to the target rate could be wasteful? I imagine that resampling would happen at the "last minute", in `synthesize`, so we could resample only the portions of each input signal that are actually getting included in the output.
Yes, but that could introduce artifacts at the boundaries, especially in the low frequencies. It's probably better to resample the entire source signal and then trim out the sample. Of course, if the input is really large, and we're only grabbing a few small pieces of it, then this will do much more work than we need.
A compromise might be to resample a large window around the source sample, and then trim out the region of interest. The window padding could be calculated in terms of the period of the lowest frequency we wish to retain.
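A sketch of that windowed-resample compromise (a hypothetical helper; `f_min` is the lowest frequency we wish to retain, and the three-period pad is an arbitrary choice for illustration):

```python
import numpy as np
import scipy.signal

def resample_window(signal, sr_in, sr_out, start, end, f_min=20.0):
    """Resample only a padded window around [start, end) input frames,
    then trim to the region of interest.

    The pad is a few periods of the lowest frequency to retain
    (f_min, in Hz), so filter edge effects fall outside the region
    we keep.
    """
    pad = int(3 * sr_in / f_min)  # three periods of f_min
    w_start = max(0, start - pad)
    w_end = min(len(signal), end + pad)
    resampled = scipy.signal.resample_poly(signal[w_start:w_end], sr_out, sr_in)
    # map the region of interest into the resampled window
    ratio = sr_out / sr_in
    roi_start = int(round((start - w_start) * ratio))
    roi_end = int(round((end - w_start) * ratio))
    return resampled[roi_start:roi_end]
```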
For a first cut, I think it's simplest to do the full resample at synthesis time. We can worry about efficiency enhancements later.
> Pre-allocating a large matrix sounds reasonable. I suppose another alternative would be to pre-allocate a smaller matrix (or a really small one) and extend it as needed during synthesis, but I don't see any compelling advantages to that approach.
Reallocate and copy will generally be much slower than pre-allocating a very large (but sparse) matrix.
That all makes sense. And I agree that a full resample at synthesis time is a good approach for the initial version.
So just so I am sure about this resampling, we want to take each audio source and re-load the entire file for that audio source, at the target output sample rate?
I also have a version of this working that pre-calculates the size of the target matrix. Do you cats feel strongly that preallocating a giant sparse matrix then trimming is better?
> So just so I am sure about this resampling, we want to take each audio source and re-load the entire file for that audio source, at the target output sample rate?
Yeah. Like I said above, it's inefficient, but correct, and we can make it more efficient later on.
> I also have a version of this working that pre-calculates the size of the target matrix. Do you cats feel strongly that preallocating a giant sparse matrix then trimming is better?
How does that work? The generator-based synthesis does not know the total track duration.
The algorithm I had in mind is to preallocate something absurdly large with a `scipy.sparse` matrix, and then dynamically compute the effective duration after synthesis. This should be both memory-efficient and simple to implement.
Oh, because I didn't make synthesize take a generator - at the moment it just takes a list.
I suppose if we want it to accept a list or a generator, we'll have to preallocate.
Ah. I think it's better to implement the generator version first, and then wrap it with a simpler list interface.
I can hack this out if you're still puzzling over the details.
(Don't we both have jobs?)
I'll try to get a PR with my version up this weekend, so you cats can see what I am doing. I suspect that if we pre-allocate, what I have will work for both a list and a generator.
Also, should this return a new Audio object, or should it write out a .wav file?
> I'll try to get a PR with my version up this weekend, so you cats can see what I am doing. I suspect that if we pre-allocate, what I have will work for both a list and a generator.
Yeah, it should be pretty straightforward to switch over.
> Also, should this return a new Audio object, or should it write out a .wav file?
If it's an Audio object, that opens up the possibility of chaining synthesizers....
So synthesize returns a new audio object, and then an audio object can write itself out as a .wav file. Got it!
As the above says: https://github.com/algorithmic-music-exploration/amen/pull/25
Closing with #25!
[documenting offline conversation with @blacker]
Some quick thoughts about how the interface for synthesizing waveforms should look.
The `synthesize` function iterates over the generator and adds samples into the output stream. It returns a new Audio object (I guess, an audio container object itself). Stereo handling, resampling, and zero-crossing alignment are all handled within `synthesize`.
This makes it easy to do concatenative synthesis (as above). You can also do additive mixing by having overlapping target times.
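For illustration, the mixing loop with a simple dense buffer, showing concatenative versus additive placement (all names here are hypothetical sketches, not the actual amen API):

```python
import numpy as np

def mix(events, sr, duration):
    """Add (sample, start_time) pairs into a dense output buffer."""
    out = np.zeros(int(duration * sr))
    for sample, t in events:
        start = int(t * sr)
        out[start:start + len(sample)] += sample
    return out

sr = 8000
beep = np.ones(sr // 10)  # 100 ms stand-in for a real sample

# concatenative synthesis: samples placed back to back
concat = mix(((beep, i * 0.1) for i in range(3)), sr, 0.5)

# additive mixing: overlapping target times simply sum in the buffer
layered = mix(((beep, i * 0.05) for i in range(3)), sr, 0.5)
```

Note that `mix` happily consumes either a list or a generator of events, which is the list-vs-generator point discussed above.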