algorithmic-music-exploration / amen

A toolbox for algorithmic remixing, after Echo Nest Remix
BSD 2-Clause "Simplified" License

Remix interface #5

Closed bmcfee closed 8 years ago

bmcfee commented 9 years ago

[documenting offline conversation with @blacker ]

Some quick thoughts about how the interface for synthesizing waveforms should look.

>>> def my_generator(track):
...     start = track.duration
...     for beat in track.beats[::-1]:
...         start -= beat.duration
...         yield start, beat
>>> syn = synthesize(my_generator(track), duration=track.duration)

This makes it easy to do concatenative synthesis (as above). You can also do additive mixing by having overlapping target times.
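To make the two scheduling patterns concrete, here's a toy sketch of the event streams a generator would produce. `Beat` and both generators are stand-ins, not amen's actual API; `synthesize()` would consume streams of `(start_time, beat)` pairs like these.

```python
from collections import namedtuple

# Toy stand-in for a beat object carrying its duration.
Beat = namedtuple('Beat', ['duration'])

def reversed_concat(beats, total_duration):
    """Concatenative synthesis: play the beats back to front, end to end."""
    start = total_duration
    for beat in reversed(beats):
        start -= beat.duration
        yield start, beat

def doubled_mix(beats):
    """Additive mixing: schedule each beat twice at the same start time,
    so the two copies overlap and sum in the output."""
    start = 0.0
    for beat in beats:
        yield start, beat
        yield start, beat  # overlapping target times -> additive mix
        start += beat.duration

beats = [Beat(0.5), Beat(0.5), Beat(1.0)]
events = list(reversed_concat(beats, total_duration=2.0))
# events: [(1.0, Beat(1.0)), (0.5, Beat(0.5)), (0.0, Beat(0.5))]
```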

bmcfee commented 9 years ago

Some other points to work out here:

We can have synthesize resample each signal to the target rate, but this might be wasteful. Maybe there's something more clever we could do?

I think this isn't too difficult, but I might be missing something.

Stereo samples for a mono target get downmixed; mono samples for a stereo output get duplicated. If a user wants to direct a mono sample to the left- or right-channel, this can be done from within the sample generator.
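That channel policy is easy to sketch with numpy, assuming samples are arrays of shape `(n_samples, n_channels)` or `(n_samples,)`. `fit_channels` is a hypothetical helper, not amen's API:

```python
import numpy as np

def fit_channels(sample, n_channels):
    """Match a sample to the target channel count: average to downmix
    stereo -> mono, duplicate to widen mono -> stereo."""
    if sample.ndim == 1:
        sample = sample[:, np.newaxis]
    if sample.shape[1] == n_channels:
        return sample
    if n_channels == 1:
        return sample.mean(axis=1, keepdims=True)  # downmix stereo -> mono
    return np.repeat(sample, n_channels, axis=1)   # duplicate mono -> stereo

stereo = np.array([[1.0, 3.0], [2.0, 4.0]])
mono = fit_channels(stereo, 1)                 # averages the two channels
wide = fit_channels(np.array([0.5, -0.5]), 2)  # duplicates the mono signal
```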

Stereo to stereo is interesting in that we'd have to do channel-wise zero-crossing alignment. The logic I have in mind here is to increase the start-time and decrease the end-time to match the closest zero crossings internal to the sample boundaries, and then add them at the specified output position. I think this shouldn't introduce any discontinuities.

We'll have to be careful to maintain relative sample alignment between left- and right-channels after zero-crossing alignment, though.
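An illustrative sketch of that boundary-shrinking step for a single channel (the helper name is hypothetical):

```python
import numpy as np

def shrink_to_zero_crossings(channel):
    """Return (start, end) indices moved inward to the nearest sign changes
    internal to the sample, so the trimmed span starts and ends just after
    a zero crossing."""
    signs = np.signbit(channel).astype(np.int8)
    zc = np.flatnonzero(np.diff(signs))  # index i where the sign flips between i and i+1
    if len(zc) < 2:
        return 0, len(channel)  # no usable crossings; keep the full extent
    return zc[0] + 1, zc[-1] + 1

x = np.array([0.5, 0.2, -0.1, -0.3, 0.4, 0.6, -0.2])
start, end = shrink_to_zero_crossings(x)
# start, end -> (2, 6)
```

Note that applying this independently per channel can trim the left and right channels by different amounts, which is exactly the relative-alignment caveat above.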

We could dynamically infer the output duration, but there are some subtleties to doing this efficiently. The way I'm picturing it, we could pre-allocate an enormous sparse matrix to contain the synthesized audio, and keep track of the maximum sample number touched during synthesis. When the iterator is exhausted, we can trim down to the inferred duration, convert to dense, and call it a day. The only trick here is making sure that the initial pre-allocation is big enough to contain the audio, but that can be user-configurable. A default of 20 minutes ought to be fine for 99% of cases, if I had to SWAG.
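A minimal sketch of that pre-allocate-then-trim loop with `scipy.sparse`; the 20-minute ceiling and the toy events are illustrative:

```python
import numpy as np
import scipy.sparse

sr = 44100
# Reserve a very large (but sparse, so cheap) buffer: ~20 minutes of audio.
buffer = scipy.sparse.lil_matrix((1, 20 * 60 * sr))

events = [(0, np.ones(4)), (2, np.ones(3))]  # (start_sample, rendered audio)
max_written = 0
for start, sample in events:
    end = start + len(sample)
    existing = buffer[0, start:end].toarray().ravel()
    buffer[0, start:end] = existing + sample  # overlapping writes sum (additive)
    max_written = max(max_written, end)       # track the highest sample touched

# Trim to the inferred duration and convert to dense.
audio = np.asarray(buffer[:, :max_written].todense()).ravel()
# audio -> [1., 1., 2., 2., 1.]
```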

blacker commented 9 years ago

@bmcfee can you elaborate on how resampling each signal to the target rate could be wasteful? I imagine that resampling would happen at the "last minute", in synthesize, so we could resample only the portions of each input signal that are actually getting included in the output.

Pre-allocating a large matrix sounds reasonable. I suppose another alternative would be to pre-allocate a smaller matrix (or a really small one) and extend it as needed during synthesis, but I don't see any compelling advantages to that approach.

bmcfee commented 9 years ago

> @bmcfee can you elaborate on how resampling each signal to the target rate could be wasteful? I imagine that resampling would happen at the "last minute", in synthesize, so we could resample only the portions of each input signal that are actually getting included in the output.

Yes, but that could introduce artifacts at the boundaries, especially in the low frequencies. It's probably better to resample the entire source signal and then trim out the sample. Of course, if the input is really large, and we're only grabbing a few small pieces of it, then this will do much more work than we need.

A compromise might be to resample a large window around the source sample, and then trim out the region of interest. The window padding could be calculated in terms of the period of the lowest frequency we wish to retain.
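A sketch of that compromise, with the padding set to one period of the lowest frequency we want to keep. The function and parameter names are illustrative, not amen's API:

```python
import numpy as np
from scipy.signal import resample_poly

def resample_window(signal, sr_in, sr_out, start, end, f_min=20.0):
    """Resample only a padded window around [start, end), then trim
    the region of interest out of the resampled window."""
    pad = int(sr_in / f_min)  # one period of f_min, in input samples
    w_start = max(0, start - pad)
    w_end = min(len(signal), end + pad)
    window = resample_poly(signal[w_start:w_end], sr_out, sr_in)
    # Map the region of interest into the resampled window.
    offset = int(round((start - w_start) * sr_out / sr_in))
    length = int(round((end - start) * sr_out / sr_in))
    return window[offset:offset + length]

sr = 44100
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # one second of a 220 Hz tone
y = resample_window(x, sr, 22050, start=10000, end=20000)
```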

For a first cut, I think it's simplest to do the full resample at synthesis time. We can worry about efficiency enhancements later.

> Pre-allocating a large matrix sounds reasonable. I suppose another alternative would be to pre-allocate a smaller matrix (or a really small one) and extend it as needed during synthesis, but I don't see any compelling advantages to that approach.

Reallocate and copy will generally be much slower than pre-allocating a very large (but sparse) matrix.

blacker commented 9 years ago

That all makes sense. And I agree that a full resample at synthesis time is a good approach for the initial version.

tkell commented 8 years ago

So, just so I'm sure about this resampling: we want to take each audio source and re-load the entire file for that source at the target output sample rate?

I also have a version of this working that pre-calculates the size of the target matrix. Do you cats feel strongly that preallocating a giant sparse matrix then trimming is better?

bmcfee commented 8 years ago

> So just so I am sure about this resampling, we want to take each audio source and re-load the entire file for that audio source, at the target output sample rate?

Yeah. Like I said above, it's inefficient, but correct, and we can make it more efficient later on.

> I also have a version of this working that pre-calculates the size of the target matrix. Do you cats feel strongly that preallocating a giant sparse matrix then trimming is better?

How does that work? The generator-based synthesis does not know the total track duration.

The algorithm I had in mind is to preallocate something absurdly large with a scipy.sparse matrix, and then dynamically compute the effective duration after synthesis. This should be both memory-efficient and simple to implement.

tkell commented 8 years ago

Oh, because I didn't make synthesize take a generator - at the moment it just takes a list.

I suppose if we want it to accept a list or a generator, we'll have to preallocate.

bmcfee commented 8 years ago

Ah. I think it's better to implement the generator version first, and then wrap it with a simpler list interface.

I can hack this out if you're puzzling over the details.
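A sketch of that layering: a generator-based core, plus a list convenience wrapper that lays beats end to end. All names here are placeholders for the interface under discussion, not amen's actual code:

```python
from collections import namedtuple

Beat = namedtuple('Beat', ['duration'])

def synthesize(events, duration=None):
    """Generator-based core: consumes (start_time, beat) pairs.
    (Stand-in: just collects the schedule instead of rendering audio.)"""
    return list(events)

def synthesize_list(beats, duration=None):
    """List interface: schedule the beats back to back, then delegate
    to the generator-based core."""
    def as_events():
        start = 0.0
        for beat in beats:
            yield start, beat
            start += beat.duration
    return synthesize(as_events(), duration=duration)

schedule = synthesize_list([Beat(0.5), Beat(1.0)])
# schedule -> [(0.0, Beat(0.5)), (0.5, Beat(1.0))]
```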

tkell commented 8 years ago

(Don't we both have jobs?)

I'll try to get a PR with my version up this weekend, so you cats can see what I am doing. I suspect that if we pre-allocate, what I have will work for both a list and a generator.

Also, should this return a new Audio object, or should it write out a .wav file?

bmcfee commented 8 years ago

> I'll try to get a PR with my version up this weekend, so you cats can see what I am doing. I suspect that if we pre-allocate, what I have will work for both a list and a generator.

Yeah, it should be pretty straightforward to switch over.

> Also, should this return a new Audio object, or should it write out a .wav file?

If it's an Audio object, that opens up the possibility of chaining synthesizers....

tkell commented 8 years ago

So synthesize returns a new audio object, and then an audio object can write itself out as a .wav file. Got it!

tkell commented 8 years ago

As the above says: https://github.com/algorithmic-music-exploration/amen/pull/25

tkell commented 8 years ago

Closing with #25!