algorithmic-music-exploration / amen

A toolbox for algorithmic remixing, after Echo Nest Remix
BSD 2-Clause "Simplified" License

Deformation architecture #66

Closed bmcfee closed 6 years ago

bmcfee commented 8 years ago

How do we want/expect people to manipulate audio within amen?

The synthesize function is great for re-arranging a clip by timing, but doesn't give us a handle on how to do things like, say, vocal subtraction or time-stretching.

Do we want to provide an object interface for this kind of thing? Or just let folks hack functions themselves? Either way, I think we should not support/allow in-place modification of the audio buffers, since it would either trigger an (expensive) feature analysis or have inconsistent results.

For example, a time-stretcher might look something like:

import pyrubberband as pyrb

from amen.audio import Audio

def amen_time_stretch(audio, rate=1.0):
    y_stretch = pyrb.time_stretch(audio.raw_samples, audio.sample_rate, rate=rate)
    return Audio(raw_samples=y_stretch,
                 sample_rate=audio.sample_rate,
                 analysis_sample_rate=audio.analysis_sample_rate)

This is pretty simple, but it bothers me that you have to access the Audio object's internals directly and propagate them manually. Maybe that's the only way though?

More generally, I could imagine effects that return multiple clips (eg, source separation), so a consistent object interface might be tricky to pull off here.
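A multi-output deformation along these lines might look like the following sketch. The `Audio` class here is a minimal hypothetical stand-in for amen's, and a moving-average split stands in for a real separation algorithm; the point is only the interface: no in-place mutation, new objects out.

```python
import numpy as np

class Audio:
    """Minimal stand-in for amen's Audio class (hypothetical)."""
    def __init__(self, raw_samples, sample_rate=44100, analysis_sample_rate=22050):
        self.raw_samples = np.asarray(raw_samples, dtype=float)
        self.sample_rate = sample_rate
        self.analysis_sample_rate = analysis_sample_rate

def crude_separation(audio, width=5):
    """Return two new Audio objects: a smoothed component and the residual.

    A moving average stands in for real source separation; the original
    audio object is never modified.
    """
    kernel = np.ones(width) / width
    smooth = np.convolve(audio.raw_samples, kernel, mode='same')
    residual = audio.raw_samples - smooth

    def make(y):
        # Propagate the metadata from the input Audio by hand.
        return Audio(y, sample_rate=audio.sample_rate,
                     analysis_sample_rate=audio.analysis_sample_rate)

    return make(smooth), make(residual)
```

Note that each output still has to copy `sample_rate` and `analysis_sample_rate` across manually, which is the propagation annoyance described above.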

tkell commented 8 years ago

Off the top of my head, I like the idea of returning a new Audio object (for time stretching) or two of them (for source separation).

I am not super fussed about Audio(raw_samples=y_stretch, sample_rate=audio.sample_rate, analysis_sample_rate=audio.analysis_sample_rate), though I agree it is not great.

I don't know if we should make these just functions like synthesize, or do something more heavyweight.

tkell commented 8 years ago

Two more thoughts:

a) An Audio object has various methods, like time_stretch or vocal_subtraction, that return new Audio objects.

b) We make objects that do these things to Audio objects - TimeStretcher, etc. They also return new Audio objects.
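For concreteness, the two options might be sketched like this. All names are hypothetical, and `_stretch_samples` is a naive index-decimation placeholder, not a real time-stretch:

```python
import numpy as np

def _stretch_samples(y, rate):
    """Placeholder DSP: crude decimation by index, standing in for a real stretch."""
    idx = np.arange(0, len(y), rate).astype(int)
    return y[idx[idx < len(y)]]

# Option (a): a method on Audio that returns a new Audio.
class Audio:
    def __init__(self, raw_samples, sample_rate=44100):
        self.raw_samples = np.asarray(raw_samples, dtype=float)
        self.sample_rate = sample_rate

    def time_stretch(self, rate=1.0):
        return Audio(_stretch_samples(self.raw_samples, rate),
                     sample_rate=self.sample_rate)

# Option (b): a separate deformer object applied to an Audio.
class TimeStretcher:
    def __init__(self, rate=1.0):
        self.rate = rate

    def __call__(self, audio):
        return Audio(_stretch_samples(audio.raw_samples, self.rate),
                     sample_rate=audio.sample_rate)
```

Both styles leave the input untouched; the difference is only where the deformation lives.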

I prefer a), I am pretty sure. Your thoughts?

tkell commented 7 years ago

As a very simple example, harmonic/percussive splitting: https://github.com/algorithmic-music-exploration/amen/pull/82

bmcfee commented 7 years ago

I prefer a), I am pretty sure.

Can you elaborate on why? The space of audio deformations seems endless to me, and they obviously can't all live as Audio methods.

Deformer objects -- or even plain functions -- seem a lot more flexible and extensible.

tkell commented 7 years ago

Yeah, that's a good point. I think I like functions better, then - and maybe we stick all of 'em in one file? from deformers import amen_time_stretch, say?
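One appeal of plain functions is that they compose without any class machinery. A sketch, assuming each deformer takes an object with a `raw_samples` array and returns a new one (all names hypothetical):

```python
import numpy as np

class Audio:
    """Minimal hypothetical stand-in for amen's Audio class."""
    def __init__(self, raw_samples, sample_rate=44100):
        self.raw_samples = np.asarray(raw_samples, dtype=float)
        self.sample_rate = sample_rate

def gain(audio, factor=1.0):
    """Scale amplitude, returning a new Audio."""
    return Audio(audio.raw_samples * factor, sample_rate=audio.sample_rate)

def reverse(audio):
    """Reverse the samples, returning a new Audio."""
    return Audio(audio.raw_samples[::-1], sample_rate=audio.sample_rate)

def apply_chain(audio, *deformers):
    """Apply a sequence of deformation functions, each returning a new Audio."""
    for fn in deformers:
        audio = fn(audio)
    return audio
```

Usage would read like apply_chain(clip, reverse, lambda a: gain(a, 0.5)), and adding a new deformer is just writing another function with the same shape.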

(I think it comes from too much Ruby, and a desire to have objects do everything - as opposed to the Java idiom of having an AudioSplitterFactory to make an AudioSplitter to split the Audio, which I hate.)

tkell commented 7 years ago

Did this as from amen.deformation import harmonic_separation, for the record - but there are only two functions, so we can change things if we want.

tkell commented 6 years ago

Also did this in #105 - @bmcfee, gonna close this issue unless you have Strong Opinions.