bmcfee / muda

A library for augmenting annotated audio data
ISC License

External dependency I/O overhead for out-of-core pipelines #65

Open Marko-Stamenovic-Bose opened 6 years ago

Marko-Stamenovic-Bose commented 6 years ago

MUDA relies heavily on external command-line tools such as rubberband and sox (lightly wrapped in pyrubberband and pysox) for core deformations such as time-stretch, pitch-shift, and dynamic range compression (drc). These wrappers work by writing the signal to disk, running the external tool on it, and then reading the transformed signal back from disk into memory (presumably to feed an ML algorithm).
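To make the round trip concrete, here is a minimal sketch of the write, call-out, read-back pattern described above. It is not the actual pyrubberband or pysox code: `deform_via_cli` is a hypothetical name, and the child process is a stand-in that only copies the file, where a real wrapper would invoke rubberband or sox.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import wave


def write_wav(path, pcm_bytes, sr=22050):
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm_bytes)


def read_wav(path):
    """Read all frames of a WAV file back as raw PCM bytes."""
    with wave.open(path, "rb") as f:
        return f.readframes(f.getnframes())


def deform_via_cli(pcm_bytes, sr=22050):
    """Hypothetical stand-in for a CLI-backed deformation.

    The child process below only copies the input file to the output
    file; a real wrapper would run an external audio tool at that step.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        infile = os.path.join(tmpdir, "in.wav")
        outfile = os.path.join(tmpdir, "out.wav")
        # 1. Memory -> disk.
        write_wav(infile, pcm_bytes, sr)
        # 2. Spawn an external process that reads infile and writes outfile.
        subprocess.run(
            [sys.executable, "-c",
             "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])",
             infile, outfile],
            check=True,
        )
        # 3. Disk -> memory.
        return read_wav(outfile)
    finally:
        shutil.rmtree(tmpdir)
```

Every call pays for two file writes, one file read, and a process launch, regardless of how cheap the deformation itself is.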

The external system call and, in particular, the additional read-write step introduce a large overhead in highly distributed/multithreaded out-of-core data pipelines. Would it not make sense to either a) allow an option to do an analogous deformation using an in-memory Python library (for example librosa) or b) replace the external system call altogether with an in-memory transformation?

bmcfee commented 6 years ago

I would be really happy if we could replace the command-line call-outs with proper libraries. Unfortunately, there aren't any replacements that match for quality or functionality, and I'd prefer to not have variable backends for everything.

That said, pyrubberband might get a direct cython implementation soon, which would cut down on most of the issues here. Sox is a different story though.

Marko-Stamenovic-Bose commented 6 years ago

OK, that's fair. Cython pyrubberband sounds pretty exciting! For drc, what is a good way to objectively evaluate the quality of the transformation?

bmcfee commented 6 years ago

> For drc, what is a good way to objectively evaluate the quality of the transformation?

I think this would depend on your eventual application. In most muda applications, the measure of "quality" that we care about is the hold-out accuracy of a model trained on the augmentation outputs, and that's pretty heavily abstracted from the drc process.

bmcfee commented 6 years ago

It just occurred to me that audiotk might be a good drop-in replacement for Sox. It's got a heavier dependency chain, and I haven't actually used it, but it seems plausible. Anyone feel like taking a crack at reimplementing the DRC class to see if it's worth pursuing?
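For a sense of what an in-memory DRC involves at its simplest, here is a sketch of a static (memoryless) downward compressor in plain Python. It is an assumption-laden illustration, not muda's DRC class, not sox's compand effect, and not audiotk code: the `compress` name and its `threshold_db` and `ratio` parameters are hypothetical, and there is no attack/release smoothing or make-up gain.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Static downward compressor for float samples in [-1, 1].

    Hypothetical helper for illustration only: each sample is processed
    independently, with no attack/release envelope and no make-up gain.
    """
    out = []
    for x in samples:
        mag = abs(x)
        if mag == 0.0:
            # Silence has no defined level in dB; pass it through.
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(mag)
        if level_db > threshold_db:
            # Above threshold: keep only 1/ratio of the overshoot.
            target_db = threshold_db + (level_db - threshold_db) / ratio
            gain = 10.0 ** ((target_db - level_db) / 20.0)
        else:
            # At or below threshold: leave the sample unchanged.
            gain = 1.0
        out.append(x * gain)
    return out
```

A realistic replacement would also need level smoothing over time and presets comparable to the existing ones, which is where a dedicated library would earn its dependency cost.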

Marko-Stamenovic-Bose commented 6 years ago

Sure, I'll take a look. I did have some headaches getting audiotk up and running, which is not a promising sign, but I'll take another crack at it when I have a chance.