bmcfee / crema

convolutional and recurrent estimators for music analysis
BSD 2-Clause "Simplified" License

Architecture roadmap #1

Closed · bmcfee closed this 7 years ago

bmcfee commented 7 years ago

In the redesign of crema, model components will be self-contained estimators.

Package layout

crema/                           # the package that you import
    RESOURCES/                   # where model parameters and other pickles are located
    submodules...                # where submodule and estimator code lives
training/                        # where model development (training, eval) scripts live
tests/                           # unit tests
docs/                            # documentation

Estimator API

Each estimator will have its own Pump object that handles the audio/model/JAMS interface.

Each estimator will produce exactly one annotation object for one input waveform.

Within each estimator, the corresponding model will be loaded from a pre-built resource file. These will be stored as Keras HDF5 (h5) files.

Each estimator will implement a predict method that maps audio to JAMS annotations. We can alias this to __call__ for a streamlined interface.

Each estimator may optionally implement a transform method that produces a dictionary of {feature_name: features}, where features is a numpy array.

Each estimator will be independently versioned. This information may be best stored within the model h5 resource. Each estimator will be responsible for constructing its own annotation_metadata.
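
Putting the pieces above together, here is a minimal sketch of that estimator interface. Only the predict/transform contract and the __call__ alias come from this roadmap; the class name, constructor, and namespace attribute are hypothetical placeholders.

import jams
from keras.models import load_model


class BaseEstimator(object):

    namespace = None    # e.g. 'chord'; hypothetical class attribute

    def __init__(self, resource):
        # Load the pre-trained model from its h5 resource file
        self.model = load_model(resource)

    def transform(self, y, sr):
        '''Optional: map audio to a dict of {feature_name: np.ndarray}'''
        raise NotImplementedError

    def predict(self, y, sr):
        '''Map one input waveform to exactly one JAMS annotation'''
        features = self.transform(y, sr)
        # ... run self.model on the features and decode the output into
        # observations on the annotation (omitted here) ...
        ann = jams.Annotation(namespace=self.namespace)
        return ann

    # Alias predict to __call__ for the streamlined interface
    __call__ = predict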

Estimators will be instantiated as singleton objects upon import, so that you can do the following kind of thing:

import crema

chord_ann = crema.Chords(audio_buffer, sample_rate)

chroma = crema.Chords.transform(audio_buffer, sample_rate)

Analyzer API

For any packaged version of crema, we will have a manifest of the included models. These will all be instantiated at import time in the top-level module. This will allow us to have a top-level analyze function that produces an entire JAMS object for an input track:

jams_est = crema.analyze(filename='/path/to/filename')
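
A hedged sketch of how the manifest and analyze could fit together; the __models__ registry name and the use of librosa to decode audio are assumptions, not settled design.

import librosa
import jams

# Manifest of the included models, instantiated once at import time;
# the __models__ name is an assumption.
__models__ = {}    # e.g. {'chord': Chords()}, one entry per estimator


def analyze(filename=None):
    '''Run every packaged estimator and collect the results in one JAMS'''
    y, sr = librosa.load(filename)

    jam = jams.JAMS()
    jam.file_metadata.duration = librosa.get_duration(y=y, sr=sr)

    for model in __models__.values():
        jam.annotations.append(model.predict(y, sr))

    return jam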

Metadata will be pulled from the track using pytaglib, and a warning will be issued if metadata cannot be found.
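
For example, a sketch of that metadata pull using pytaglib's File/tags API; the helper name, the mapping into jams.FileMetadata, and the warning text are all assumptions.

import warnings

import jams
import taglib


def track_metadata(filename):
    '''Read tags with pytaglib; warn when nothing is found'''
    tags = taglib.File(filename).tags    # dict of TAG -> list of values

    if not tags:
        warnings.warn('No metadata found for {}'.format(filename))

    return jams.FileMetadata(title=tags.get('TITLE', [''])[0],
                             artist=tags.get('ARTIST', [''])[0])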

When executed as a script without a -o flag, crema will serialize the JAMS to stdout:

$ python -m crema file.mp3 > file.jams
$ python -m crema -o alternate_path.jams file.mp3
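
A sketch of the __main__ entry point behind those commands, assuming argparse; only the -o flag and the stdout default come from this roadmap, and the rest is illustrative.

import argparse

import crema


def main():
    parser = argparse.ArgumentParser(prog='python -m crema')
    parser.add_argument('filename', help='audio file to analyze')
    parser.add_argument('-o', dest='output', default=None,
                        help='path to the output JAMS (default: stdout)')
    args = parser.parse_args()

    jam = crema.analyze(filename=args.filename)

    if args.output is None:
        # No -o flag: serialize the JAMS to stdout
        print(jam.dumps())
    else:
        jam.save(args.output)


if __name__ == '__main__':
    main()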

Model development

For each estimator that requires a pre-trained model, the training scripts will be stored under training/ESTIMATOR/ and have the following filename convention:

Other conventions:

bmcfee commented 7 years ago

This is all basically done at this point.