bmcfee / crema

convolutional and recurrent estimators for music analysis
BSD 2-Clause "Simplified" License

Architecture roadmap #1

Closed · bmcfee closed this 7 years ago

bmcfee commented 7 years ago

In the redesign of crema, model components will be self-contained estimators.

Package layout

crema/                           # the package that you import
    RESOURCES/                   # where model parameters and other pickles are located
    submodules...                # where submodule and estimator code lives
training/                        # where model development (training, eval) scripts live
tests/                           # unit tests
docs/                            # documentation

Estimator API

Each estimator will have its own Pump object that handles the audio/model/JAMS interface.

Each estimator will produce exactly one annotation object for one input waveform.

Within each estimator, the corresponding model will be loaded from a pre-built resource file. These will be stored as Keras HDF5 (h5) files.

Each estimator will implement a predict method that maps audio to JAMS annotations. We can alias this to __call__ for a streamlined interface.

Each estimator may optionally implement a transform method that produces a dictionary of {feature_name: features}, where features is a numpy array.

Each estimator will be independently versioned. This information may be best stored within the model h5 resource. Each estimator will be responsible for constructing its own annotation_metadata.
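
Putting the pieces above together, here is a minimal sketch of that estimator interface. Only the predict/transform contract and the __call__ alias come from this roadmap; the class name, constructor, and namespace attribute are hypothetical placeholders.

import jams
from keras.models import load_model


class BaseEstimator(object):

    namespace = None    # e.g. 'chord'; hypothetical class attribute

    def __init__(self, resource):
        # Load the pre-trained model from its h5 resource file
        self.model = load_model(resource)

    def transform(self, y, sr):
        '''Optional: map audio to a dict of {feature_name: np.ndarray}'''
        raise NotImplementedError

    def predict(self, y, sr):
        '''Map one input waveform to exactly one JAMS annotation'''
        features = self.transform(y, sr)
        # ... run self.model on the features and decode the output into
        # observations on the annotation (omitted here) ...
        ann = jams.Annotation(namespace=self.namespace)
        return ann

    # Alias predict to __call__ for the streamlined interface
    __call__ = predict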

Estimators will be instantiated as singleton objects upon import, so that you can do the following kind of thing:

import crema

chord_ann = crema.Chords(audio_buffer, sample_rate)

chroma = crema.Chords.transform(audio_buffer, sample_rate)

Analyzer API

For any packaged version of crema, we will have a manifest of the included models. These will all be instantiated at import time in the top-level module. This will allow us to have a top-level analyze function that produces an entire JAMS object for an input track:

jams_est = crema.analyze(filename='/path/to/filename')
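
A hedged sketch of how the manifest and analyze could fit together; the __models__ registry name and the use of librosa to decode audio are assumptions, not settled design.

import librosa
import jams

# Manifest of the included models, instantiated once at import time;
# the __models__ name is an assumption.
__models__ = {}    # e.g. {'chord': Chords()}, one entry per estimator


def analyze(filename=None):
    '''Run every packaged estimator and collect the results in one JAMS'''
    y, sr = librosa.load(filename)

    jam = jams.JAMS()
    jam.file_metadata.duration = librosa.get_duration(y=y, sr=sr)

    for model in __models__.values():
        jam.annotations.append(model.predict(y, sr))

    return jam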

Metadata will be pulled from the track using pytaglib, and a warning will be issued if metadata cannot be found.
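
For example, a sketch of that metadata pull using pytaglib's File/tags API; the helper name, the mapping into jams.FileMetadata, and the warning text are all assumptions.

import warnings

import jams
import taglib


def track_metadata(filename):
    '''Read tags with pytaglib; warn when nothing is found'''
    tags = taglib.File(filename).tags    # dict of TAG -> list of values

    if not tags:
        warnings.warn('No metadata found for {}'.format(filename))

    return jams.FileMetadata(title=tags.get('TITLE', [''])[0],
                             artist=tags.get('ARTIST', [''])[0])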

When executed as a script without a -o flag, crema will serialize the JAMS to stdout:

$ python -m crema file.mp3 > file.jams
$ python -m crema -o alternate_path.jams file.mp3
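
A sketch of the __main__ entry point behind those commands, assuming argparse; only the -o flag and the stdout default come from this roadmap, and the rest is illustrative.

import argparse

import crema


def main():
    parser = argparse.ArgumentParser(prog='python -m crema')
    parser.add_argument('filename', help='audio file to analyze')
    parser.add_argument('-o', dest='output', default=None,
                        help='path to the output JAMS (default: stdout)')
    args = parser.parse_args()

    jam = crema.analyze(filename=args.filename)

    if args.output is None:
        # No -o flag: serialize the JAMS to stdout
        print(jam.dumps())
    else:
        jam.save(args.output)


if __name__ == '__main__':
    main()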

Model development

For each estimator that requires a pre-trained model, the training scripts will be stored under training/ESTIMATOR/ and have the following filename convention:

Other conventions:

bmcfee commented 7 years ago

This is all basically done at this point.