Make it easy to use the predicted pitch classes, e.g. in Sonic Visualiser

take a trained model
take an audio file
pre-process the audio into features
predict the labels
save the per-frame class labels into a TSV file
save the per-frame class probabilities into a TSV file
convert labels to segments
store the segment labels into another TSV file

For frame labels the format can be like:

C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
1   0   0   0   1   0   0   1   0   0   0   0
0   0   1   0   0   1   0   1   0   0   0   1
[...]

I.e. a TSV file with a header row. The columns represent pitch classes. Each value is 0 or 1, indicating whether the pitch class is predicted as active. For probability labels, each value is instead a float between 0.0 and 1.0 (the model output before thresholding).
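A minimal sketch of writing this format, using only the Python standard library. The names `PITCH_CLASSES`, `threshold` and `write_frame_tsv`, and the 0.5 cutoff, are assumptions made for illustration; they are not taken from the existing code. The same writer works for both binary labels and probabilities.

```python
import csv
import io

# Column order as in the sample above.
PITCH_CLASSES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def threshold(probabilities, cutoff=0.5):
    """Turn per-frame probabilities into 0/1 labels (cutoff is an assumed default)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probabilities]


def write_frame_tsv(rows, out):
    """Write one header row and one tab-separated row per frame to a text stream."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(PITCH_CLASSES)
    writer.writerows(rows)


# Usage: one frame of made-up probabilities, written as labels.
probabilities = [[0.9, 0.1, 0.2, 0.0, 0.7, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0]]
buffer = io.StringIO()
write_frame_tsv(threshold(probabilities), buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.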

For segment labels the format is like:

start       end         C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
0.0         2.612267    0   0   0   0   0   0   0   0   0   0   0   0
2.612267    11.45907    0   0   0   0   1   0   0   0   1   0   0   1

The frames are collapsed into time intervals. Each interval is represented by its start and end time (in seconds, as floats). The segments should be in increasing order of start time and should not overlap. Ideally they should contain no gaps (i.e. be perfectly adjacent to each other).
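One possible way to collapse frames into segments is to merge runs of consecutive frames that carry identical labels. The sketch below assumes every frame has the same duration in seconds (`frame_duration`, e.g. hop size divided by sample rate); that parameter and the function names are hypothetical. Since each segment ends at exactly the frame boundary where the next one starts, the result is ordered, non-overlapping and gap-free.

```python
import csv
import io

PITCH_CLASSES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def frames_to_segments(frame_labels, frame_duration):
    """Merge runs of identical consecutive frames into (start, end, labels) tuples."""
    segments = []
    start_index = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the input or when the labels change.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start_index]:
            start = start_index * frame_duration
            end = i * frame_duration
            segments.append((start, end, list(frame_labels[start_index])))
            start_index = i
    return segments


def write_segment_tsv(segments, out):
    """Write segments as TSV with 'start' and 'end' columns before the pitch classes."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end"] + PITCH_CLASSES)
    for start, end, labels in segments:
        writer.writerow([start, end] + labels)


# Usage: two silent frames followed by three frames with E, Ab and B active.
silent = [0] * 12
chord = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
segments = frames_to_segments([silent, silent, chord, chord, chord], frame_duration=0.5)
buffer = io.StringIO()
write_segment_tsv(segments, buffer)
print(buffer.getvalue())
```

An empty list of frames yields an empty list of segments, so the output file then contains only the header row.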
