bzamecnik / ml

Machine learning projects, often on audio datasets
MIT License
92 stars 39 forks source link

Make it easy to use the predicted pitch classes eg. in Sonic Visualizer #5

Open bzamecnik opened 8 years ago

bzamecnik commented 8 years ago

For frame labels the format can be like:

C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
1   0   0   0   1   0   0   1   0   0   0   0
0   0   1   0   0   1   0   1   0   0   0   1
[...]

Ie. a TSV file with header. The columns represent pitch classes. Each value is 0 or 1 indicating if the pitch class is predicted as active or not. For probability labels there can be a float between 0.0 and 1.0 (before thresholding).

For segment labels the format is like:

start   end C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
0.0 2.612267    0   0   0   0   0   0   0   0   0   0   0   0
2.612267    11.45907    0   0   0   0   1   0   0   0   1   0   0   1

The frames as collapsed to time intervals. Each interval is represented by start and end time (in float seconds). The segments should be ordered increasingly and should not overlap. Ideally they should contain no holes (ie. be perfectly adjacent to each other).