craffel closed this issue 6 years ago
Some tasks don't have a pre-existing data loading format, so we've had to make one up. It seems silly to make up a format just to satisfy the requirements of mir_eval's conventions.
:+1:
In some cases, the format supported by mir_eval is not always obviously the best format for the task. For example, for transcription, MIDI is a pretty nice format. One could argue that JAMS is better than all of the annotation formats we support.
:+1: to both, though I don't think we want mir_eval to require jams. (JAMS currently requires mir_eval, so that's a non-starter.) I think generally having a clearly defined API for each evaluation / task is the best we can do, and I support getting out of the IO game entirely.
There already exist good (better?) Python data loaders than the ones we implemented. For example, librosa would work well for loading in wav files for source separation evaluation.
I would recommend pysoundfile over librosa for this.
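To illustrate the point that general-purpose loaders already cover this, here is a minimal sketch of the kind of result a wav loader hands to a separation evaluation: float samples in [-1, 1) plus a sample rate. With pysoundfile this is a one-liner (`data, sr = soundfile.read(path)`); the version below approximates the same round-trip using only the standard library, so the file format details are visible. The helper name `load_wav` is made up for this sketch.

```python
import math
import os
import struct
import tempfile
import wave


def load_wav(path):
    """Read a 16-bit PCM wav and return (channels, sample_rate).

    Samples come back as floats scaled to [-1, 1), one list per channel,
    roughly what soundfile.read would give you as a numpy array.
    """
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "sketch only handles 16-bit PCM"
        n_channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
        ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
        # De-interleave the channels and scale to floats.
        return [
            [s / 32768.0 for s in ints[c::n_channels]]
            for c in range(n_channels)
        ], w.getframerate()


# Write a tiny mono test tone, then round-trip it through the loader.
sr = 8000
tone = [int(32000 * math.sin(2 * math.pi * 440 * t / sr)) for t in range(800)]
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    w.writeframes(struct.pack("<%dh" % len(tone), *tone))

channels, rate = load_wav(path)
print(rate, len(channels), len(channels[0]))  # 8000 1 800
```

Either library also handles resampling concerns and other encodings that this sketch ignores, which is exactly why it is not worth reimplementing in mir_eval.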
Following the principle of "do one thing and do it well", mir_eval should implement evaluation functions, not implement data loading/parsing and a set of scripts for using mir_eval.
:+1: :100: evergreen reference for this
Deprecate support for the mir_eval web service and move it to a separate repository. Treat it as a "proof of concept" instead of a supported project.
:+1:
Move the evaluators to a separate repository. Do not require that every task has an evaluator, but support the existing evaluators which have tasks.
What's the benefit of putting the scripts in a separate repo?
Do not require that each task have a data loading function in io. In the future, only implement data loading functions which are genuinely useful because they don't already exist in other software and which are for widely used annotation formats (e.g. those used in MIREX).
Are there any such formats?
Because of the above point, do not require that each example in each docstring includes data loading. Only include an example when it is useful for seeing how data should be loaded or preprocessed (for example, in the beat submodule where beats should be trimmed before being passed to evaluation functions).
:+1:
In order to accommodate the new repositories, I propose that we move http://github.com/craffel/mir_eval to http://github.com/mir_eval/mir_eval and create repositories for the web service and evaluators.
I support this move. I suggest keeping a mir_eval prefix in the repo names, just for downstream convenience (cloned repos will not inherit the org name), but that's a minor quibble.
sgtm
though I don't think we want mir_eval to require jams.
Agreed, sorry if it was worded unclearly - didn't mean to suggest this. Was just making the point that sometimes there's a better format available.
What's the benefit of putting the scripts in a separate repo?
Delineation between the projects - treating the evaluators as a separate entity which serves as an example of using mir_eval, rather than as a part of mir_eval.
Are there any such formats?
I would argue, for example, that load_labeled_intervals is a nice convenience function for a widely-used MIREX format; the same goes for load_ragged_time_series, load_pattern, etc. They also have some sugar in them to make them user-friendly and hopefully give helpful warnings, etc.
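As a concrete picture of the "convenience plus sugar" being described, here is a rough sketch of what a load_labeled_intervals-style reader does with the MIREX-style "start end label" annotation format, including the kind of sanity check that makes such helpers friendly. This is simplified relative to the real mir_eval.io function (no delimiter configuration, no numpy arrays), and the chord labels in the sample text are illustrative.

```python
import warnings


def load_labeled_intervals(lines):
    """Parse lines of 'start end label' into (intervals, labels)."""
    intervals, labels = [], []
    for n, line in enumerate(lines, 1):
        if not line.strip():
            continue  # tolerate blank lines
        start, end, label = line.split(None, 2)
        start, end = float(start), float(end)
        # The "sugar": warn about annotations that are probably mistakes.
        if end < start:
            warnings.warn("interval on line %d ends before it starts" % n)
        intervals.append((start, end))
        labels.append(label.strip())
    return intervals, labels


text = """0.000\t11.217\tN
11.217\t13.513\tA:maj
13.513\t15.796\tE:maj
"""
intervals, labels = load_labeled_intervals(text.splitlines())
print(intervals[1], labels[1])  # (11.217, 13.513) A:maj
```

The parsing itself is trivial; the warnings and input validation are what users would lose by rolling their own.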
I suggest keeping a mir_eval prefix in the repo names, just for downstream convenience (cloned repos will not inherit the org name), but that's a minor quibble.
Good point.
Oh gosh, you can't have underscores in organization names. So the URL would have to be something like http://github.com/mireval/mir_eval or http://github.com/mir-eval/mir_eval. That is super confusing. I will think about this.
I'm going to scale back some of these requirements a bit.
Deprecate support for the mir_eval web service and move it to a separate repository. Treat it as a "proof of concept" instead of a supported project.
It already is in a different repository; it's just not in a different GitHub organization. Per the comment above, I'm going to punt on the "different organization" thing for now. This TODO basically becomes adding a "proof of concept" disclaimer to the web service.
Move the evaluators to a separate repository. Do not require that every task has an evaluator, but support the existing evaluators which have tasks.
I will still do this. The documentation will need to be updated.
Do not require that each task have a data loading function in io. In the future, only implement data loading functions which are genuinely useful because they don't already exist in other software and which are for widely used annotation formats (e.g. those used in MIREX).
There's no action item on this right now - it's just a modification of guidelines for future contributions.
Because of the above point, do not require that each example in each docstring includes data loading. Only include an example when it is useful for seeing how data should be loaded or preprocessed (for example, in the beat submodule where beats should be trimmed before being passed to evaluation functions).
Same as above, unless we want to remove examples which aren't very helpful. One such example, from the onset submodule, is:
>>> reference_onsets = mir_eval.io.load_events('reference.txt')
>>> estimated_onsets = mir_eval.io.load_events('estimated.txt')
>>> F, P, R = mir_eval.onset.f_measure(reference_onsets,
... estimated_onsets)
I'm not sure it's valuable to just delete this example at this point.
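For context on what that docstring example wraps, here is a simplified sketch of the metric itself: estimated onsets are matched to reference onsets within a +/- 50 ms tolerance window, and precision, recall, and F-measure are derived from the match count. Note this uses a greedy matching for brevity; the real mir_eval implementation resolves matches more carefully, so scores can differ in edge cases.

```python
def onset_f_measure(reference, estimated, window=0.05):
    """Greedy onset matching within +/- `window` seconds."""
    reference = sorted(reference)
    used = [False] * len(reference)
    hits = 0
    for est in sorted(estimated):
        for i, ref in enumerate(reference):
            # Each reference onset can be matched at most once.
            if not used[i] and abs(est - ref) <= window:
                used[i] = True
                hits += 1
                break
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    f = 2 * precision * recall / (precision + recall)
    return f, precision, recall


# 1.02 matches 1.0 and 3.0 matches exactly; 2.5 is outside every window.
f, p, r = onset_f_measure([1.0, 2.0, 3.0], [1.02, 2.5, 3.0])
print(round(f, 3), round(p, 3), round(r, 3))  # 0.667 0.667 0.667
```

Since the evaluation function takes plain arrays of onset times, nothing about it depends on how those times were loaded, which is the point being made about the docstring example.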
Ok, the web service is updated with a nice disclaimer, the evaluators are moved to https://github.com/craffel/mir_evaluators with history intact, and the docs are updated to point people there for the evaluators. So, all TODOs are done, closing this. In the future we won't require task-specific file loading functions when it doesn't make sense.
Historically, mir_eval has required that for each task there exists a data loader which loads in data from some filetype to be used in the task. These data loaders exist in the io submodule. For example, for beat detection, data has historically been formatted as an ASCII file where each beat location (in seconds) appears on each line, and the function mir_eval.io.load_events implements this. There are a few things which prompted this requirement; among them, the mir_eval web service requires that people can upload annotation files for their task in order to receive scores.

In a recent survey of mir_eval users, the vast majority (97.4%) said they had used mir_eval by calling it directly from Python code. About 12.8% said they used the evaluators, and no respondent had used the web service. Similarly, 81.6% said they had used a separate data loading procedure; 26.3% said they had used functions in the io submodule, and 7.9% said data was loaded because they were using the evaluator.

Requiring that every task has a data loading function has some drawbacks:
- Some tasks don't have a pre-existing data loading format, so we've had to make one up. It seems silly to make up a format just to satisfy the requirements of mir_eval's conventions.
- In some cases, the format supported by mir_eval is not always obviously the best format for the task. For example, for transcription, MIDI is a pretty nice format. One could argue that JAMS is better than all of the annotation formats we support.
- There already exist good (better?) Python data loaders than the ones we implemented. For example, librosa would work well for loading in wav files for source separation evaluation.
- Following the principle of "do one thing and do it well", mir_eval should implement evaluation functions, not implement data loading/parsing and a set of scripts for using mir_eval.

Given that the mir_eval web service is not used, the evaluators are sparsely used, and using the io submodule is much less common than using another input pipeline, I propose the following changes:

- Deprecate support for the mir_eval web service and move it to a separate repository. Treat it as a "proof of concept" instead of a supported project.
- Move the evaluators to a separate repository. Do not require that every task has an evaluator, but support the existing evaluators which have tasks.
- Do not require that each task have a data loading function in io. In the future, only implement data loading functions which are genuinely useful because they don't already exist in other software and which are for widely used annotation formats (e.g. those used in MIREX).
- Because of the above point, do not require that each example in each docstring includes data loading. Only include an example when it is useful for seeing how data should be loaded or preprocessed (for example, in the beat submodule where beats should be trimmed before being passed to evaluation functions).

In order to accommodate the new repositories, I propose that we move http://github.com/craffel/mir_eval to http://github.com/mir_eval/mir_eval and create repositories http://github.com/mir_eval/web_service and http://github.com/mir_eval/evaluators for the web service and evaluators respectively.
If there are no major objections, I will start implementing these moves and changes next week.