JamesOwers / midi_degradation_toolkit

A toolkit for generating datasets of midi files which have been degraded to be 'un-musical'.
MIT License

Fix overlaps on input #57

Closed (apmcleod closed this issue 5 years ago)

apmcleod commented 5 years ago

Related to #20 (and #46, in a way).

As discussed on Skype, here are a few examples of how we want it to work. Each row is a note of the same pitch over time (o = onset, . = nothing, - = sustain):

Example 1: Don't cut on offsets. Input:

....o--------
....o----....

Output:

....o--------

Example 2: Cut on onsets. Input:

....o--------
......o------

Output:

....o-o------

apmcleod commented 5 years ago

Example 3: Both. Input:

....o--------
......o----..

Output:

....o-o------

apmcleod commented 5 years ago

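Essentially, we want any millisecond which has a sustain in the input to still have a sustain in the output, and any millisecond which has an onset to still have an onset. Where one note's sustain coincides with another note's onset, the onset wins (as in Example 2).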
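To make the rule concrete, here is a minimal sketch that applies it directly to the o/-/. notation used in the examples. The helper name merge_rows is hypothetical and is not part of the toolkit; it assumes all rows have the same pitch and the same length.

```python
def merge_rows(rows):
    """Merge same-pitch piano-roll rows written in the o/-/. notation.

    Hypothetical helper for illustration only (not toolkit code).
    """
    merged = []
    # Walk through the rows one time step (column) at a time.
    for cells in zip(*rows):
        if "o" in cells:
            # Any onset at this time step survives as an onset.
            merged.append("o")
        elif "-" in cells:
            # Otherwise, any sustain survives as a sustain.
            merged.append("-")
        else:
            merged.append(".")
    return "".join(merged)


# The three examples from this issue.
example_1 = merge_rows(["....o--------", "....o----...."])
example_2 = merge_rows(["....o--------", "......o------"])
example_3 = merge_rows(["....o--------", "......o----.."])
```

With these inputs, example_1 reproduces the Example 1 output and example_2 and example_3 both give the cut version shown for Examples 2 and 3.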

apmcleod commented 5 years ago

Test case, which covers most (all?) of the probable sticking points:

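import pandas as pd

# fix_overlaps is assumed to be imported from the toolkit's data_structures module.

# Input. Track 0, pitch 10: the note at onset 50 (ending at 350) overlaps the
# notes at onsets 75 and 200, with a pitch-20 row in between (non-consecutive).
# Track 0, pitch 20: the note at 150 runs into the onset at 200.
# Track 0, pitch 30: two notes share onset 300. Track 1 must stay untouched.
df = pd.DataFrame({
    'onset': [50, 75, 150, 200, 200, 300, 300, 300],
    'track': [0, 0, 0, 0, 0, 0, 0, 1],
    'pitch': [10, 10, 20, 10, 20, 30, 30, 10],
    'dur': [300, 25, 100, 125, 50, 50, 100, 100]
})
# Expected: notes are cut at later onsets, sustains are kept up to the latest
# offset, and the two same-onset pitch-30 notes become one.
res = pd.DataFrame({
    'onset': [50, 75, 150, 200, 200, 300, 300],
    'track': [0, 0, 0, 0, 0, 0, 1],
    'pitch': [10, 10, 20, 10, 20, 30, 10],
    'dur': [25, 125, 50, 150, 50, 100, 100]
})
# Note: DataFrame.equals also compares dtypes and index labels.
assert fix_overlaps(df).equals(res)

apmcleod commented 5 years ago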

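In particular, the current version of fix_overlaps in data_structures looks like it would fail if the overlapping notes are non-consecutive, since it has eliminated the groupby('pitch'). In the test case above, for example, the pitch-10 note at onset 50 overlaps the pitch-10 note at onset 200, but other rows sit between them. The groupby might unfortunately be necessary...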
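For reference, a rough sketch of what a groupby-based version could look like. This is not the toolkit's implementation: the name fix_overlaps_sketch is hypothetical, and it assumes positive durations, the four columns used in the test case, and output sorted by onset, track, then pitch with a fresh index. It groups by both track and pitch, so non-consecutive rows of the same pitch are still compared with each other.

```python
import pandas as pd


def fix_overlaps_sketch(df):
    """Hypothetical groupby-based overlap fix (illustration only)."""
    cols = ["onset", "track", "pitch", "dur"]
    notes = df[cols].copy()
    notes["offset"] = notes["onset"] + notes["dur"]

    # Merge notes sharing track, pitch and onset, keeping the latest offset.
    # groupby sorts by its keys, so rows come out ordered by track, pitch, onset.
    notes = notes.groupby(
        ["track", "pitch", "onset"], as_index=False
    )["offset"].max()

    grouped = notes.groupby(["track", "pitch"])
    # Latest offset of any note started so far: keeps every sustain alive.
    latest_offset = grouped["offset"].cummax()
    # Onset of the next note of the same track and pitch (NaN for the last).
    next_onset = grouped["onset"].shift(-1)

    # Cut at the next onset if it arrives before the sustain runs out,
    # otherwise let the note run to the latest offset.
    new_offset = next_onset.where(next_onset < latest_offset, latest_offset)
    notes["dur"] = (new_offset - notes["onset"]).astype(df["dur"].dtype)

    return (
        notes[cols]
        .sort_values(["onset", "track", "pitch"])
        .reset_index(drop=True)
    )


df = pd.DataFrame({
    'onset': [50, 75, 150, 200, 200, 300, 300, 300],
    'track': [0, 0, 0, 0, 0, 0, 0, 1],
    'pitch': [10, 10, 20, 10, 20, 30, 30, 10],
    'dur': [300, 25, 100, 125, 50, 50, 100, 100]
})
fixed = fix_overlaps_sketch(df)
```

The cummax is what handles the non-consecutive case: the note at onset 50 ends at 350, and that offset is carried forward to the later pitch-10 notes even though their own offsets are earlier. Any extra columns in the input would be dropped by this sketch.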