`estimate_symbolic_duration` breaks for some unknown durations

sildater commented 2 weeks ago

loading beethoven_op053_mv3.match from the Zeilinger dataset throws an error in estimate_symbolic_duration. The inputs to reproduce without the file are pt.utils.estimate_symbolic_duration(2496, 96). Looking into this, for most smaller fractions returns nonsensical values (like {'type': 'long', 'actual_notes': 1, 'normal_notes': 2}) if it doesn't hit a symbolic duration, for larger fractions the method fails. Test script:

not_working = 0
divs = 12
quarters = 32
for k in range(divs * 0, divs * quarters):
    try:
        a = pt.utils.estimate_symbolic_duration(k, divs)
        print(k, k/divs, a)
    except:
        print("not working:", k, k/divs)
        not_working +=1
print("percentage not working",not_working / (divs * quarters))

manoskary commented 1 week ago

What should the expected duration or the intended behavior be? I guess before partitura=1.5.0 when you had a failed case, the duration was assigned to None. In the example you mention, the resulting quarter duration of 2496 / 96would be 26 quarter notes. 26 quarter notes should be a composite duration, i.e. a tied note. In the latest version of estimate_symbolic_durations, we can handle composite durations but our vocabulary is still limited. To guess composite duration you need to set estimate_symbolic_duration(return_com_durations=True) then the output of the functions can be a list of dicts.

Currently, the estimate_symbolic_duration function has three clauses, actual possible durations, some kind of triplet duration, or composite durations. The choice was made so that the function does not return NoneValue and therefore is forced to choose a duration. Of course, it is very very far from being perfect with the idea being we keep adding new encountered durations as we find errors. Furthermore, the function is required to not fail so it can be able to guess arbitrary triplet durations, but as you've seen this does not work correctly in particular when composite durations are involved.

Another solution would be to allow the function to return None durations or empty dicts then whenever we cannot give a credible answer then we do not give any answer. I would be very open to having a deeper discussion on this topic to better structure the estimation so it could cover as many cases as possible and fit everybody's needs.

sildater commented 1 week ago

Thanks for looking into this! I think the intended behavior should be to a) not raise exceptions and b) not give wrong values. None is good return value instead of the triplet condition which is the buggy part.

manoskary commented 1 week ago

Ok, I think I have a solution, I will add you as the reviewer.

CPJKU / partitura

`estimate_symbolic_duration` breaks for some unknown durations #371