decoded encoded performance doesn't reconstruct the original

anusfoil commented 1 year ago

It seems that decoding an encoded performance doesn't reconstruct the original. The issue can be reproduced with data from the 4*22 dataset:

import partitura as pt
score = pt.load_musicxml('Chopin_op10_no3.musicxml')
performance, alignment = pt.load_match('Chopin_op10_no3_p01.match')

Comparing the note_array of the original performed part with that of the generated performed part shows large differences. It seems that the silence at the beginning is trimmed and the duration is unchanged, but the pitch and velocity are very different from the original.

>>> parameters, snote_ids = pt.musicanalysis.encode_performance(score.parts[0], performance.performedparts[0], alignment)
>>> dpt = pt.musicanalysis.decode_performance(score.parts[0], parameters, snote_ids)
>>> dpt.note_array()[:10]
array([(0.        , 0.8775    ,    0,  842, 59, 44, 0, 1, 'n1'),
       (0.71000004, 2.4375    ,  682, 2340, 40, 22, 0, 1, 'n4'),
       (0.78375006, 2.36375   ,  752, 2270, 56, 26, 0, 1, 'n3'),
       (0.7112503 , 2.43625   ,  683, 2339, 64, 54, 0, 1, 'n2'),
       (1.44      , 1.7075    , 1382, 1640, 47, 20, 0, 1, 'n6'),
       (1.4537501 , 1.6937501 , 1396, 1626, 59, 37, 0, 1, 'n5'),
       (2.02      , 1.1275    , 1939, 1083, 56, 32, 0, 1, 'n8'),
       (1.98      , 1.1675    , 1901, 1121, 63, 52, 0, 1, 'n7'),
       (2.5262504 , 0.62125003, 2425,  597, 47, 26, 0, 1, 'n11'),
       (2.5100002 , 0.6375    , 2410,  612, 59, 41, 0, 1, 'n10')],
      dtype=[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')])
>>> performance.note_array()[:10]
array([(4.9925 , 0.8775 , 39940,  2200, 47, 44, 1, 0, 'n0'),
       (5.7025 , 2.4375 , 45620, 14770, 28, 22, 1, 0, 'n1'),
       (5.70375, 2.43625, 45630,  9650, 52, 54, 1, 0, 'n2'),
       (5.77625, 2.36375, 46210,  5620, 44, 26, 1, 0, 'n3'),
       (6.4325 , 1.7075 , 51460,  4320, 35, 20, 1, 0, 'n4'),
       (6.44625, 1.69375, 51570,  5000, 47, 37, 1, 0, 'n5'),
       (6.9725 , 1.1675 , 55780,  4430, 51, 52, 1, 0, 'n6'),
       (7.0125 , 1.1275 , 56100,  2480, 44, 32, 1, 0, 'n7'),
       (7.47625, 0.66375, 59810,  5090, 52, 59, 1, 0, 'n8'),
       (7.5025 , 0.6375 , 60020,  4500, 47, 41, 1, 0, 'n9')],
      dtype=[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')])
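
The two arrays list the notes in different orders (the decoded ids run n1, n4, n3, n2 while the performance ids run n0, n1, n2, n3), so a row-by-row comparison can be misleading. The following is a minimal sketch, not partitura functionality: it copies the first five rows of each output above (only four of the fields), pairs them by sorting both on (duration_sec, velocity), and computes field-wise differences. The pairing heuristic and the helper name `pair_by_duration_and_velocity` are assumptions for illustration; a proper comparison would pair notes through the alignment.

```python
import numpy as np

DTYPE = [("onset_sec", "f4"), ("duration_sec", "f4"),
         ("pitch", "i4"), ("velocity", "i4")]

# First five rows of dpt.note_array(), copied from the output above.
decoded = np.array([
    (0.0, 0.8775, 59, 44),
    (0.71000004, 2.4375, 40, 22),
    (0.78375006, 2.36375, 56, 26),
    (0.7112503, 2.43625, 64, 54),
    (1.44, 1.7075, 47, 20),
], dtype=DTYPE)

# First five rows of performance.note_array(), copied from the output above.
original = np.array([
    (4.9925, 0.8775, 47, 44),
    (5.7025, 2.4375, 28, 22),
    (5.70375, 2.43625, 52, 54),
    (5.77625, 2.36375, 44, 26),
    (6.4325, 1.7075, 35, 20),
], dtype=DTYPE)


def pair_by_duration_and_velocity(a, b):
    """Hypothetical helper: put both arrays in the same note order.

    Assumes (duration_sec, velocity) identifies a note uniquely, which
    holds for these five rows but not in general.
    """
    keys = ["duration_sec", "velocity"]
    return np.sort(a, order=keys), np.sort(b, order=keys)


dec, orig = pair_by_duration_and_velocity(decoded, original)
onset_shift = orig["onset_sec"] - dec["onset_sec"]
pitch_shift = dec["pitch"] - orig["pitch"]
velocity_diff = dec["velocity"] - orig["velocity"]

print("onset shift (s):", onset_shift)
print("pitch shift (semitones):", pitch_shift)
print("velocity difference:", velocity_diff)
```

Pairing the notes first makes it possible to tell a genuine per-note change apart from a mere difference in ordering.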

What is the expected behavior when decoding an encoded performance?

anusfoil commented 1 year ago

After some inspection, the pitch differences appear to be exactly one octave: the score notes are one octave higher than the performed notes, and this already holds in the match file (E2 is matched with performed note 28, but E2 = 40 in the standard conversion). I think this is caused by a different MIDI numbering standard (or the performance is simply one octave down?).
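
To make the numbering question concrete, here is a small sketch of how the MIDI number of a note name depends on which octave label is given to middle C (MIDI 60). It is not partitura code; `note_to_midi` and its `middle_c_octave` parameter are hypothetical names used only for illustration, and the sketch does not establish which convention the match file actually uses.

```python
# Semitone offsets of the natural note names within one octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(step, octave, alter=0, middle_c_octave=4):
    """Return the MIDI number of a note name.

    middle_c_octave is the octave label assigned to MIDI note 60:
    4 is the common "C4 = 60" convention, while some tools label
    middle C as C3 or C5 instead.
    """
    return 60 + STEP_TO_SEMITONE[step] + alter + 12 * (octave - middle_c_octave)


for middle_c in (3, 4, 5):
    print(f"middle C = C{middle_c}: E2 -> {note_to_midi('E', 2, middle_c_octave=middle_c)}")
```

Under "C4 = 60" the name E2 maps to 40, as stated above; the value 28 would result if middle C were labelled C5, or if the performance really was played one octave lower.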

The good news is that this is the only affected case so far (not just this one example, though: it holds for the entire 4*22 dataset). Either way, we need to decide how this case should work (should decoding reproduce the score pitch or the performed pitch?) and add a test that checks pitch.
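
A pitch test along these lines could be sketched as follows. This is only a rough idea under stated assumptions: the alignment is taken to be a list of dicts with "label", "score_id" and "performance_id" keys, the decoded notes are taken to carry score note ids, and the three alignment pairs below are hypothetical (inferred from matching durations and velocities in the output above, not read from the match file). `pitch_mismatches` is a made-up helper name.

```python
import numpy as np

DTYPE = [("pitch", "i4"), ("velocity", "i4"), ("id", "U256")]

# Three notes from each output above (pitch, velocity, id only).
decoded = np.array([(59, 44, "n1"), (40, 22, "n4"), (64, 54, "n2")], dtype=DTYPE)
performed = np.array([(47, 44, "n0"), (28, 22, "n1"), (52, 54, "n2")], dtype=DTYPE)

# Hypothetical alignment entries, for illustration only.
alignment = [
    {"label": "match", "score_id": "n1", "performance_id": "n0"},
    {"label": "match", "score_id": "n4", "performance_id": "n1"},
    {"label": "match", "score_id": "n2", "performance_id": "n2"},
    {"label": "insertion", "performance_id": "n99"},
]


def pitch_mismatches(decoded, performed, alignment):
    """Return (score_id, decoded_pitch, performed_pitch) for matched
    notes whose pitches differ. Non-match entries are ignored."""
    dec_pitch = {str(n["id"]): int(n["pitch"]) for n in decoded}
    perf_pitch = {str(n["id"]): int(n["pitch"]) for n in performed}
    mismatches = []
    for entry in alignment:
        if entry.get("label") != "match":
            continue
        d = dec_pitch.get(entry["score_id"])
        p = perf_pitch.get(entry["performance_id"])
        if d is not None and p is not None and d != p:
            mismatches.append((entry["score_id"], d, p))
    return mismatches


mismatches = pitch_mismatches(decoded, performed, alignment)
for score_id, d, p in mismatches:
    print(f"{score_id}: decoded {d}, performed {p}, difference {d - p}")
```

A test built on this would assert that the returned list is empty if decoding is meant to reproduce the performed pitch; if it is meant to reproduce the score pitch, the check would have to compare against the score notes instead.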

anusfoil commented 1 year ago

So it's an old version of the Vienna 4*22 dataset...

CPJKU / partitura

decoded encoded performance doesn't reconstruct the original #231