craigsapp / humlib

Humdrum data parsing library in C++
http://humlib.humdrum.org
BSD 2-Clause "Simplified" License
31 stars 8 forks source link

[musicxml2hum] Missing exclusive interpretations and wrong order #85

Closed WolfgangDrescher closed 9 months ago

WolfgangDrescher commented 9 months ago

This is my source file: Schiorring - 01 Ach! Herre from.mxl.zip

Click to see the output of musicxml2hum ``` !!!OTL: Choral Bog !!!OMV: Ach! Herre from **kern **fb **kern **kern **kern **kern **kern *part6 *part6 *part5 *part4 *part3 *part2 *part1 *staff6 * *staff5 *staff4 *staff3 *staff2 *staff1 *I"Piano 2 * *I"Piano 1 *I"Bass *I"Tenor *I"Alto *I"Soprano *clefF4 *clefC1 * *clefF4 *clefC1 *clefF4 *clefC4 *clefC3 *clefC1 * * * * * * *Trd0c0 * * *k[] *k[] * *k[] *k[] *k[] *k[] *k[] *k[] * * * * * * * * * *M4/4 *M4/4 * *M4/4 *M4/4 *M4/4 *M4/4 *M4/4 *M4/4 *met(c) *met(c) * *met(c) *met(c) *met(c) *met(c) *met(c) *met(c) * * * *^ *^ * * * * 2C 2cc . 2ryy 2C 2e 2g 2C 2ee 2g 2cc =1 =1 =1 =1 =1 =1 =1 =1 =1 =1 =1 2G 2b . 2d 2G 2b 2g 2G 2dd 2g 2b 2D 2a 7 # 2c 2D 2a 2f#X 2D 2cc 2f#X 2a =2 =2 =2 =2 =2 =2 =2 =2 =2 =2 =2 2GG 2g . 2B 2G 2g 2d 2GG 2b 2g 2g 2C 2g -7 2B-X 2C 2g 2e 2C 2b-X 2e 2g =3 =3 =3 =3 =3 =3 =3 =3 =3 =3 =3 2F 2a . 2A 2F 2a 2c 2F 2a 2c 2a 2D 2b 6 2F 2D 2bni 2f 2D 2f 2d 2b =4 =4 =4 =4 =4 =4 =4 =4 =4 =4 =4 2C 2cc . 2G 2C 2cc 2e 2C 2g 2e 2cc 2GG 2dd . 2G 2GG 2dd 2b 2GG 2b 2g 2dd =5 =5 =5 =5 =5 =5 =5 =5 =5 =5 =5 2AA 2cc 6 2F#X 2AA 2cc 2a 2AA 2cc 2f#X 2cc 4BB 2b 6 2G 4BB 2b 2d (4BB 2dd 2g 2b 4C . . . 4C . . 4C) . . . =6 =6 =6 =6 =6 =6 =6 =6 =6 =6 =6 2D 2a # 2F#X 2D 2a 1d 2D 2dd 2f#X 2a 2GG 2b . [2G 2GG 2b . 2GG 2dd [2g 2b =7 =7 =7 =7 =7 =7 =7 =7 =7 =7 =7 2C 1a 6 5 2G] 2C 1a 2e (2C (2ee (2g] 1a 2D . # 2F#X 2D . 2d 2D) 4dd 2f#X) . . . . . . . . . 4cc) . . =8 =8 =8 =8 =8 =8 =8 =8 =8 =8 =8 * * * * * * *^ * * * * 2GG 2g . 2G 2GG 2g 2B 2d/ 2GG 2b 2g 2g 2E 2cc 6 2G 2E 2cc 2c 2g/ 2E 2cc 2g 2cc * * * * * * *v *v * * * * =9 =9 =9 =9 =9 =9 =9 =9 =9 =9 =9 2D 2b 6 2F 2D 2b 2d 2D 2dd 2f 2b 2C 2cc . 2E 2C 2cc 2g 2C 2ee 2g 2cc =10 =10 =10 =10 =10 =10 =10 =10 =10 =10 =10 2GG 2dd . 2G 2GG 2dd 2b 2GG 2b 2g 2dd 2BB 2dd 6 2G 2BB 2dd 2g 2BB 2dd 2g 2dd =11 =11 =11 =11 =11 =11 =11 =11 =11 =11 =11 2C 2ee . 2c 2C 2ee 2g 2C 2cc 2g 2ee 2AA 2cc 6 2c 2A 2cc 2f#X 2AA 2cc 2f#X 2cc =12 =12 =12 =12 =12 =12 =12 =12 =12 =12 =12 2GG 2dd . 2B 2G 2dd 2d 2GG 2b 2d 2dd 2E 2g 6 2c 2E 2g 2g 2E 2cc 2g 2g =13 =13 =13 =13 =13 =13 =13 =13 =13 =13 =13 2E-X 2a 4 2 2c 2E-X 2a 2f 2E-X 2cc 2f 2a 2D 2b 6 2d 2D 2b 2f 2D 2dd 2f 2b =14 =14 =14 =14 =14 =14 =14 =14 =14 =14 =14 2Eni 2cc 6 2c 2Eni 2cc 2g 2Eni 2cc 2g 2cc 2C 2ee . [2c 2C 2ee 2g 2C [2cc 2g 2ee =15 =15 =15 =15 =15 =15 =15 =15 =15 =15 =15 2F 1dd 6 5 2c] 2F 1dd 2a 2F (2cc] (2a 1dd 2G . . 2B 2G . 2g 2G 2b) 4g . . . . . . . . . . 4f) . =16 =16 =16 =16 =16 =16 =16 =16 =16 =16 =16 2C 2cc . 2c 2C 2cc 2g 2e 2C 2cc 2e 2cc * * * *v *v * * * * * * * * * * *v *v * * * * == == == == == == == == == *- *- *- *- *- *- *- *- *- !!!system-decoration: s1,s2,s3,s4,s5,s6 !!!RDF**kern: i = editorial accidental ```

musicxml2hum fails to convert it to a valid humdrum file and I observed several problems:

  1. There are 6 parts (SATB, Piano 1, Piano 2) but 8 staves are needed because of the two piano parts. However, there are only 6 **kern spines created. But in the output starting from the line with the clefs interpretations it seems to be correct, there are the 8 spines.
  2. On the first line with data tokens there is cc missing in Piano 1 and it seems like the notes of the spine splits are arranged in the wrong order (high notes in left spine, low note in right spine). This only seems to be a problem for the first line.
  3. The **fb spine is included but the data and the corresponding exclusive interpretations are swapped. Compare 2nd and 3rd and spine. (Note that in the source score the piano 2 has the figured bass numbers semantically assigned to the wrong system (right hand), I would rearrange this with extractxx later on and bind them to the left hand system and use *above instead.)
  4. Why is there a interpretation line with Trd0c0?
  5. system-decoration should probably put brackets around the piano parts.

I will try to improve this as much as possible on my own, but tips for a good implementation would be helpful, since I have not yet familiarized myself with musicxml2hum.

craigsapp commented 9 months ago

You can post a sample of the music notation from a Dorico screenshot for examination (I don't have it on my travel computer that I am currently using).

Graphic notation (MusicXML direct to MEI with verovio):

Screenshot 2023-11-28 at 04 58 55

Loading into Musescore 4:

Screenshot 2023-11-28 at 05 05 03

Finale import:

Screenshot 2023-11-28 at 05 58 11
WolfgangDrescher commented 9 months ago

I don't (yet) have the original Dorico file (and I haven't installed it either). A music theory professor friend of mine provided me with the MusicXML files. Sadly the Dorico MusicXML export does not include fermatas. However here is a screenshot of the scan of the print:

Bildschirmfoto 2023-11-27 um 22 00 27

On IMSLP they don't have the version with the middle voices included: https://imslp.org/wiki/Special:ReverseLookup/668808

Your screenshot from Verovio and MuseScore looks good, except for the figured bass numbers. But I cannot guarantee that they are perfectly encoded in the MusicXML file. I will probably proofread everything and add the fermatas (and probably also the odd trills) once I have it in Humdrum.

craigsapp commented 9 months ago

Why is there a interpretation line with Trd0c0?

This is caused by this transposition instruction in the Tenor part:

        <transpose number="1">
          <diatonic>0</diatonic>
          <chromatic>0</chromatic>
          <octave-change>-1</octave-change>
        </transpose>

The diatonic and chromatic values are being recognized, but the octave-change does not seem to be implemented in the converter. There is a problem with processing octave-change in direct loading of MusicXML into verovio as well as in Musescore 4 and Finale 27, where the pitches are displayed an octave too high.

Use of an octave transposition for displaying on C4 clef is unusual and may be a bug in the MusicXML export from Dorico, or a subtle bug related to starting with a vocal tenor clef and then changing the clef to C4 (Sibelius can have buggy vocal tenor clef output depending on how you create the clef, for example). Note that Dorico developers were previously Sibelius developers and speculate as to why the bugs seem to be related.

The transposition instruction should be *ITrd7c12 (or *ITrd-7c-12). I suspect that octave transposition is an error generated in the MusicXML export (that is why a display of the Dorico notation rendering will be useful to determine that).

The octave down transposition needs to be applied to the notes in the Humdrum output (store notes at souding pitch in Humdrum scores is highly preferred), and then *ITrd7c12 needs to be added to indicate written pitch is one octave higher. But this transposition should not be needed and is caused by some problem in the MusicXML export from Dorico.

When this happens in Sibelius files, I used the transpose filter in VHV with the -s option to select a single spine to transpose. Here is the filter I applied (after fixing the spine starts for the piano parts):

transpose -s 6 -t -P8

Meaning: transpose spine 6 down an octave.

WolfgangDrescher commented 9 months ago

I just updated the output on my initial post. I accidentally posted a version where I had removed the **fb spine for debugging purposes.

craigsapp commented 9 months ago

I suspect that there is a bug added to musicxml2hum since July 2021, because the command-line version that I compiled on that date converts correctly, while a newly compiled version is giving the same problem you are having.

Expected conversion:

Screenshot 2023-11-28 at 07 52 37

Output from July 2021 compiled version of musescore2hum:

!!!OTL: Choral Bog
!!!OMV: Ach! Herre from
**kern  **kern  **fb    **kern  **kern  **kern  **kern  **kern  **kern
*part6  *part6  *part6  *part5  *part5  *part4  *part3  *part2  *part1
*staff8 *staff7 *   *staff6 *staff5 *staff4 *staff3 *staff2 *staff1
*I"Piano 2  *   *   *I"Piano 1  *   *I"Bass *I"Tenor    *I"Alto *I"Soprano
*clefF4 *clefC1 *   *clefF4 *clefC1 *clefF4 *clefC4 *clefC3 *clefC1
*   *   *   *   *   *   *Trd0c0 *   *
*k[]    *k[]    *   *k[]    *k[]    *k[]    *k[]    *k[]    *k[]
*   *   *   *   *   *   *   *   *
*M4/4   *M4/4   *   *M4/4   *M4/4   *M4/4   *M4/4   *M4/4   *M4/4
*met(c) *met(c) *   *met(c) *met(c) *met(c) *met(c) *met(c) *met(c)
*   *   *   *^  *^  *   *   *   *
2C  2cc .   2ryy    2C  2e  2g  2C  2ee 2g  2cc
=1  =1  =1  =1  =1  =1  =1  =1  =1  =1  =1
2G  2b  .   2d  2G  2b  2g  2G  2dd 2g  2b
2D  2a  7 # 2c  2D  2a  2f#X    2D  2cc 2f#X    2a
=2  =2  =2  =2  =2  =2  =2  =2  =2  =2  =2
2GG 2g  .   2B  2G  2g  2d  2GG 2b  2g  2g
2C  2g  -7  2B-X    2C  2g  2e  2C  2b-X    2e  2g
=3  =3  =3  =3  =3  =3  =3  =3  =3  =3  =3
2F  2a  .   2A  2F  2a  2c  2F  2a  2c  2a
2D  2b  6   2F  2D  2bni    2f  2D  2f  2d  2b
=4  =4  =4  =4  =4  =4  =4  =4  =4  =4  =4
2C  2cc .   2G  2C  2cc 2e  2C  2g  2e  2cc
2GG 2dd .   2G  2GG 2dd 2b  2GG 2b  2g  2dd
=5  =5  =5  =5  =5  =5  =5  =5  =5  =5  =5
2AA 2cc 6   2F#X    2AA 2cc 2a  2AA 2cc 2f#X    2cc
4BB 2b  6   2G  4BB 2b  2d  (4BB    2dd 2g  2b
4C  .   .   .   4C  .   .   4C) .   .   .
=6  =6  =6  =6  =6  =6  =6  =6  =6  =6  =6
2D  2a  #   2F#X    2D  2a  1d  2D  2dd 2f#X    2a
2GG 2b  .   [2G 2GG 2b  .   2GG 2dd [2g 2b
=7  =7  =7  =7  =7  =7  =7  =7  =7  =7  =7
2C  1a  6 5 2G] 2C  1a  2e  (2C (2ee    (2g]    1a
2D  .   #   2F#X    2D  .   2d  2D) 4dd 2f#X)   .
.   .   .   .   .   .   .   .   4cc)    .   .
=8  =8  =8  =8  =8  =8  =8  =8  =8  =8  =8
*   *   *   *   *   *   *^  *   *   *   *
2GG 2g  .   2G  2GG 2g  2B  2d/ 2GG 2b  2g  2g
2E  2cc 6   2G  2E  2cc 2c  2g/ 2E  2cc 2g  2cc
*   *   *   *   *   *   *v  *v  *   *   *   *
=9  =9  =9  =9  =9  =9  =9  =9  =9  =9  =9
2D  2b  6   2F  2D  2b  2d  2D  2dd 2f  2b
2C  2cc .   2E  2C  2cc 2g  2C  2ee 2g  2cc
=10 =10 =10 =10 =10 =10 =10 =10 =10 =10 =10
2GG 2dd .   2G  2GG 2dd 2b  2GG 2b  2g  2dd
2BB 2dd 6   2G  2BB 2dd 2g  2BB 2dd 2g  2dd
=11 =11 =11 =11 =11 =11 =11 =11 =11 =11 =11
2C  2ee .   2c  2C  2ee 2g  2C  2cc 2g  2ee
2AA 2cc 6   2c  2A  2cc 2f#X    2AA 2cc 2f#X    2cc
=12 =12 =12 =12 =12 =12 =12 =12 =12 =12 =12
2GG 2dd .   2B  2G  2dd 2d  2GG 2b  2d  2dd
2E  2g  6   2c  2E  2g  2g  2E  2cc 2g  2g
=13 =13 =13 =13 =13 =13 =13 =13 =13 =13 =13
2E-X    2a  4 2 2c  2E-X    2a  2f  2E-X    2cc 2f  2a
2D  2b  6   2d  2D  2b  2f  2D  2dd 2f  2b
=14 =14 =14 =14 =14 =14 =14 =14 =14 =14 =14
2Eni    2cc 6   2c  2Eni    2cc 2g  2Eni    2cc 2g  2cc
2C  2ee .   [2c 2C  2ee 2g  2C  [2cc    2g  2ee
=15 =15 =15 =15 =15 =15 =15 =15 =15 =15 =15
2F  1dd 6 5 2c] 2F  1dd 2a  2F  (2cc]   (2a 1dd
2G  .   .   2B  2G  .   2g  2G  2b) 4g  .
.   .   .   .   .   .   .   .   .   4f) .
=16 =16 =16 =16 =16 =16 =16 =16 =16 =16 =16
2C  2cc .   2c  2C  2cc 2g 2e   2C  2cc 2e  2cc
*   *   *   *v  *v  *   *   *   *   *   *
*   *   *   *   *v  *v  *   *   *   *
==  ==  ==  ==  ==  ==  ==  ==  ==
*-  *-  *-  *-  *-  *-  *-  *-  *-
!!!system-decoration: s1,s2,s3,s4{(s5,s6)}{(s7,s8)}
!!!RDF**kern: i = editorial accidental
WolfgangDrescher commented 9 months ago

This was a perfect use case to debug it with git bisect. I found the first bad commit, it is 24db1559b8ab50f05ea767488091e1476f2f3026. The problems seems to be the change in HumGrid::getStaffCount:

https://github.com/craigsapp/humlib/commit/24db1559b8ab50f05ea767488091e1476f2f3026#diff-c557cf502a627b7aab49de8d3977a3473cca135e87b0bcf353bafc289b331348L132-L133

For now I just reverted this change, but you probably had a reason to change this line. Will I break something else if I revert this change?

craigsapp commented 9 months ago

Thanks for tracking the problem down! (I will also have to familiarize myself with git bisect).

The problem as you note is this line changed two months ago:

https://github.com/craigsapp/humlib/blob/24db1559b8ab50f05ea767488091e1476f2f3026/src/HumGrid.cpp#L134-L135

Notice that I kept the previous code as a comment since I was not totally certain about the ramifications of changing the code.

What this line of code does is infer the number of staves in the part by looking at the data line at the start of the first measure in the score, which is causing a problem somehow. Perhaps the information such as the clef, time signature, and key signature are not assigned explicitly to staves 1 and 2, so they are stored only in the first staff, resulting in the current algorithm only seeing a single staff rather than two. Looking at the data, this does not seem to be the case since there are clearly two staves of information in the <attributes> of the last part (P6) in the MusicXML data:

      <attributes>
        <divisions>4</divisions>
        <key number="1">
          <fifths>0</fifths>
          <mode>none</mode>
        </key>
        <key number="2">
          <fifths>0</fifths>
          <mode>none</mode>
        </key>
        <time symbol="common">
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <staves>2</staves>
        <clef number="1">
          <sign>C</sign>
          <line>1</line>
        </clef>
        <clef number="2">
          <sign>F</sign>
          <line>4</line>
        </clef>
      </attributes>

Loading from Musescore output has the same problem with staff assignments for key and time needing to be inferred from <staves>2</staves> rather than being explicitly indicated by @number on duplicated entries for key and time:

        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
          <mode>none</mode>
          </key>
        <time symbol="common">
          <beats>4</beats>
          <beat-type>4</beat-type>
          </time>
        <staves>2</staves>
        <clef number="1">
          <sign>C</sign>
          <line>1</line>
          </clef>
        <clef number="2">
          <sign>F</sign>
          <line>4</line>
          </clef>
        </attributes>

Here are some notes on the HumGrid class:

HumGrid is a vector of GridMeasure is a vector of GridSlice is a vector of GridPart is a vector of GridStaff (and GridSide). With GridStaff being a vector of GridVoice* (and GridSide).

GridSide is for tandem spines of non-kern data such as harmony, lyrics, figured bass, chords, etc. There is a GridSide for the system (primarily for harmony) and a GridSide for each staff.

GridSlice is analogous to a HumdrumLine.

GridPart is not directly represented in Humdrum spines, as I prefer to make each staff a separate spine. This is similar to MEI but not to MusicXML. I instead use *part and *staff interpretations to keeps track of grand-staff cases (and in theory three-stave parts such as organ). In theory the HumGrid system could output a GridPart as a spine, with GridStaff as subspines:

**kern
*part1
*^
*staff2 *staff1
=1  =1
1c  1c
=   =
*v  *v
*-

but I find that a bother to deal with piano music and tools such as extract.

The HumGrid class was created specifically to convert from MusicXML into Humdrum (but should be useful to convert from other data types into Humdrum). It is an intermediary step that does not deal with spine manipulations until after the MusicXML content has been loaded. It also manages creation of null data tokens as part of the process to generate a Humdrum file.


The HumGrid::getStaffCount() function should return the number of staves for the part, which is supposed to be 2 for grand-staff parts such as the piano. This function is currently inferring the number of staves in a part by looking at how many staves are present in the first measure of the part. It is intended to look only at data lines (GridSlices) but currently it looks at any type of GridSlice (which is causing problems).

In English(ish), this line:

return (int)this->at(0)->front()->at(partindex)->size(); 

means "return the size of the staff vector for the given part from the first slice of the first measure"

while the previous method:

return (int)this->at(0)->back()->at(partindex)->size();

means "return the size of the staff vector for the given part from the last slide of the first measure"

This second version was failing due in cases where there is a global comment at the end of the first measure, such as:

**kern  **kern
*part1  *part1
*staff2 *staff1
=1  =1
1c  1c
!!linebreak: original
=   =
*-  *-

Where I add a comment at the end of a measure if there is a system break after that measure. The problem is that if the first measure contains a comment at the end of the measure, then there is only one "staff" for this line.

I therefore thought that looking instead at the first entry in the measure would be better (but obviously not). I will use your pull request to revert to looking at the back, but then generalize the code to skip non-spine GridSlice line types by iterating backwards in the GridSlices in the measure until I find data or otherwise the maximum staff count for interpretation/local comment slices.

A more ideal solution would probably to have a state variable for GridPart that stores the maximum staff index of inserted data in all GridMeasure/GridSlice for each GridPart. This would have to be stored in the HumGrid level since the GridPart is embedded within the GridMeasure/GridSlice vectors. But the algorithm in the previous paragraph should work unless (1) there is no data in the first measure, or (2) for some reason only notes/rests are given for only one staff in the first measure.

WolfgangDrescher commented 9 months ago

Thank you for the detailed explanation. It will be of great use when I try to solve some problems with the import of the Schiørring chorales. Especially with the first piano system with the full harmony example (part 5) there were some problems with the rhythm. Also stem directions seem to be incorrect which might be a bad encoding in Dorico or a bad export to MusicXML. I'm not sure yet if the problem for all this is musicxml2hum or a bad MusicXML files. Sometimes the sections where it switches between 4 and 5 voices in the piano were not correct and there were problems with the spine splits. Or missing notes.

I fixed this manually for the first 3 chorales, but it is too time consuming for all 100, so I hope I can solve this by improving the musicxml2hum.

Here is the link to the repository: https://github.com/WolfgangDrescher/schiorring-choral-bog. Keep in mind that repo is heavily work in progress and probably most chorales will still fail to render in Verovio because of invalid humdrum files (except the first 3 and some files that are correct "by accident").