OpenScore / Lieder

Official mirror of https://musescore.com/openscore-lieder-corpus.
Creative Commons Zero v1.0 Universal

Roadmap to MS4; plans for branches, releases #4

Open apacha opened 1 year ago

apacha commented 1 year ago

I've tried to run the MuseScoreXML (mscx) to MusicXML (musicxml) conversion from this repo with MuseScore 4, and for a small percentage of the pieces MuseScore 4 crashes with the following rather cryptic exception:

[70154:1327906:20230413,091641.168260:WARNING crash_report_exception_handler.cc:240] UniversalExceptionRaise: (os/kern) failure (5)

Because the batch conversion script stops as soon as the first conversion fails, I wasn't able to identify which pieces cause the issue, so I rewrote the script as follows:

import argparse
import subprocess
from pathlib import Path

from tqdm import tqdm

def convert(
    musescore_xml_path: Path,
    musicxml_path: Path,
    # Path to the MuseScore 4 executable on my macOS machine; it will differ on other OS installations
    musescore_command="/Applications/MuseScore 4.app/Contents/MacOS/mscore"
):
    # Make sure the target directory exists when the output directory differs from the input directory
    musicxml_path.parent.mkdir(parents=True, exist_ok=True)
    # Pass the arguments as a list so that unusual characters in paths need no shell quoting
    convert_command = [musescore_command, "-o", str(musicxml_path.absolute()), str(musescore_xml_path.absolute())]
    process = subprocess.run(convert_command, stderr=subprocess.PIPE, text=True)
    # MuseScore may crash without writing any output, so check for the output file
    if not musicxml_path.exists():
        print("Failed to convert: " + str(musescore_xml_path) + "\n" + process.stderr)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Converts a directory of MuseScoreXML files to MusicXML using '
                    'MuseScore'
    )
    parser.add_argument('-i', '--input_directory', default="scores", help='The input directory')
    parser.add_argument('-o', '--output_directory', default="scores", help='The output directory')

    args = parser.parse_args()
    input_directory = Path(args.input_directory)
    output_directory = Path(args.output_directory)

    # These are the MuseScoreXML input files, not MusicXML files
    all_musescore_files = list(input_directory.rglob("*.mscx"))
    for musescore_xml_path in tqdm(all_musescore_files, desc="Converting MuseScoreXML to MusicXML"):
        musicxml_path = (output_directory / musescore_xml_path.relative_to(input_directory)).with_suffix(".musicxml")
        if musicxml_path.exists():
            continue
        convert(musescore_xml_path, musicxml_path)

This skips pieces that have already been converted and runs the conversion once per file. Be aware that this more or less blocks your computer, because MuseScore is opened and closed for each file.

The list of affected files is as follows:

scores/Reichardt,_Louise/12_Deutsche_und_Italiänische_Romantische_Gesänge/01_Frühlingslied/lc5067312.mscx
scores/Reichardt,_Louise/12_Deutsche_und_Italiänische_Romantische_Gesänge/02_Wenn_ich_ihn_nur_habe/lc5100067.mscx
scores/Reichardt,_Louise/12_Deutsche_und_Italiänische_Romantische_Gesänge/04_Wohl_dem_Mann/lc5100073.mscx
scores/Reichardt,_Louise/12_Deutsche_und_Italiänische_Romantische_Gesänge/12_Heymdal_(aus_Ariels_Offenbarungen)/lc5101826.mscx
scores/Reichardt,_Louise/6_Lieder_von_Novalis,_Op.4/3a_Geistliches_Lied/lc5092560.mscx
scores/Reichardt,_Louise/6_Lieder_von_Novalis,_Op.4/5_Noch_ein_Bergmannslied/lc5092612.mscx
scores/Reichardt,_Louise/12_Gesänge/03_Nach_Sevilla/lc5046249.mscx
scores/Reichardt,_Louise/12_Gesänge/01_Erinnrung_zum_Bach/lc5087917.mscx
scores/Reichardt,_Louise/12_Gesänge/07_Volkslied/lc5001925.mscx
scores/Reichardt,_Louise/12_Gesänge/05_Für_die_Laute_componirt/lc5001880.mscx
scores/Reichardt,_Louise/12_Gesänge/09_Der_Spinnerin_Nachtlied/lc5001937.mscx
scores/Reichardt,_Louise/12_Gesänge/02_Der_Sänger_geht/lc5002130.mscx
scores/Reichardt,_Louise/12_Gesänge/04_Vaters_Klage/lc5001870.mscx
scores/Reichardt,_Louise/12_Gesänge/08_Ein_recht_Gemüth/lc5001930.mscx
scores/Reichardt,_Louise/12_Gesänge/10_Die_Veilchen/lc5087944.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/07_Die_Wiese/lc5046362.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/01_Frühlingsblumen/lc5001965.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/03_Die_Blume_der_Blumen/lc5001980.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/08_Kaeuzlein/lc5002015.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/05_Betteley_der_Vögel/lc5001997.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/10_Der_Mond/lc5061932.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/04_Wachtelwacht/lc5001994.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/02_Der_traurige_Wanderer/lc5061310.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/06_Kriegslied_des_Mays/lc5002006.mscx
scores/Reichardt,_Louise/12_Gesänge,_Op.3/09_Hier_liegt_ein_Spielmann_begraben/lc5002019.mscx
scores/Schubert,_Franz/Die_schöne_Müllerin,_D.795/09_Des_Müllers_Blumen/lc4985932.mscx

MuseScore 4 reliably crashes with these files. I'm attaching one here as an example: 09_Der_Spinnerin_Nachtlied.zip

MuseScore 3 was able to open those files, so it might be a bug in MuseScore 4. Opening the files in MuseScore 3 and saving them again seems to migrate them correctly, after which they can be opened (at least this worked for the piece by Schubert).

Environment:

MarkGotham commented 1 year ago

Hi @apacha ! How's it going?

Thanks for flagging this up. As you note, this repo is built in ms3 (and earlier) and tested for that. It sounds like that's working as expected.

I'm not surprised to find that there are issues with ms4. We're not taking on an ms4 update until that's stable. Same goes for the quartets (for which the GitHub mirror is coming very soon -- watch this space!).

Tagging @shoogle for input, but I suspect he'll agree that bug-fixing ms4 will take a while and is best coordinated over there.

All the same, I'll leave this issue open, renamed as a future TODO ms4 upgrade.

Thanks again. See you at MEC in Paderborn?

apacha commented 1 year ago

Hi @MarkGotham, I'm great, thanks! Congratulations on your professorship in Dortmund. Alright, then I'll keep using MS3 for those failed conversions. I was hoping that an upgrade to MS4 would fix some of those nasty issues with the visibility of objects, like this one: https://musescore.org/en/node/327607, and indeed that issue seems to be solved. For the files that fail, I'll keep converting with MS3 (see https://github.com/apacha/Lieder/blob/feat/musicxml_conversion/data/single_file_conversion.py).

Not sure yet whether I'll make it to Paderborn, but I'm afraid it probably won't happen :-(

MarkGotham commented 1 year ago

issues with visibility of objects like this: https://musescore.org/en/node/327607

Oh yeah that's a nasty one. So for the scores where ms4 works, would you say that's a net improvement? If so it's perhaps worth considering a "try ms4; except use ms3", even now?
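A minimal sketch of the "try ms4; except use ms3" idea, not the linked script itself. It assumes that success can be judged by whether the output file exists, as in the script at the top of this thread. The MuseScore 3 path and the function name `convert_with_fallback` are hypothetical; only the MuseScore 4 path and the `-o` option appear in this thread.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

# Assumed executable locations; only the MuseScore 4 path appears in this thread.
MUSESCORE_4 = "/Applications/MuseScore 4.app/Contents/MacOS/mscore"
MUSESCORE_3 = "/Applications/MuseScore 3.app/Contents/MacOS/mscore"  # hypothetical


def convert_with_fallback(
    source: Path,
    target: Path,
    commands: Sequence[str] = (MUSESCORE_4, MUSESCORE_3),
    run: Callable[..., object] = subprocess.run,
) -> Optional[str]:
    """Try each executable in order; return the one that produced `target`, else None."""
    target.parent.mkdir(parents=True, exist_ok=True)
    for command in commands:
        try:
            run([command, "-o", str(target), str(source)],
                stderr=subprocess.PIPE, text=True)
        except OSError:
            # Executable missing or not runnable: move on to the next one.
            continue
        # A crash may leave no output file, so the file's existence is the success test.
        if target.exists():
            return command
    return None
```

The `run` parameter exists only so the fallback logic can be exercised without MuseScore installed. Returning the executable that succeeded makes it easy to log which files still need MS3.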

apacha commented 1 year ago

As far as I've tested, absolutely. And yes, that's exactly what I'm doing now: https://github.com/apacha/Lieder/blob/feat/musicxml_conversion/data/single_file_conversion.py#L14-L23 The result is quite dramatic:

Old version

Screenshot 2023-04-13 at 14 20 43

New version

Screenshot 2023-04-13 at 14 21 05

shoogle commented 1 year ago

@apacha, thanks for reporting and including the Python script! Please could you report the crash at https://github.com/musescore/MuseScore/issues and attach the example score there? Thanks!

MarkGotham commented 1 year ago

Hmm. Better ... but:

rettinghaus commented 4 months ago

As MuseScore 4 has been out for some time now, it would be nice to start updating/upgrading the corpus, too. Maybe create a new branch for this, which can be merged back into main when everything is finished?

MarkGotham commented 4 months ago

Thanks @rettinghaus.

MS4 has indeed been out for some time, but have the above issues been resolved?

MarkGotham commented 4 months ago

There are various considerations wrt usability here, especially as conversion among MS versions is one directional. There's no reversion to a previous version, so one view would be that we move to MS4 when pretty much everyone has. From what I've heard, that's definitely not the case yet.

That doesn't speak against @rettinghaus 's proposal of a branch.

Tagging @shoogle for views.

rettinghaus commented 4 months ago

I think most of the issues have been resolved. Furthermore, the transition is one directional but Git isn't, and using versioning we can keep a MS3 version stable here.

apacha commented 4 months ago

I agree with @rettinghaus. If we tag the latest version before the upgrade with something like musescore-3 and put a note into the README, it would also be clear that the files have been migrated to the latest version of MuseScore. I've seen similar things in other repos. Alternatively, you could keep an entire musescore-3 branch if you want to continue to support both versions.

MarkGotham commented 4 months ago

True, and we don't anticipate major changes to the songs that would require duplicate work.

So perhaps an MS3 release now (or back-dated to a commit near the time of the MEC paper), then all MS3>MS4 with a manual clean-up as necessary.

That also brings us an important step closer to integration with MEI (via MS>4.2).

@shoogle: OK?

@apacha do you want to PR your script?

shoogle commented 4 months ago

How does this sound?

  1. Create new musescore-4 branch.
  2. Repeat as necessary:
    1. Update main with latest MS3 files.
    2. Convert to MS4 on musescore-4 branch.
    3. Everyone checks MS4 files and reports problems in new issues on this repo.
      • Use issue title MS4 conversion: The problem... for each problem found.
    4. Update conversion script to fix problems that can be fixed programmatically.
    5. Repeat...
  3. When ready, create musescore-3 tag on main.
  4. Run final MS3 to MS4 conversion on main.
  5. Delete musescore-4 branch.

Before we do the final conversion (step 3+), I think we should wait a month or so for the MuseScore (Studio) 4.4 release, plus another month for any patches. But we can do steps 1 and 2 before then.

I'm open to creating more branches in the future. For example, a no-hidden-elements branch that mirrors main but with invisible tempo markings removed. However, I think this should wait until after the MS4 conversion.

shoogle commented 4 months ago

MS4 uses uncompressed folders rather than individual files.

Using scores/Beach,_Amy/4_Songs,_Op.51/3_Juni as an example:

  1. MS4 files go in a new subdirectory called lc6245973 (so lc6245973.mscx becomes lc6245973/lc6245973.mscx).
  2. MusicXML and other files remain where they are now.

Are we happy with this layout?

Another option would be lc6245973.mscx becomes lc6245973/score.mscx, but I think it's best to keep the number in the file name so it's easily visible when these scores are opened in MS4.
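A small sketch of the two path mappings discussed above, in case it helps when updating scripts that walk the scores directory. The function name `ms4_layout_path` and its flag are hypothetical, and nothing here has been agreed as the final layout.

```python
from pathlib import Path


def ms4_layout_path(ms3_path: Path, keep_id_in_name: bool = True) -> Path:
    """Map an MS3 score path to its proposed location inside a per-score folder."""
    # lc6245973.mscx -> folder named lc6245973 next to the MusicXML and other files
    folder = ms3_path.with_suffix("")
    # Option 1 keeps the lc number in the file name; the alternative uses score.mscx
    name = ms3_path.name if keep_id_in_name else "score.mscx"
    return folder / name


example = Path("scores/Beach,_Amy/4_Songs,_Op.51/3_Juni/lc6245973.mscx")
print(ms4_layout_path(example))
print(ms4_layout_path(example, keep_id_in_name=False))
```

Only the suffix of the final path component is stripped, so the dot in the parent folder name 4_Songs,_Op.51 is left alone.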

rettinghaus commented 4 months ago

MS uses zip-compressed files (mscz); the other files in the folder are not really relevant. (score_style.mss keeps all style settings.) So we could just keep the mscx file and stick with the old structure.

shoogle commented 4 months ago

other files in the folder are not really relevant

Maybe not for musicology, but the scores won't look right without score_style.mss (it contains the page size, staff spacing, text styles, and various other settings), and if anything has been changed in the Mixer (instrument sound, volume, reverb, pan, etc.) then they won't sound right without audiosettings.json. So we'll need those two files at least in addition to the MSCX.

I guess we can put them in the current score folder rather than a new subdirectory, so the path to the MSCX won't change.

rettinghaus commented 4 months ago

@shoogle Then why not keep the compressed mscz files?

shoogle commented 4 months ago

In the repository? The MSCZ compression...

  1. Spoils the diff, so you can't see what changed between commits.
    • Git just says "binary file was changed", so the new file might not even be a score for all you know!
  2. Interferes with Git's own compression, so the repository grows larger more rapidly.*
    • git clone, git checkout etc. would probably become slower over time.
  3. Is annoying when parsing files with programs besides MuseScore.
    • For example, if scores are uncompressed, it's easy to find out which ones were created in MuseScore 2:
      grep -Flr '<programVersion>2' scores/
    • But if scores are compressed, it requires a more complicated command:
      find scores/ -type f -iname '*.mscz' -exec sh -c 'unzip -p "$1" | grep -q "<programVersion>2" && echo "$1"' _ {} \;

* Admittedly the checked-out files are much, much smaller if you use MSCZ, but you can get the best of both worlds if you use MSCX and enable transparent filesystem compression for the scores directory or the volume it's stored on.
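To illustrate point 3 for programs other than shell tools, here is a rough Python sketch of the compressed case. It assumes that an MSCZ file is a zip archive containing at least one member ending in .mscx, and unlike the unzip command above it only inspects those members. The function name `compressed_scores_from_ms2` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterator

MARKER = b"<programVersion>2"


def compressed_scores_from_ms2(root: Path) -> Iterator[Path]:
    """Yield .mscz files under `root` whose embedded .mscx reports program version 2.x."""
    # Note: this pattern is case-sensitive on most systems, unlike find's -iname.
    for path in sorted(root.rglob("*.mscz")):
        try:
            with zipfile.ZipFile(path) as archive:
                # Assumption: the score XML is stored in a member ending in .mscx.
                names = [n for n in archive.namelist() if n.lower().endswith(".mscx")]
                if any(MARKER in archive.read(n) for n in names):
                    yield path
        except zipfile.BadZipFile:
            # Not a valid zip archive, so it cannot be inspected; skip it.
            continue
```

With uncompressed MSCX files none of this is needed, since a plain text search over the scores directory does the same job.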

MarkGotham commented 4 months ago

Thanks for this.

It seems to me that your positions (@shoogle and @rettinghaus) are compatible in the "best of all possible worlds" that Peter points to (... with a nod to Candide ;) ...).

Beyond that, we could consider other release options.

lpugin commented 4 months ago

I think it would be great also to have the MEI file exported from the corresponding MS4 version. But of course, I am biased on this.