BreezeWhite / oemer

End-to-end Optical Music Recognition (OMR) system. Transcribes phone-taken sheet-music images into MusicXML, which can be edited and converted to MIDI.
https://breezewhite.github.io/oemer/
MIT License

Make training of segnet, unet and classifiers easier by providing a single entry point to all training steps #54

Closed liebharc closed 9 months ago

liebharc commented 11 months ago

Hi,

This PR should make it easier to retrain segnet, unet and classifiers. It adds a train.py script as entry point to train all models. The goal of this PR is to reproduce the current models without any modifications.

With the changes, I was able to get classifiers and unet models which work as well as the current models.

segnet fails to give meaningful results. It seems to converge to a local optimum where all predictions are zero, i.e. entirely black output. @BreezeWhite, do you recall anything I might have missed that would improve the segnet training? I assume that adding regularization or changing parameters might help, but in order to reproduce the current models it would be good not to experiment too much here.

Assumptions I made:

Fixes:

Changes which are not mandatory but appeared useful for me:
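To make the idea of a single entry point concrete, here is a hypothetical sketch of what such a train.py could look like; the model names match the PR, but the `train_*` functions and CLI flags are assumptions, not oemer's actual training code:

```python
import argparse

# Hypothetical sketch of a single training entry point.
# The train_* bodies are placeholders for the real training loops.

def train_unet(epochs: int) -> str:
    # Placeholder for the real UNet training loop.
    return f"training unet for {epochs} epochs"

def train_segnet(epochs: int) -> str:
    # Placeholder for the real SegNet training loop.
    return f"training segnet for {epochs} epochs"

def train_classifiers(epochs: int) -> str:
    # Placeholder for training the symbol classifiers.
    return f"training classifiers for {epochs} epochs"

TRAINERS = {
    "unet": train_unet,
    "segnet": train_segnet,
    "classifiers": train_classifiers,
}

def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Train all oemer models")
    parser.add_argument("model", choices=sorted(TRAINERS))
    parser.add_argument("--epochs", type=int, default=15)
    args = parser.parse_args(argv)
    return TRAINERS[args.model](args.epochs)

if __name__ == "__main__":
    print(main())
```

With a dispatcher like this, `python train.py unet --epochs 15` would select and run the matching training routine.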

BreezeWhite commented 11 months ago

Since it has been a long time since the initial development and model training, I may not be able to answer all of the questions above correctly.

I assume the failing model you are trying to train is the second UNet model, which I refer to as "model 2" in the Model Prediction section. The most likely cause of the failure is that you fixed it to predict 3 channels, but it should predict 4, where the 4th channel is a "nothing" (background) channel. When training with crossentropy loss, this additional channel is required for the loss to be well defined. Maybe you could adjust that and train it again; otherwise I can't think of any other likely cause.
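For context on the 4th channel mentioned above: with categorical crossentropy every pixel needs exactly one active target class, so pixels that contain no symbol need an explicit background ("nothing") channel. A minimal numpy sketch of the idea (illustrative only, not oemer's code; class indices are made up):

```python
import numpy as np

# Labels: 0 = background ("nothing"), 1..3 = symbol classes.
# The one-hot target therefore has 4 channels, not 3.

def to_one_hot(labels, num_classes):
    # labels: (H, W) integer map -> (H, W, num_classes) one-hot target
    return np.eye(num_classes)[labels]

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def crossentropy(target, probs, eps=1e-7):
    # Per-pixel categorical crossentropy, averaged over the image.
    return float(-(target * np.log(probs + eps)).sum(axis=-1).mean())

labels = np.array([[0, 1], [2, 3]])          # pixel (0, 0) is background
target = to_one_hot(labels, num_classes=4)   # shape (2, 2, 4)
probs = softmax(np.random.randn(2, 2, 4))    # network output after softmax
loss = crossentropy(target, probs)
```

Without the background channel, a pixel containing no symbol has no valid one-hot target, and the network can satisfy the loss by predicting all-zero (all-black) maps.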

For the PR itself, I will review it later when I have time, since the modifications are relatively large.

liebharc commented 11 months ago

Thanks! By now, I'm also sure that I mixed up the models. You can see that by comparing the generated arch.json files to the ones checked into git. That alone didn't improve the results yet, but I'm optimistic that fixing it, together with what you proposed, will give better results.

I'll update the PR and let you know about the results as soon as I find the time for this.

liebharc commented 10 months ago

I've now spent a few hours diffing the arch.json files and consolidated them so that the arch.json produced by the training matches the ones already checked into git. That required some changes to the network definitions in the code, which came as a surprise to me, but it can easily be verified by looking at the arch.json files. I will double-check the results, but in the first runs the models seem good now: they give results similar to the pretrained models.
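A small sketch of the kind of arch.json comparison described above, assuming the Keras functional-model JSON layout (`config.layers` with `class_name`/`config` per layer); the helper names are made up:

```python
import json

# Structurally compare two Keras arch.json dicts, listing layers whose
# class or config differ. Load each file with json.load() first.

def layer_index(arch: dict) -> dict:
    # Map layer name -> (class_name, config) for a functional model.
    return {
        layer["config"]["name"]: (layer["class_name"], layer["config"])
        for layer in arch["config"]["layers"]
    }

def diff_arch(a: dict, b: dict) -> list:
    left, right = layer_index(a), layer_index(b)
    # Names present on only one side, or with differing definitions.
    return [
        name
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    ]
```

Diffing at the layer level like this is less noisy than a raw text diff, since key ordering and formatting differences in the JSON are ignored.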

BreezeWhite commented 10 months ago

Glad to hear that! By the way, I don't see any modifications to the arch.json files though. Did you commit them to git? Or are you still working on it?

liebharc commented 10 months ago

No, I never committed the training results to git. From what I've experienced, you should always keep the weights and arch.json in sync, and right now I don't intend to update the weights, as the results with this PR are the same as the ones oemer already has. If I find the time, I might contribute improved weights in the future.

If you want to verify this PR, I think the easiest way is to run the training for just 1, 2 or 3 epochs. That's not enough to give you good weights, but if you look at the predictions you can see that they converge towards the ones oemer uses right now. You can then also diff the resulting arch.json files against the current ones.

BreezeWhite commented 9 months ago

That's a bit more work than I expected, and perhaps more than I can afford right now 😂 Maybe you could put it somewhere and show the training results and the performance? Like under your forked repo.

liebharc commented 9 months ago

I've added weights and predictions from one example to this dummy release: https://github.com/liebharc/oemer/releases/tag/v0.0.1 The weights were trained for just 5 epochs instead of the default 15. I didn't want to reuse an older result, so I can be absolutely sure the weights were produced by the code in this PR and not by one of my previous experiments. At the same time, I don't have the capacity right now to run the training for the full 15 epochs.

BreezeWhite commented 9 months ago

The results look awesome! I think it's good to merge this PR when you are ready.

liebharc commented 9 months ago

Thank you! The PR is complete, so you can hit the merge button.

BreezeWhite commented 9 months ago

Great! By the way, I'd like to add you to this project with write permission, since you've contributed a lot to this project and have a deep understanding of oemer. Would you be willing to become a maintainer of this project?

liebharc commented 8 months ago

Thanks @BreezeWhite, that's a big compliment! However, I'm not sure if I'll continue working on oemer. If I find more time to work on OMR, I'd likely move towards a transformer approach such as https://github.com/NetEase/Polyphonic-TrOMR. I put together a prototype based on TrOMR and the first results looked promising. TrOMR comes without training data and training code, so it's not really usable as-is, but it still might be worthwhile to look into further. The transformer approach could also be combined with oemer's staff detection, but that would likely still mean a complete rewrite of how oemer works right now. I'm happy to share more details if you are interested, but I would assume this isn't what you envision for the oemer repository?

BreezeWhite commented 8 months ago

@liebharc I see. I still want to promote you to maintainer of this project, even though you might not have enough time to really maintain it. It's my stubborn way of thanking you ^^ Regarding TrOMR, that looks interesting. I would be very glad to see better performance from combining TrOMR with oemer. Maybe you could apply techniques from both sides and develop it as a new project. I am willing to become the first stargazer ★

liebharc commented 6 months ago

Hello @BreezeWhite, reusing this PR felt like the easiest way to contact you directly. I was able to train TrOMR on an alternative dataset and combine it with staff detection based on oemer's segnet/unet. The results look quite promising. By now I've also cleaned up my changes (a little bit) and released them as https://github.com/liebharc/homr

It also seems to be able to deal with two of the images posted in oemer's issues section. Would it be alright with you if I reach out to those people? Getting some evaluation and test data would help to better understand the performance and limitations of the transformer approach.

BreezeWhite commented 6 months ago

Hi @liebharc, glad to hear the good news! I would definitely be the first one to try it out~ As for your concern, I am totally fine with it: since this is an open source world, of course you are free to reach out to anyone you want ^^ Just out of curiosity, which issues are you referring to?

liebharc commented 6 months ago

Thanks!

> since this is an open source world, of course you are free to reach out to anyone you want ^^

Right, but it's still good to show some respect to fellow contributors, which is why I thought it best to ask you first :).

The staff detection in homr was written for images taken with phone cameras, so homr and oemer have slightly different use cases and strengths. Because camera pictures can be full of noise, homr's staff detection works differently; as a result it doesn't seem to get confused by the lyrics in issue 59 and is able to recognize that issue 51 has three voices.

I didn't link the issues because GitHub would then create a back reference from the issues to this PR, and the original PR has nothing to do with them.

BreezeWhite commented 6 months ago

Thanks for your kindness~

I haven't tried the sheets in 51 and 59, but I've tried homr on several example sheets in oemer/docs/images, and it seems to have difficulties transcribing multiple melody lines in the same staff 🤔 I haven't read the technical details of TrOMR and don't know whether it can handle that situation. Perhaps it can be corrected in later processing stages. There are also other bugs. I assume you are still developing it and the code will become more robust in the future? If you need any help, I can do some work ^^

Maybe this is too much for this CLOSED PR lol, how about changing the place of discussion? Maybe under homr?

liebharc commented 6 months ago

Thanks for the feedback!

> Maybe this is too much for this CLOSED PR lol, how about changing the place of discussion? Maybe under homr?

Sure, perhaps just create an issue in homr and we can continue there.

> it seems to have difficulties transcribing multiple melody lines in the same staff 🤔

The TrOMR weights I added to the repo might have issues, as I noticed yesterday. A new training run will take about two days, and then I'll know more. But I can easily imagine that homr doesn't match oemer's performance on these examples. The transformer is capable of handling chords and multiple melodies in one staff, and should have learned that from https://sites.google.com/view/multiscore-project/datasets. But in my experience the transformer model is like a die roll in these cases: sometimes you get astonishingly good results and sometimes not so much. There's still plenty of room for experiments :).

> If you need any help, I can do some work ^^

Always :)

BreezeWhite commented 6 months ago

I see. Just can't wait to see the full power of fully trained weights ^^ Wish you hit it in one shot!