BreezeWhite / oemer

End-to-end Optical Music Recognition (OMR) system. Transcribes phone-taken sheet-music images into MusicXML, which can be edited and converted to MIDI.
https://breezewhite.github.io/oemer/
MIT License

Training dataset #28

Closed pingpeng1017 closed 1 year ago

pingpeng1017 commented 1 year ago

I recently had the opportunity to explore your system, and I was impressed by its capabilities. However, I encountered an issue while trying to access the dataset used for training the first model. It appears that the dataset is currently unavailable on the provided page. If you happen to have the dataset, would it be possible for you to provide it? Thanks :)

BreezeWhite commented 1 year ago

Hi @pingpeng1017 , thanks for your feedback ^^ I think it would be better to send an inquiry email to them and also let them know the dataset download page is down, since I'm not sure it's appropriate for me to share the data with you directly.

pingpeng1017 commented 1 year ago

Thanks for your reply. I have discovered that the time signature is not being extracted, and I'm curious whether it's not recognised at all, or whether it is recognised but the conversion to XML is simply not implemented. I would greatly appreciate it if you could let me know whether this functionality could be added, as I would like to have a go :)

BreezeWhite commented 1 year ago

The time signature does appear in the predictions of the two UNet models, but in raw pixel format. I didn't manage to recognize the numbers of the time signature, since it would take more effort and wouldn't impact the listening experience much. Still, you could try to recognize the symbols if it's important in your case.

pingpeng1017 commented 1 year ago

Does that mean it's necessary to retrain the UNet models, or, since it already appears in the predicted images, would it be feasible to work with these raw pixel-format images and adapt the SVM to recognise the time-signature symbols? I'm currently working on a program that converts music notation into Braille, and being able to recognise time-signature symbols would be a game-changer for my project. Any help or guidance you can provide would be incredibly valuable.

BreezeWhite commented 1 year ago

Yes, it's already in the predicted image. You only need to figure out a way to extract which pixels belong to the time signature and what numbers they represent. For number recognition, you could train another SVM model as well.
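As a rough illustration of the "train another SVM" idea, here is a minimal sketch using scikit-learn's SVC. It trains on scikit-learn's bundled 8x8 digits dataset purely as a stand-in; in practice you would replace X and y with your own cropped time-signature patches and labels (those names are placeholders, not anything from oemer's codebase):

```python
# Hypothetical sketch: training an SVM to classify time-signature digits.
# The bundled digits dataset stands in for real cropped patches; swap in
# your own crops (flattened to vectors) and labels.
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten 8x8 images to 64-dim vectors
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", gamma=0.001)  # small gamma works well on these tiny patches
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

The same pipeline applies whatever the patch size is, as long as every crop is resized to a fixed shape before flattening.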

pingpeng1017 commented 1 year ago

Hi, it's me again! Sorry for so many questions. I've trained an SVM model to recognise numbers in the image, and I remember you said I only need to find a way to extract the pixels belonging to the time signature. It seems you have implemented a method to extract pixels for three specific symbol classes below:

import numpy as np

# `sep` is a per-pixel class-label map; each np.where call builds a
# binary mask selecting one symbol class
stems_rests = np.where(sep == 1, 1, 0)  # class 1: stems and rests
notehead = np.where(sep == 2, 1, 0)     # class 2: noteheads
clefs_keys = np.where(sep == 3, 1, 0)   # class 3: clefs and key signatures

I'm curious about how you managed to obtain these values.
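For readers wondering the same thing: one plausible way such values arise, sketched under the assumption (not confirmed in this thread) that the UNet outputs one channel per symbol class, is taking an argmax over the channel axis to get a per-pixel label map. The shapes and class numbering here are illustrative only:

```python
# Hypothetical sketch: turning per-class UNet channel outputs into the
# `sep` label map that the np.where masks above are built from.
import numpy as np

rng = np.random.default_rng(0)
# fake model output with shape (channels, H, W); assumed channel order:
# 0 = background, 1 = stems/rests, 2 = noteheads, 3 = clefs/keys
logits = rng.random((4, 32, 32))

sep = np.argmax(logits, axis=0)  # per-pixel class label in {0, 1, 2, 3}

stems_rests = np.where(sep == 1, 1, 0)
notehead = np.where(sep == 2, 1, 0)
clefs_keys = np.where(sep == 3, 1, 0)
```

Because each pixel gets exactly one label, the three masks are binary and cover disjoint sets of pixels.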