BreezeWhite / oemer

End-to-end Optical Music Recognition (OMR) system. Transcribe phone-taken music sheet image into MusicXML, which can be edited and converted to MIDI.
https://breezewhite.github.io/oemer/
MIT License
394 stars, 46 forks

Error: unit sizes not consistent #9

Closed ultraGentle closed 2 years ago

ultraGentle commented 2 years ago

Hello, and thanks for this project! I've been getting some successes, but I'm trying to understand why the failures occur.

I've attached an image that I thought should be perfect for oemer (piano staff, clean image), but I ran into an error:

oemer.exceptions.StafflineUnitSizeInconsistent: Unit sizes not consistent (th: 0.1): [0.0409582 0.03067699 0.04448853 0.04064915 0.04198626 0.03570326 0.27882942 0.04436703]
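
A quick way to read this error is to compare each reported value against the threshold. The snippet below is only a sketch: it assumes the numbers in the message are per-staff values that are each checked against th, which may not match oemer's internal check exactly. The helper name find_outliers is hypothetical.

```python
# Values and threshold copied verbatim from the error message above.
TH = 0.1
values = [0.0409582, 0.03067699, 0.04448853, 0.04064915,
          0.04198626, 0.03570326, 0.27882942, 0.04436703]


def find_outliers(values, th):
    """Return (index, value) pairs whose value exceeds the threshold.

    Assumption: each entry is compared against th on its own.
    This is an illustration, not oemer's actual implementation.
    """
    return [(i, v) for i, v in enumerate(values) if v > th]


outliers = find_outliers(values, TH)
print(outliers)  # [(6, 0.27882942)]
```

Under that assumption, only the seventh value (index 6) is above 0.1. The other seven all fall between roughly 0.03 and 0.045.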

Here are my questions:

  1. Any idea why a "good" image is erroring? I tried a higher resolution; same error. Is it the distance between treble/bass staves, or the spacing of the individual staff lines that's the problem?
  2. CPU vs. GPU: any effect on accuracy and errors, or just processing time?
  3. Tensorflow vs. default runtime: any effect on accuracy and errors, or just processing time?

I'm trying to understand what I can do to minimize fatal errors and recognition errors.

Thanks for any suggestions!

Original PDF for reference: bachinv.pdf

JPG used for oemer: bachinv.jpg

BreezeWhite commented 2 years ago

Hi, thanks for your interest in this project. To answer your questions:

  1. Several earlier issues ran into the same situation, and all of them were related to the deskew process. oemer assumes the input image is skewed to some extent, so it always runs a deskew step. I've added a new flag, --without-deskew, to deal with this error. With the deskew step skipped, the image you provided transcribes successfully. The result is in the attached file: issue9.zip

2, 3. Using different libraries or hardware to extract the features should make no difference.
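
For anyone scripting this, here is a minimal sketch of calling oemer with the new flag from Python. It assumes the oemer executable is on PATH and takes the image path as a positional argument. build_oemer_command is a hypothetical helper, not part of oemer.

```python
import shlex


def build_oemer_command(image_path, without_deskew=True):
    """Build the argument list for an oemer run (hypothetical helper)."""
    cmd = ["oemer"]
    if without_deskew:
        # Flag introduced in the reply above; skips the deskew step.
        cmd.append("--without-deskew")
    # Assumption: the image path is passed as a positional argument.
    cmd.append(image_path)
    return cmd


cmd = build_oemer_command("bachinv.jpg")
print(shlex.join(cmd))  # oemer --without-deskew bachinv.jpg

# To actually run the transcription (requires oemer to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when the image path contains spaces.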

ultraGentle commented 2 years ago

Thanks for adding the flag!

So, using --without-deskew, there's no error, but the accuracy is surprisingly low.
(Just an observation; not complaining.)

When I get a chance, I'll experiment with

  1. Taking a skewed picture of the original PDF, to see if the deskew process actually helps recognition.
  2. Using --without-deskew but with different resolutions, to see if that helps.

Additional thoughts

  1. I wonder if there would be a way to automatically detect whether an image is already "perfect."
  2. I wonder if making a number of small, random perturbations of the original image, and compiling a final product from the most sensible results would help.

Just thinking out loud. Again, thanks!

BreezeWhite commented 2 years ago

Glad the change helps! Some comments on your thoughts:

  1. Some hacky approaches are possible, but they would probably introduce more problems as well.
  2. I think small random perturbations won't affect the results much, but image size will.

Closing this issue as the original problem has been resolved.