Audiveris / audiveris

Latest generation of Audiveris OMR engine
https://audiveris.github.io/audiveris
GNU Affero General Public License v3.0

Very bad recognition #268

Closed oleedd closed 3 years ago

oleedd commented 5 years ago

Recognition doesn't work for such notes. Please fix this. It may significantly increase the level of this program. Pdf: http://s000.tinyupload.com/download.php?file_id=78374271592183761811&t=7837427159218376181109576

Bacchushlg commented 5 years ago

This score is of really poor quality. The major problem seems to be that it is not grey-scale but black-and-white. Although Audiveris internally works on B&W, the initial conversion optimizes the image for optimal recognition.

oleedd commented 5 years ago

It is a score from the beginning of the 20th century. For that time it is of very good quality. I think recognition of old scores is the most important, because finding lesser-known virtuoso classical pieces in a music editor format is very hard, and retyping them in a music editor takes too much time. I am sure that if Audiveris becomes able to recognize such old scores, it will be very popular among classical musicians and the best among its peers. Is it possible to fix this?

Bacchushlg commented 5 years ago

I don't know who scanned this. It is very important to keep the scan in grayscale and not to compress it to B&W, because Audiveris analyses the image in order to get an optimal input for conversion.

oleedd commented 5 years ago

Unfortunately there are no other scans of it on the internet. So it seems worth supporting B&W for such cases, if that is really where the problem lies.

Bacchushlg commented 5 years ago

It's not a problem of support. B&W is supported. The problem is the quality. If the scan is of bad quality, B&W makes it much more difficult to perform a good transcription. In case of e.g. direct PDF output of notation programs, B&W will work with no problem.

oleedd commented 5 years ago

direct PDF output of notation programs

It is easy and there is nothing to be better than other such programs.

The problem is the quality. If the scan is of bad quality

Real bad quality is when it is not understandable what notes are depicted - when some notes are too pale or blurred. Every musician has come across it. But in this case each note is completely understandable. But they are just not ideally smooth. Sorry, but I don't understand how all such programs may declare support for recognition of handwritten notes if not one program can work with such old printed classical scores. It is very sad. If you are one of the programmers of this project - it is a good chance to make it much better and famous.

hbitteur commented 5 years ago

@oleedd FYI, here is the raw output of the 6.0 prototype run on your example: it's just the collection of recognized symbols displayed as annotations. Issue-268.annotations-print.pdf As you can see for yourself, many symbols, especially the note heads, are rather well detected, while others are plainly wrong. Images that look "understandable" to human eyes may be difficult for software programs to recognize. So yes, OMR is still a research area...

maximumspatium commented 5 years ago

Recognition doesn't work for such notes.

I disagree with that claim. The recognition of the attached score is not perfect but it's possible to fix it manually like I just did for you. Here you go: etude.mxl.zip

Real bad quality is when it is not understandable what notes are depicted - when some notes are too pale or blurred. Every musician has come across it. But in this case each note is completely understandable. But they are just not ideally smooth.

Note heads don't pose problems. Other symbols do, and they disturb the final stage of the recognition, which is when Audiveris starts its virtual understanding of the notation.

Below is a fragment from your scan illustrating the low-quality (i.e. low-resolution) problem. Can you immediately and unambiguously recognize the symbols marked in red? [image: poor_quality_symbols]

Sorry, but I don't understand how all such programs may declare support for recognition of handwritten notes if even no one program can work with such old printed classic notes. It is very sad.

Audiveris project never claimed to recognize handwritten scores.

And last but not least, you seem to underestimate the difficulty and the limits of optical music recognition. I'm not going to discuss it here because it has already been discussed numerous times in this forum and elsewhere. There is no need to blame the developers if Audiveris (being free software) doesn't perform perfectly on your document. We're constantly working on improving Audiveris, and some of us do it in our spare time. Error-free OMR is out of reach, so don't expect perfect output. This is also true for the commercial OMR software out there...

oleedd commented 5 years ago

Yes - the quality of the fingering numbers is sometimes really very bad. But the notes are unambiguous. The thing is, I have many scores with many more elements and the same quality. I understand that it is difficult to recognize not smooth elements of score. But is it possible to fix this? I think it is a very good goal for further development of Audiveris.

maximumspatium commented 5 years ago

@oleedd

I understand that it is difficult to recognize not smooth elements of score. But is it possible to fix this?

You can easily fix wrongly recognized symbols manually using audiveris user interface. Fixing automatic recognition to perform error-free is impossible, although it may be possible to reduce errors to some degree.

oleedd commented 5 years ago

Fixing automatic recognition to perform error-free is impossible

Of course. It is for sure. I mean only fixing errors in my example. Because similar problems of such quality scores of other users may also be fixed after that.

oleedd commented 5 years ago

More difficult (and important for me) example (more elements): Notes.pdf Please try to fix this. That would be awesome.

maximumspatium commented 5 years ago

More difficult (and important for me) example (more elements):

What piece is this snippet from? The main theme as well as the passages in the left hand resemble Chopin's "winter wind" etude (op. 25, Nr. 11) but the printed editions I'm familiar with don't include your fragment. I'm curious...

oleedd commented 5 years ago

It is a transcription of Leopold Godowsky (the hardest piano composer of all times). My adjacent issue is about the left-handed version of this etude. Possible to fix this?

maximumspatium commented 5 years ago

Possible to fix this?

The scanning resolution of the embedded images in your document is low: pdfimages tool reports horizontal resolution of ≈151 PPI and vertical resolution of ≈ 160 PPI.

This impairs the detection of beams, which is impossible to fix manually in the current version. I therefore switched the binarization method to GLOBAL with threshold = 140. This yields a much better recognition with far fewer errors: [image: Chopin Godowski Global]
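
For readers unfamiliar with the two binarization modes, here is a minimal sketch of the general idea. It is not Audiveris code: the function names, the `radius` and `offset` parameters and the "darker than threshold means ink" convention are assumptions made for illustration. Only the value 140 comes from the setting mentioned above.

```python
# Illustrative sketch only, not the Audiveris implementation.
# Gray levels run from 0 (black) to 255 (white).

def binarize_global(gray, threshold=140):
    """Mark a pixel as ink (True) when it is darker than one fixed threshold."""
    return [[pixel < threshold for pixel in row] for row in gray]


def binarize_adaptive(gray, radius=1, offset=10):
    """Mark a pixel as ink when it is darker than its local neighbourhood mean.

    `radius` and `offset` are hypothetical tuning parameters.
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(gray[y][x] < local_mean - offset)
        result.append(row)
    return result


# A tiny 2x3 "image": a dark vertical stroke on a light background.
sample = [[250, 130, 250],
          [250, 120, 250]]
stroke = binarize_global(sample, threshold=140)
```

With a fixed threshold every pixel is judged against the same value, whereas the adaptive variant judges each pixel against its surroundings, which is why the two can give different results on an already degraded B&W scan.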

As you can see, most note heads, and even that strange beaming that contradicts common typesetting rules, have been recognized very well. The beamed group marked in red, where the beams touch each other, is impossible to process without prior editing in an image editor, so we can't blame Audiveris for it.

The real problem is that you won't be able to export this recognition into meaningful MusicXML, let alone re-typeset that beast in state-of-the-art notation software! The reason is that the score completely omits tuplet indications, so they need to be restored manually for each and every group in each and every voice. No computer software will ever be able to figure it out by itself. In other words, we can't simply "fix a bug" here!

It looks like recognition and re-typesetting of this score will be probably as hard as playing it.

Just for comparison, below is the raw output of the leading commercial OMR software, SmartScore Pro, for the same PDF:

[image: Chopin Godowsky SSP 10 5 8]

You can judge yourself what amount of manual work will be required to fix this mess...

oleedd commented 5 years ago

the leading commercial OMR software SmartScore Pro

I have seen many score recognition programs but not this one. Interesting. Sad that there has been no update since 2013. Programs of such a difficult type need to be improved constantly.

The scanning resolution of the embedded images in your document is low: pdfimages tool reports horizontal resolution of ≈151 PPI and vertical resolution of ≈ 160 PPI.

I don't know why the quality is so low. I thought because for that time such quality was in use. Maybe really. I have not seen a better quality of this etudes.

In your example almost all notes are green. What happens if to save it?

maximumspatium commented 5 years ago

I don't know why the quality is so low. I thought because for that time such quality was in use

No, this setting does reflect the resolution used during digitization of the paper document. I assume you've downloaded this somewhere instead of scanning it yourself, haven't you?

Unfortunately, many consumer-grade scanners produce a lot of artifacts that OMR software has difficulty coping with. These include broken and bent lines with inconsistent thickness:

[image: LowQ scanning artifact]

Moreover, a lot of images hosted by IMSLP are unusable for automatic recognition due to very low resolution. In other words, the scanning resolution is below the minimum required to represent the smallest musical glyphs with enough pixels. This makes them barely readable even for humans.

Over the last decade, a quasi-standard resolution for music recognition has emerged: 300 PPI for standard-size notation and 600 PPI for scores with lots of small symbols (conductor scores, for example). Anything below these limits is not guaranteed to be reliably processed by OMR systems, although it may work well in individual cases.
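
As a rough illustration of how these limits relate to a scan, the sketch below derives PPI from pixel and paper dimensions and compares it with the 300/600 PPI figures quoted above. The function names and the 9-inch page width are hypothetical. Only the thresholds and the ≈151 PPI value come from this thread.

```python
# Illustrative sketch: the page width in inches is an assumed example value.

def scan_ppi(pixels, inches):
    """Resolution along one axis: pixel count divided by the physical length."""
    return pixels / inches


def meets_recommendation(ppi, small_symbols=False):
    """Compare a resolution with the 300 PPI (standard) or 600 PPI (small symbols) limit."""
    required = 600 if small_symbols else 300
    return ppi >= required


# An image 1359 pixels wide that represents a 9-inch-wide page.
horizontal = scan_ppi(1359, 9)
usable = meets_recommendation(horizontal)
```

Here `horizontal` comes out at 151 PPI, about half of the recommended minimum, so `usable` is False.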

maximumspatium commented 5 years ago

In your example almost all notes are green. What happens if to save it?

Do you mean exporting the score to MusicXML? Well, it will produce a huge mess because Audiveris cannot recognize tuplets automatically. Fixing this manually will probably require more time than re-entering the music from scratch.

maximumspatium commented 5 years ago

I have seen many score recognition programs but not this one. Interesting. Sad that there has been no update since 2013. Programs of such a difficult type need to be improved constantly.

You'll barely see any recent updates for the old-school OMR software because the technology it is based on is considered largely exhausted.

All of them (including Audiveris 5.1) are rule-based recognition systems. That implies that some kind of classification algorithm is run over the input image to find all possible interpretations for a group of pixels. One and the same group may receive several interpretations with different levels of certainty, i.e. it can be a note head with 90% certainty and a part of the bass clef with 30% certainty. Then a rule-based system is run over that list, trying to make musical sense and rejecting everything that doesn't make sense from the system's point of view. A typical rule looks like this: "given a note head with enough certainty, look for a stem near this note head. If no appropriate stem can be found, reject the note interpretation for the given object and examine the next interpretation (in our example, the bass clef)". The major problem with this kind of system is that errors produced by the low-level classification algorithms are propagated to the higher stages of the system, where some virtual understanding of the underlying music occurs, causing big trouble.
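
The note-head rule quoted above can be sketched roughly as follows. This is a toy model, not Audiveris code: the class, the function names, the one-dimensional positions and the `max_distance` and `min_certainty` values are all assumptions made for illustration.

```python
# Toy model of one rule in a rule-based OMR system (hypothetical names throughout).
from dataclasses import dataclass


@dataclass
class Interpretation:
    shape: str        # e.g. "note_head" or "bass_clef_part"
    certainty: float  # classifier confidence between 0.0 and 1.0


def has_stem_nearby(x, stem_positions, max_distance=3):
    """True when some detected stem lies within max_distance pixels of x."""
    return any(abs(x - stem_x) <= max_distance for stem_x in stem_positions)


def choose_interpretation(x, candidates, stem_positions, min_certainty=0.5):
    """Try interpretations from most to least certain, applying the note-head rule."""
    ranked = sorted(candidates, key=lambda c: c.certainty, reverse=True)
    for candidate in ranked:
        if candidate.shape == "note_head":
            if candidate.certainty >= min_certainty and has_stem_nearby(x, stem_positions):
                return candidate
            continue  # rule failed: reject this interpretation, examine the next one
        return candidate  # no rule is modelled for other shapes in this sketch
    return None


candidates = [Interpretation("note_head", 0.9), Interpretation("bass_clef_part", 0.3)]
with_stem = choose_interpretation(10, candidates, stem_positions=[12])
without_stem = choose_interpretation(10, candidates, stem_positions=[])
```

When the stem detector misses the stem (the second call), the 90% note head is rejected and the 30% bass clef fragment wins. That is the error propagation described above: a low-level miss changes a high-level decision.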

Hervé did a splendid review of the problems and limits of this technology here.

Our team is currently experimenting with a system based on the deep learning technology. While showing some improvements over the rule-based systems of the 90s, it has proven to require a huge amount of computational resources: the processing time is 4-5 minutes per page on average on a recently manufactured consumer-grade laptop.

In other words, it will take several years for this technology to mature to be practically usable...

oleedd commented 5 years ago

I assume you've downloaded this somewhere instead of scanning it yourself, haven't you?

Of course. These scores are very rare, while the "usual" Chopin etudes are available in ordinary music libraries.

Audiveris cannot recognize tuplets automatically.

It is very important. It is worth trying to add this feature in the next version, at least for triplets and sextuplets.

switched the binarization method to GLOBAL with threshold = 140

How do I do this? I also want to try it on other scores of this type.

Our team is currently experimenting with a system based on the deep learning technology.

It is very cool. I think it is the only way to get close to error-free recognition. For the best result, training should be done on very difficult scores with many elements and poor quality, exactly like my examples, because after that easy and medium scores will surely be recognized very well. As for 4-5 minutes per page, I think that is normal, because even Google and Nvidia AI projects require a lot of time. But only if the result is more than 90% correct. I can't find an AI based score recognition program. Have you seen such a one?

One and the same group may receive several interpretations with different levels of certainty

Your explanation about the current Audiveris algorithms sounds like deep learning. Because both predict the most likely result. Strange.

maximumspatium commented 5 years ago

I can't find an AI based score recognition program. Have you seen such a one?

Yes, sure. All OCR and OMR software includes some sort of artificial intelligence (AI).

Tesseract 4 is an example of a deep-learning-based OCR system (LSTM).

As for OMR, I haven't seen any working deep learning music recognition system so far, with the exception of early prototypes and research papers...

Your explanation about the current Audiveris algorithms sounds like deep learning.

No, it doesn't. Deep learning (let's call it "AI-2010") is very different from rule-based systems (let's call them "AI-1993"). Below is a rough description of the differences between them:

In an AI-1993 system, the human programmer specifies how to recognize a document according to the domain knowledge, i.e. he chooses what to compute as well as the precise sequence of computations that hopefully leads to a satisfactory recognition. In other words, the programmer specifies the features or properties of the objects to be recognized as well as the methods to obtain these features. The problem is that features picked by humans may not be optimal for the task to be solved. For example, say we want to recognize a crotchet. We'll go ahead and define the appropriate rule that describes a crotchet: it should have a filled note head and a stem attached to it. Then we'll define the appropriate computation sequence: 1) locate all filled heads in the picture using some predefined method, 2) locate vertical lines of a predefined height, 3) match vertical lines against note heads, keep good combinations and reject bad ones. That may sound fine and dandy, but the fundamental problem with this approach is how to design a recognizer for note heads and stems that performs near error-free. What features do we need? If one of the steps described above fails for some reason, the whole recognition sequence will fail as well...

An AI-2010 system operates like this: a human programmer gathers enough training data. Then an appropriate learning (network) architecture is chosen. After that, the training algorithm is run on the training data according to the following principle: figure out by itself how to compute Y from X. If it doesn't work as expected, different training data and network architectures will be tried out. In this case, the system is responsible for choosing both the features and the computation sequence. The crazy thing about such a system is that it's capable of exploring hidden patterns that aren't immediately obvious to humans!
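
The "figure out how to compute Y from X" principle can be shown in miniature. The sketch below is deliberately far simpler than a neural network: it learns a single threshold on a single feature from labelled samples, instead of having a programmer hardcode that threshold. The feature (how much of a blob is filled with ink) and all sample values are invented for illustration.

```python
# Toy illustration of learning from data, not deep learning and not Audiveris code.

def train_threshold(samples):
    """Pick the threshold t that minimises training errors for the rule
    'feature >= t means filled note head'.

    samples: list of (feature_value, is_filled_head) pairs.
    """
    values = sorted({value for value, _ in samples})
    # Candidate thresholds are the midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((value >= t) != label for value, label in samples)

    return min(candidates, key=errors)


# Invented training data: (fill ratio of the blob, is it a filled note head?)
training_data = [
    (0.20, False), (0.35, False), (0.40, False),
    (0.70, True), (0.85, True), (0.90, True),
]
learned = train_threshold(training_data)
```

With this data the learned threshold lands midway between 0.40 and 0.70. A deep network does the same kind of fitting, but over a very large number of parameters and over features it discovers itself, which is where the need for large training sets comes from.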

Audiveris 5.x consists of 20 steps, each of which has been optimized for recognition of a specific kind of musical symbol. These specialized recognizers are run in a predefined sequence, each storing its results in a graph for the final processing. The last step of the recognition tries to deduce each object's interpretation by applying hardcoded rules to the results produced by the recognizers. It's easy to see that a note (consisting of a head and a stem) won't be recognized properly if its stem was rejected due to inconsistent thickness. Now consider the case where the said note belongs to a tuplet. As you can imagine, if we fail to recognize single notes, we will likely fail to recognize the tuplet as well. That's the classical case of error propagation in rule-based systems.

By contrast, deep learning systems discover a set of micro features that are less prone to errors caused by low-quality digitization or shape variations. The highly interconnected architecture is then capable of resolving much more complex object relations than simple rules like “If X, then do Y, else if A, then do B.”

oleedd commented 5 years ago

the training algorithm is run on the training data

So even coding is not needed? Only data?

switched the binarization method to GLOBAL with threshold = 140

How can I switch it? I can't find it.

When to expect early AI version of Audiveris?

maximumspatium commented 5 years ago

So even coding is not needed? Only data?

Yes, that's pretty much it. The catch is that your training data set needs to be fairly large and representative enough for the network to capture the most common variations.

For example, the DeepScores dataset for OMR contains 300,000 music pages with about 100,000,000 objects to learn from...

How can I switch it? I can't find it.

Go to the BOOK menu and choose "Set Book Parameters..." Then for each page you can specify the desired binarization method. It's set to "ADAPTIVE" by default, but you can switch to "GLOBAL" on a per-page level. If you're going to change the binarization method for an already opened book, don't forget to reset the recognition by choosing "BOOK -> Reset Book to BINARY"...

When to expect early AI version of Audiveris?

The 6.x prototype is already here if you're keen to try it out.

oleedd commented 5 years ago

Can Audiveris 6.0 prototype automatically detect tuplets?

maximumspatium commented 5 years ago

Can Audiveris 6.0 prototype automatically detect tuplets?

6.x prototype concentrates on recognition of fixed-shape symbols. Final rhythmical interpretation is another hard task. I believe Audiveris already has a capability to cope with such cases. @hbitteur could you kindly comment on this?

hbitteur commented 5 years ago

@oleedd As Maxim said, the 6.0 prototype focuses on fixed-shape symbols. The early results show that it is pretty good at recognizing the most frequent note heads even on poor quality scores, something the current 5.1 engine has difficulty dealing with because of its dependency on glyph segmentation.

Note however that not everything is perfect in 6.0 recognition:

Detecting missing tuplets is something significantly different. With some irony, we could say that the symbol classifier does not recognize missing symbols. By definition!

Time misalignment can result from many different factors: bad recognition of one or several symbols (augmentation dot, flag, rest, beam, tuplet, even the time signature itself) and missing information (implicit tuplet, implicit rest).

For the OMR engine, nothing can be trusted with 100% certainty, except the end-user input. A few years ago, Audiveris engine tried to play with weak FRAT symbols (FRAT stands for Flag, Rest, Augmentation-dot, Tuplet), attempting to find a correct combination of them in any given measure. The result was just a CPU hog, with no convincing output.
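
To give a feel for why trying combinations of weak symbols becomes a CPU hog, here is a simplified sketch. It is not the former Audiveris code: the function name and the example durations are invented, and weak symbols are modelled only as durations that are either kept or dropped, whereas real flags and tuplets change the durations of other symbols.

```python
# Simplified model: each weak symbol either contributes its duration or is discarded.
from fractions import Fraction
from itertools import product


def fitting_combinations(sure_durations, weak_durations, measure_length):
    """Return every keep/drop choice for the weak symbols that fills the measure exactly."""
    base = sum(sure_durations, Fraction(0))
    fits = []
    # 2 ** len(weak_durations) choices are examined, so the search grows exponentially.
    for choice in product([False, True], repeat=len(weak_durations)):
        kept = sum((d for d, keep in zip(weak_durations, choice) if keep), Fraction(0))
        if base + kept == measure_length:
            fits.append(choice)
    return fits


# A 4/4 measure: three reliable quarter notes plus three doubtful symbols
# (a quarter rest, an eighth rest and another eighth rest, all low-certainty).
sure = [Fraction(1, 4)] * 3
weak = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
solutions = fitting_combinations(sure, weak, Fraction(1))
```

Even this tiny measure has two different keep/drop choices that add up correctly, so arithmetic alone cannot tell which symbols are real. With more weak symbols per measure, both the search cost and the ambiguity grow quickly.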

Of course the OMR could assume that "all measure symbols are correct", that misalignment should thus result from implicit tuplets, and then could try to inject artificial tuplet symbols. Perhaps some music editors can do this, but OMR is a different beast for which the "all measure symbols are correct" postulate does not stand.

I'm not saying this is impossible. I often say that in computing everything is possible, but the cost may vary a lot! :-) We have to accept the fact that OMR output will never be 100% correct on all input scores. And keeping the end-user in the middle of the OMR process is a rather practical approach. Compare the effort needed by the end-user to drag n' drop a tuplet sign from the shape palette to its target location on the score with the effort for the OMR developer to implement this "automatic tuplet detection" option. Keep in mind that we at Audiveris are very few and have plenty of other tasks to carry out. But after all, Audiveris is open-source, so any volunteer can come in to address this case.

oleedd commented 5 years ago

OK. I will make the task easier. Let's not call it "auto detecting tuplets" but rather using score grouping. Usually each group within one music size is separated from the others (like in #267). Another way is to compare the right and left hands: their grouping always runs in parallel. And if it is possible to export a score exactly as it was recognized, without the modifications that start after pressing the export button, then there is no problem here.

hbitteur commented 5 years ago

@oleedd Sorry I don't understand what you mean by "music size" and by "score grouping"

oleedd commented 5 years ago

Music size: 3/4 or 2/4 or C. Score grouping: http://www.brl.org/codes/session10/groupings.html

oleedd commented 5 years ago

How to export a score in that look as it was recognized, without any modifications which start after pressing the export button?

hbitteur commented 5 years ago

@oleedd I'm busy right now working on significant modifications in time slot building (the heart of the RHYTHM step). As a side effect, this might open the door to automatic tuplet handling. I'll get back to you once enough progress has been made.

Bacchushlg commented 5 years ago

Mondscheinsonate.pdf

Similar problem with this score (triplets everywhere). So any new algorithm should work with this, too. But I think there should be a switch in the book parameters that enables this new automatic recognition (normally off!), otherwise I expect too many errors on "normal" scores...

hbitteur commented 5 years ago

I fully agree with the "optional" aspect

oleedd commented 5 years ago

There are 12 eighth notes in a measure. 12 / 4 (time signature) = 3, so Audiveris should choose triplets. Moreover, each group is separated from the others. So the formula is: number of notes / time signature. It is not hard to implement.
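
The arithmetic proposed here can be sketched as below. The function name is hypothetical, and the sketch relies on exactly the assumption questioned earlier in this thread: that every note in the measure was recognized correctly and that all notes have the same value.

```python
# Sketch of the proposed formula, valid only if all recognized symbols are correct.
from fractions import Fraction


def implied_tuplet(note_count, note_value, beats, beat_value):
    """Return the (actual, normal) tuplet ratio implied by a measure of equal notes,
    or None when the notes fill the measure without any tuplet.

    note_value and beat_value are fractions of a whole note.
    """
    measure_length = beats * beat_value
    nominal_count = measure_length / note_value  # how many such notes normally fit
    ratio = Fraction(note_count) / nominal_count
    if ratio == 1:
        return None
    return (ratio.numerator, ratio.denominator)


# 12 eighth notes in a 4/4 measure, where 8 would normally fit.
triplets = implied_tuplet(12, Fraction(1, 8), 4, Fraction(1, 4))
plain = implied_tuplet(8, Fraction(1, 8), 4, Fraction(1, 4))
```

For the example given, the ratio is 3 notes in the time of 2, i.e. triplets. If a single flag, dot or rest in the measure is misread, the same calculation produces a wrong or meaningless ratio, which is why it cannot be applied blindly to OMR output.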

oleedd commented 5 years ago

Any way to export a score in that look as it was recognized (green elements), without any modifications which start after pressing the export button?

Bacchushlg commented 5 years ago

You can export a score from the moment that all steps have been completed. But if for certain reasons the transcription was stopped (e.g. bad recognition of rhythms) then no export is possible.

maximumspatium commented 5 years ago

Any way to export a score in that look as it was recognized (green elements), without any modifications which start after pressing the export button?

You can save your score in the .omr data format, which retains all graphical details. No modification is made at that point. Export into MusicXML requires a mapping from the graphical representation into higher-level musical abstractions (voices, notes, chords, rhythm). It cannot be switched off...
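
To illustrate why the mapping cannot be skipped, here is a minimal sketch of what MusicXML expects for one triplet eighth note. The helper name and the `divisions` value of 6 are assumptions. The element names (`note`, `pitch`, `duration`, `type`, `time-modification`, `actual-notes`, `normal-notes`) are standard MusicXML.

```python
# Minimal sketch: build one MusicXML <note> for a triplet eighth (hypothetical helper).
import xml.etree.ElementTree as ET


def triplet_eighth(step, octave, divisions=6):
    """Build a <note> element; `divisions` is the number of duration units per quarter note."""
    note = ET.Element("note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    # A plain eighth lasts divisions / 2 units; a triplet eighth lasts 2/3 of that.
    ET.SubElement(note, "duration").text = str(divisions // 3)
    ET.SubElement(note, "type").text = "eighth"
    modification = ET.SubElement(note, "time-modification")
    ET.SubElement(modification, "actual-notes").text = "3"
    ET.SubElement(modification, "normal-notes").text = "2"
    return note


xml_text = ET.tostring(triplet_eighth("C", 4), encoding="unicode")
```

Every note carries an explicit duration and, for tuplets, an explicit ratio. A purely graphical record of "a note head here, a beam there" has no such values, so they must be inferred before export, and a score that omits its tuplet signs leaves nothing to infer them from.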

oleedd commented 5 years ago

The problem is that Sibelius can't open .omr files. What music editor can open it?

hbitteur commented 5 years ago

".omr" files use an open and documented XML format so that external applications can read and/or modify them. But to my knowledge, no commercial software so far provides an import feature for .omr data.

The de facto standard today for digital music exchange is MusicXML. The future may be MEI, and we at Audiveris are considering the ability to export into MEI format. But this is not for the near future.

oleedd commented 5 years ago

The MusicXML format is very bad for exporting a score exactly as it was recognized. Also, notation editors open it in very different ways, each with big distortions. Can you please try to add some note editors formats for export (for example .sib and .mus)? Maybe they are similar to XML.

maximumspatium commented 5 years ago

Can you please try to add some note editors formats for export (for example .sib and .mus)?

No chance because both are undocumented proprietary formats.

Maybe they are similar to XML.

No. Both are binary formats encrypted using strong cryptographic methods to prevent unauthorized changes.

oleedd commented 5 years ago

Maybe some (probably free) note editor has its documented format?

maximumspatium commented 5 years ago

Maybe some (probably free) note editor has its documented format?

Intermediate formats are usually poorly documented. Both the Capella and MuseScore formats are XML based. There is no direct documentation available for them, but all relevant details can be extracted from their sources. That's a big advantage of open-source software...

And there is LilyPond - its internal format is text-based and uses a relatively large set of engraving commands and rules...

oleedd commented 5 years ago

Is there some format similar to .omr but with a filter of unsupported symbols?

hbitteur commented 3 years ago

Closing for lack of activity.

oleedd commented 3 years ago

It is not fixed:

More difficult (and important for me) example (more elements): Notes.pdf Please try to fix this. That would be awesome.

If it will work well for this, it will work well for anything.