Audiveris / audiveris

Latest generation of Audiveris OMR engine
https://audiveris.github.io/audiveris
GNU Affero General Public License v3.0

guitar chord symbols disturb chord detection #439

Open Bacchushlg opened 3 years ago

Bacchushlg commented 3 years ago

Leningrad.pdf

The attached score was exported from MuseScore as PDF, so the quality should be rather good. Unfortunately, the guitar chord symbols strongly disturb chord detection. It seems that they are taken as some sort of text...

A long time ago I proposed providing some sort of dummy guitar tab symbol that could detect these symbols, in order to prevent a wrong detection. Nothing would be generated in the export; they would just be placeholders during the transcription process.

Minor addition: none of the 16th rests are detected. Maybe you could just add another training glyph?

hbitteur commented 3 years ago

For the 16th rests, a few additional samples and a retraining will fix that. I'll take care of this later.

Now, let's focus on this question of chord diagrams. I have just defined a new shape (CHORD_DIAGRAM), together with a new ChordDiagramInter, manually assigned this shape to a few relevant glyphs taken from the Leningrad.pdf example, and trained the classifier from scratch.

[By the way, if we go further on this topic, I will need additional images with chord diagrams in it, in order to increase the variety of diagram samples, and thus reach more stable recognition for this shape]

For the time being, I re-ran the processing of the Leningrad case. Here are the raw results on sheet 2 (the first sheet with chord diagrams): image

At the end of the TEXTS step: image

Notice that OCR has partly recognized horizontal lines and dots from the diagrams as lines of text. For example, in the leftmost diagram, under the "D" chord name, it has "detected" this word:

 
WordInter@26594aed System#1
   WordInter{#821(0.755/0.755) stf:1 TEXT "____________..."}
   g#724 codes[5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,2e,2e,2e] S-52
   rel: Containment<-SentenceInter{#820(0.773/0.773) stf:1 S-12 Direction}
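For readers decoding the dump above: the codes list appears to hold hexadecimal character codes, where 5f is the ASCII underscore and 2e is the ASCII period. Here is a minimal Java sketch, using a hypothetical OcrCodes helper that is not part of Audiveris. It rebuilds the word text from such codes and checks that it contains no letter or digit, which is one possible hint that a "word" is really diagram debris rather than text.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not an Audiveris class. */
class OcrCodes {

    /** Rebuilds a string from hexadecimal character codes such as "5f" or "2e". */
    static String decode(List<String> hexCodes) {
        StringBuilder sb = new StringBuilder();
        for (String hex : hexCodes) {
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    /** Returns true if the text contains no letter and no digit at all. */
    static boolean hasNoAlphanumerics(String text) {
        return text.codePoints().noneMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // Codes copied from the glyph dump: twelve 5f followed by three 2e.
        List<String> codes = Arrays.asList(
                "5f", "5f", "5f", "5f", "5f", "5f",
                "5f", "5f", "5f", "5f", "5f", "5f",
                "2e", "2e", "2e");
        String text = decode(codes);
        System.out.println(text);                     // twelve underscores, then three dots
        System.out.println(hasNoAlphanumerics(text)); // true
    }
}
```

Such a check is only a heuristic: real text lines can also consist of punctuation alone, so it could flag candidates for review but should not delete anything on its own.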

The underlying glyph appears in red: image

At this point, we can let the engine go on. It will try to recognize diagrams where the pixels of the glyphs underlying the text have been removed: image

The classifier may or may not recognize this glyph as a diagram. In this case it did. But now we have both the "text" and the diagram, one on top of the other, and they both survive. image

We can also decide, at the end of the TEXTS step, to manually clear up these false text lines. Then, we launch the engine on the following steps. This gives: image

Compared with the situation before this support of diagrams, is the new situation really worthwhile for the end user? Please give me your feedback.

Obviously, we would get better results on these diagrams if the SYMBOLS step were run before the TEXTS step. But this is not something I'm ready to do, because the pixels of all the lines of real text (such as lyrics) would get in the way of symbol recognition.

hbitteur commented 3 years ago

Perhaps a more interesting approach, to avoid being "polluted" by things like these chord diagrams, might be to provide the user with some means to manually "erase" some rectangular areas. This could take place right before the TEXTS step, and prevent the OCR and OMR engines from processing these areas.

Simple and low level, but perhaps efficient enough? What do you think?
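To make the "erase" idea concrete, here is a minimal Java sketch that works on a plain java.awt.image.BufferedImage. The AreaEraser class is hypothetical and says nothing about how Audiveris stores its images internally. It simply paints the user-chosen rectangles white on a copy of the page, so that later steps would find no pixels there.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not an Audiveris class. */
class AreaEraser {

    /** Returns a copy of the source image in which every given area is painted white. */
    static BufferedImage erase(BufferedImage source, List<Rectangle> areas) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(source, 0, 0, null); // start from the original pixels
            g.setColor(Color.WHITE);         // white is taken as background here
            for (Rectangle area : areas) {
                g.fill(area);
            }
        } finally {
            g.dispose();
        }
        return copy;
    }

    public static void main(String[] args) {
        // A small all-black test image stands in for a scanned page.
        BufferedImage page = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage cleaned = erase(
                page, Collections.singletonList(new Rectangle(2, 2, 3, 3)));
        System.out.println(Integer.toHexString(cleaned.getRGB(3, 3) & 0xFFFFFF));
        System.out.println(Integer.toHexString(cleaned.getRGB(7, 7) & 0xFFFFFF));
    }
}
```

Working on a copy keeps the original pixels available, so an erased area could be restored later if the user changes their mind.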

hbitteur commented 3 years ago

Adding on my previous message, let's consider the possibility for the user to define "regions" (I prefer this name to "areas", a name already used in many locations within Audiveris code).

We could have black regions: regions whose content is blacklisted for any OMR processing. We could also have white regions, which are processed even if they intersect black regions. This is to cope with the fact that we would deal only with rectangular regions.
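The black/white rule can be sketched in a few lines of Java. The RegionMask class and its method names are hypothetical, chosen only for illustration. The sketch assumes that a white region always wins over a black one, as described above, and that anything outside all regions is processed as usual.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the black/white regions rule; not an Audiveris class. */
class RegionMask {

    private final List<Rectangle> black = new ArrayList<>();
    private final List<Rectangle> white = new ArrayList<>();

    RegionMask addBlack(Rectangle region) {
        black.add(region);
        return this;
    }

    RegionMask addWhite(Rectangle region) {
        white.add(region);
        return this;
    }

    /** Tells whether the pixel at (x, y) should be handed to OCR/OMR processing. */
    boolean isProcessed(int x, int y) {
        for (Rectangle w : white) {
            if (w.contains(x, y)) {
                return true; // white regions override any black region
            }
        }
        for (Rectangle b : black) {
            if (b.contains(x, y)) {
                return false; // blacklisted content is skipped
            }
        }
        return true; // outside all regions: normal processing
    }

    public static void main(String[] args) {
        RegionMask mask = new RegionMask()
                .addBlack(new Rectangle(0, 0, 100, 100))
                .addWhite(new Rectangle(40, 40, 20, 20));
        System.out.println(mask.isProcessed(10, 10));   // false: inside the black region only
        System.out.println(mask.isProcessed(50, 50));   // true: white region overrides black
        System.out.println(mask.isProcessed(200, 200)); // true: outside all regions
    }
}
```

Combining one black rectangle with a few white ones is what allows a non-rectangular excluded shape, for example a diagram block with a chord name that should still be read.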

Another advantage of regions is the ability for the user to add some labeling to them, such as:

The more I think of it, the more possibilities appear for the end user.

Dear end user, what do you think of this proposal?

Bacchushlg commented 3 years ago

Have a look at #152: there I made a couple of suggestions for better text evaluation, and I think that your proposal above is very much in line with them. So, yes: I think that this is a good approach to get the problems solved, and to save a lot of work after transcription.

hbitteur commented 3 years ago

16th rests should be better recognized now. Regarding the "regions" idea, I will leave it for 5.3 because it can have important impacts. I will now focus on completing and releasing 5.2

Bacchushlg commented 3 years ago

I completely agree: this task surely means some bigger rework, and the present version is just too good not to be released!

hbitteur commented 3 years ago

See former #396