Bacchushlg opened this issue 3 years ago
For the 16th rests, a few additional samples and a retraining will fix that. I'll take care of this later.
Now, let's focus on this question of chord diagrams. I have just defined a new shape (CHORD_DIAGRAM), together with a new ChordDiagramInter, manually assigned this shape to a few relevant glyphs taken from the Leningrad.pdf example, and trained the classifier from scratch.
[By the way, if we go further on this topic, I will need additional images with chord diagrams in them, in order to increase the variety of diagram samples and thus reach more stable recognition for this shape.]
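For readers less familiar with the Audiveris internals: supporting a new symbol like this essentially means a new constant in the Shape enumeration, a dedicated Inter class to carry the interpretation, and enough training samples for the glyph classifier. The fragment below is only a schematic sketch of that pattern; the class shape, constructor signature and field names are my own assumptions, not the actual Audiveris source:

```java
// Schematic sketch only -- not the actual Audiveris implementation.
import java.awt.Rectangle;

/** Hypothetical interpretation object for a glyph classified as CHORD_DIAGRAM. */
public class ChordDiagramInter
{
    private final Rectangle bounds; // bounding box of the underlying glyph
    private final double grade;     // classifier confidence for the CHORD_DIAGRAM shape

    public ChordDiagramInter (Rectangle bounds, double grade)
    {
        this.bounds = bounds;
        this.grade = grade;
    }

    public Rectangle getBounds ()
    {
        return new Rectangle(bounds); // defensive copy
    }

    public double getGrade ()
    {
        return grade;
    }
}
```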
For the time being, I re-ran the processing of the Leningrad case. Here are the raw results on sheet 2 (the first sheet with chord diagrams):
At the end of the TEXTS step: notice that the OCR has partly recognized horizontal lines and dots from the diagrams as lines of text. For example, in the leftmost diagram, under the "D" chord name, it has "detected" this word:
```
WordInter@26594aed System#1 WordInter{#821(0.755/0.755) stf:1 TEXT "____________..."} g#724 codes[5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,5f,2e,2e,2e] S-52 rel: Containment<-SentenceInter{#820(0.773/0.773) stf:1 S-12 Direction}
```
The underlying glyph appears in red:
At this point, we can let the engine continue: it will then try to recognize diagrams from which the pixels of the underlying "text" glyphs have been removed. The classifier may or may not recognize such a glyph as a diagram; in this case it did. But now we have both the "text" and the diagram interpretations on top of each other, and they both survive.
We can also decide, at the end of the TEXTS step, to manually clear up these false text lines. Then, we launch the engine on the following steps. This gives:
Compared with the situation before this support for diagrams, is the new situation really interesting for the end user? Please give me your feedback.
Obviously, we would get better results on these diagrams if the SYMBOLS step were run before the TEXTS step. But this is not something I'm ready to do, because the pixels of all the real text lines (such as lyrics) would get in the way of symbol recognition.
Perhaps a more interesting approach, to avoid being "polluted" by things like these chord diagrams, might be to provide the user with some means to manually "erase" certain rectangular areas. This could take place right before the TEXTS step, and would prevent the OCR and OMR engines from processing these areas.
Simple and low level, but perhaps efficient enough? What do you think?
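As a rough illustration of what such a manual "erase" could amount to at the pixel level, the sketch below simply whites out a rectangular region of a page image before it is handed to the OCR/OMR engines. This is plain Java 2D for illustration only, not Audiveris code, and the class and method names are hypothetical:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

public final class RegionEraser
{
    /** Fill the given rectangle of the page image with background (white) pixels. */
    public static void erase (BufferedImage page, Rectangle region)
    {
        Graphics2D g = page.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(region.x, region.y, region.width, region.height);
        } finally {
            g.dispose();
        }
    }
}
```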
Adding to my previous message, let's consider the possibility for the user to define "regions" (I prefer this name to "areas", which is already used in many places within the Audiveris code).
We could have black regions: regions whose content is blacklisted from any OMR processing. We could also have white regions, which are processed even if they intersect black regions; this copes with the fact that we would deal only with rectangular regions.
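To make the intended black/white precedence concrete, here is a small illustrative sketch (again an assumption of mine, not actual Audiveris code; it uses Java 16+ record syntax) of how the decision could be taken for a given pixel:

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.List;

public final class Regions
{
    public enum Kind { BLACK, WHITE }

    /** A rectangular region, either excluded from (BLACK) or forced into (WHITE) processing. */
    public record Region (Rectangle bounds, Kind kind) { }

    /**
     * A pixel is processed unless it lies in some BLACK region
     * and in no WHITE region: WHITE takes precedence over BLACK.
     */
    public static boolean isProcessed (Point p, List<Region> regions)
    {
        boolean inBlack = false;
        boolean inWhite = false;

        for (Region r : regions) {
            if (r.bounds().contains(p)) {
                if (r.kind() == Kind.BLACK) {
                    inBlack = true;
                } else {
                    inWhite = true;
                }
            }
        }

        return !inBlack || inWhite;
    }
}
```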
Another advantage of regions is the ability for the user to add some labeling to them, such as:
The more I think of it, the more possibilities appear for the end user.
Dear end user, what do you think of this proposal?
Have a look at #152: there I made a couple of suggestions for better text evaluation, and I think your proposal above is very much in line with them. So yes, I think this is a good approach to getting these problems solved, and it would save a lot of work after transcription.
16th rests should be better recognized now. Regarding the "regions" idea, I will leave it for 5.3 because it can have important impacts. For now, I will focus on completing and releasing 5.2.
I completely agree: this task certainly means some bigger rework, and the present version is just too good not to be released!
See former #396
Leningrad.pdf
The appended score has been exported from MuseScore as PDF, so the quality should be rather good. Unfortunately, the guitar chord symbols strongly disturb the chord detection. It seems that they are treated as some sort of text...
Quite a while ago I had proposed to provide some sort of dummy guitar-tab symbol, so that the engine could detect these symbols as such and thus prevent a wrong detection. Nothing would be generated in the export; they would just be placeholders during the transcription process.
Minor addition: none of the 16th rests are detected. Maybe you could just add another training glyph?