Audiveris / audiveris

Latest generation of Audiveris OMR engine
https://audiveris.github.io/audiveris
GNU Affero General Public License v3.0

Chords recognition and wrong export #44

Open webfischi opened 6 years ago

webfischi commented 6 years ago

At the moment the program refuses to recognize chords, and if something is recognized, it is not written as a chord into the MusicXML. How well are chords supported so far? Is it important that they are English (A Bb B C C#...), or will it work with German (A Bb H C C#...) and Complete German (A B H C C#) as well? Does it matter whether it's F# or Fis? How close do they have to be to the line or the note head? Are chord tables supported? What about special chords such as Cmaj7, where the 7 is printed in superscript?

hbitteur commented 6 years ago

Today the situation is: "not yet supported". The code for chord names already exists in Audiveris (see class ChordNameInter.java in particular for more details), but we never had enough time to really activate and test it. Sorry!

maximumspatium commented 6 years ago

That's a lot of questions, isn't it? :)

In short, the recognition of text, including chord symbols, is generally poor. Audiveris relies on Tesseract OCR, which performs all the work of automatic labeling and recognition of textual items. If Tesseract fails to recognize short text strings because they contain "strange" (from the OCR's point of view) character sequences, there is nothing Audiveris can do about that.

In other words, it's not a question of specific language files but rather a question of a specific grammar.
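As a minimal sketch of what such a grammar could look like, assuming English note names and a small, hypothetical set of suffixes (this is not the rule set of ChordNameInter.java), a regular expression can tell whether a short OCR string is a plausible chord symbol:

```python
import re

# Hypothetical grammar for English chord symbols (an illustration only).
CHORD_RE = re.compile(
    r"""^
    (?P<root>[A-G])                           # root letter, uppercase in English notation
    (?P<alter>[#b]?)                          # optional sharp or flat
    (?P<kind>m|min|maj|dim|aug|sus[24]?|\+)?  # optional chord quality
    (?P<ext>\d{1,2})?                         # optional extension: 7, 9, 13...
    (?:/(?P<bass>[A-G][#b]?))?                # optional slash bass note, e.g. C/E
    $""",
    re.VERBOSE,
)


def parse_chord(text: str):
    """Return the parts of a chord symbol as a dict, or None if `text` is not one."""
    match = CHORD_RE.match(text.strip())
    return match.groupdict() if match else None


print(parse_chord("Cmaj7"))
print(parse_chord("Composer"))  # None: not a chord symbol
```

A superscript 7 only parses if the OCR returns it on the same line as "Cmaj"; if it comes back as a separate token, the two would have to be merged first.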

For the time being, the Audiveris project lacks manpower. We therefore decided to focus our development on the recognition of the primary notational elements (notes, clefs, staves, etc.).

Our internal evaluation has shown that it's always easier and faster to manually re-enter textual elements (like chords and lyrics) than to try to fix the garbage produced by poor automatic recognition. We'd be glad to be proven wrong!

IIRC, chord strings won't yet be exported to MusicXML (mainly due to garbage output). There is no support for chord diagrams (yet). Any contribution is highly welcome!
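For reference, MusicXML already has a dedicated element for chord symbols, `<harmony>`, with `<root-step>`, `<root-alter>` and `<kind>` children. A minimal sketch of building one, assuming the chord has already been split into step, accidental and suffix; the suffix table is a small illustrative subset, not a complete mapping:

```python
import xml.etree.ElementTree as ET

# Illustrative subset: chord-symbol suffix -> MusicXML <kind> value.
KIND_BY_SUFFIX = {
    "": "major",
    "m": "minor",
    "7": "dominant",
    "m7": "minor-seventh",
    "maj7": "major-seventh",
    "dim": "diminished",
    "aug": "augmented",
    "sus4": "suspended-fourth",
}

# Accidental sign -> <root-alter> value in semitones (None: no element).
ALTER_BY_SIGN = {"": None, "#": "1", "b": "-1"}


def harmony_element(step: str, sign: str, suffix: str) -> ET.Element:
    """Build a MusicXML <harmony> element for a chord such as ("F", "#", "m")."""
    harmony = ET.Element("harmony")
    root = ET.SubElement(harmony, "root")
    ET.SubElement(root, "root-step").text = step
    alter = ALTER_BY_SIGN[sign]
    if alter is not None:
        ET.SubElement(root, "root-alter").text = alter
    kind = ET.SubElement(harmony, "kind")
    kind.text = KIND_BY_SUFFIX[suffix]
    kind.set("text", suffix)  # how the suffix should be displayed
    return harmony


print(ET.tostring(harmony_element("F", "#", "m"), encoding="unicode"))
```

Exporting chords this way, rather than as plain words, is what would keep them out of the Composer field.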

webfischi commented 6 years ago

Is it possible to deactivate it completely for the moment? Normal chords that the OCR can read are written into the MusicXML file as composer, and therefore create a lot of additional work. See the attached screenshot: a horrible mess.

hbitteur commented 6 years ago

Could you elaborate on what you mean by "deactivate it"? Are you referring to the whole OCR TEXT step (which tries to retrieve and recognize all potential pieces of text)? Or just the export to MusicXML of certain text roles (such as "Composer")? Note that a text "role" is assigned via some (very questionable) heuristic. You can modify a text role manually via the UI available in the "development" branch. In Audiveris today (5.1 target), there is no attempt yet to recognize ChordName instances out of textual items.

Could you provide us with a concrete example (the image you are trying to transcribe) and add your own comments about what you don't like and would like to get rid of? This would ease the discussion.

webfischi commented 6 years ago

Ah OK, I will look into that text role thing; maybe it will already reduce the extra work. It could also be fixed by ignoring single letters above the staff, but simply not writing anything into the composer part of the .mxl would fix the problem as well. In this case, dashes from the song text also got into that part. There were many more problems with that particular song; I will only show the first few bars due to copyright, but it should be enough to get the idea. Below are the beginning of the original file and the export opened in MuseScore 2.1.
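The "single letters above the staff" idea could be sketched as a pre-filter run before any Composer role is assigned: a short, chord-like token sitting just above a staff is labeled as a chord name. The `TextItem` class, the 60-pixel gap and the role strings are hypothetical, chosen only for illustration:

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately loose pattern: root letter (A-H to allow German H),
# optional accidental, optional short quality, optional extension number.
CHORD_TOKEN = re.compile(r"^[A-H](#|b|is|es)?(m|maj|dim|sus)?\d{0,2}$")


@dataclass
class TextItem:
    text: str
    bottom_y: int  # y of the item's lower edge; image y grows downward


def guess_role(item: TextItem, staff_top_y: int, max_gap: int = 60) -> str:
    """Return "ChordName" for a chord-like token just above a staff,
    else "Unknown" (left to the other role heuristics)."""
    gap = staff_top_y - item.bottom_y
    if 0 <= gap <= max_gap and CHORD_TOKEN.match(item.text):
        return "ChordName"
    return "Unknown"


print(guess_role(TextItem("D", 100), staff_top_y=130))           # ChordName
print(guess_role(TextItem("Music by X", 100), staff_top_y=130))  # Unknown
```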

[Screenshots: the original file; the first page in MuseScore]

hbitteur commented 6 years ago

These textual items are a real mess! Many A, E, D coming from chord names are presented at the top of the score (perhaps because they got assigned a Composer role). Some lyrics are not presented with the proper staff. In the original lyrics, here and there, we can see a rest sign (8th), which must confuse the OCR. Have you tried a music editor other than MuseScore? Just to check where the layout errors come from.

Could you send me by private mail just the first sheet image (my mail is herve dot bitteur at audiveris dot org). I will exercise Audiveris on it, and look for a few fixes or workarounds.

webfischi commented 6 years ago

I tried my paid music editor Forte 8 Home and got a syntax error with all .mxl files: "The selected MusicXML file contains a syntax error and can't be opened. ((E028))" There is also a free version available if you want to try it yourself, but it is a little restricted; just don't use too complex sheet music for testing.

The rest signs are recognized as 7. They can be found in bar 6 below the first 8th.

hbitteur commented 6 years ago

I just downloaded Forte 9 free. On any .mxl (compressed) or .xml (uncompressed) MusicXML file, I get: "The file cannot be opened. The file format is not supported or invalid". Even for the files on the MusicXML site, not only for mine. Does it need a specific plugin to import MusicXML?

I tried MuseScore, Finale, EasyABC and PriMus. They all accept the MusicXML file exported by Audiveris on your sheet, so I suspect something is wrong with the Forte installation.

hbitteur commented 6 years ago

First feedback on this sheet:

Actions:

webfischi commented 6 years ago

I know, Forte is the biggest mess I ever spent money on; that's why I stopped at version 8 and moved over to MuseScore. I checked the version features and saw that the free version doesn't support MusicXML import, but there are 30-day trial versions of every other version.

And chord names are the biggest mess ever created in music. If you buy a songbook in Germany, there is no standard. All 3 variations mentioned in the first post can be present in one songbook, plus B♮. There are also 4 variations for sharp/flat minor chords; they can be written as "C#m", "Cism", "c#" or "cis"! Whether the letter is uppercase or lowercase matters.

For Tesseract there is extensive training documentation; it should be possible to create a music-specific traineddata file on a Linux-based system. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
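To illustrate how these spellings relate, here is a minimal normalizer, assuming the conventions listed in this thread: `H` is English B, a plain `B` is English Bb when `german_b` is set, `-is` means sharp, `-es`/`-s` means flat, and a lowercase root means minor. It is a sketch for discussion, not an Audiveris feature:

```python
import re

# Root letter (A-H, either case), optional accidental, then the rest of the suffix.
# "s(?!us)" keeps "Asus4" from being read as "As" (= Ab) followed by "us4".
CHORD_RE = re.compile(r"^(?P<root>[A-Ha-h])(?P<acc>is|es|s(?!us)|#|b)?(?P<rest>.*)$")

SIGN = {"": "", "#": "#", "is": "#", "b": "b", "es": "b", "s": "b"}


def to_english(symbol: str, german_b: bool = True) -> str:
    """Rewrite a German-style chord symbol in English notation.

    With german_b=True a plain "B" means English Bb, as in the
    "Complete German" convention; "H" is always English B.
    """
    match = CHORD_RE.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a chord symbol: {symbol!r}")
    root, acc, rest = match["root"], match["acc"] or "", match["rest"]
    letter = root.upper()
    if letter == "H":
        letter = "B"
    elif letter == "B" and german_b and acc == "":
        acc = "b"
    # A lowercase root means minor: add an "m" unless the suffix already starts with one.
    if root.islower() and not rest.startswith("m"):
        rest = "m" + rest
    return letter + SIGN[acc] + rest


print([to_english(s) for s in ["C#m", "Cism", "c#", "cis"]])
```

The `german_b` flag is the crux: a bare "B" is ambiguous, so it has to be a per-songbook setting rather than something the parser can infer.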

maximumspatium commented 6 years ago

There are also 4 variations for sharp/flat minor chords; they can be written as "C#m", "Cism", "c#" or "cis"!

I doubt we'll ever be able to support all variations!

For Tesseract there is extensive training documentation; it should be possible to create a music-specific traineddata file on a Linux-based system.

Hervé and I already tried to train Tesseract 3.x. That's unfortunately easier said than done! The key component is a good training set, otherwise you won't be able to obtain the desired result.

Moreover, we'll probably have to wait for the official release of Tesseract v4, which uses another recognition technology (LSTM). Once it is released, JavaCPP bindings will also need to be available; otherwise, there is no way to access the new engine from Audiveris.

We would have to dig into Tesseract tweaks to support them, perhaps by defining a specific character set. We have no control over the Tesseract software; could someone on this forum investigate the Tesseract "tuning" documentation?

One possibility is to experiment with disabling dictionaries and using "whitelists" to restrict the character set, see here. But for that to work, we first need to properly recognize the role of the text item (for example, disabling dictionaries for musical directions doesn't make any sense). The big question is how to do that. There are two methods, both relying on (fragile) heuristics:

Anyway, it looks like a chicken and egg situation.
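To make the dictionary/whitelist idea concrete: the Tesseract variables `load_system_dawg`, `load_freq_dawg` and `tessedit_char_whitelist` can be set with `-c NAME=VALUE` on the `tesseract` command line. The sketch below assumes the role is already known (which is exactly the chicken-and-egg problem) and uses a hypothetical role name and character set; whether the whitelist is honored depends on the Tesseract version and engine mode:

```python
# Hypothetical per-role OCR settings. The three variable names are real
# Tesseract parameters; the role name and character set are assumptions.
CHORD_CHARS = "ABCDEFGHabcdefghijmnsu#0123456789+/()"


def tesseract_args(role: str) -> list[str]:
    """Extra `tesseract` command-line arguments for a text item of the given role."""
    if role != "ChordName":
        # Lyrics, directions, titles... keep the default dictionaries.
        return []
    return [
        # Chord symbols are not dictionary words: skip both word lists.
        "-c", "load_system_dawg=0",
        "-c", "load_freq_dawg=0",
        # Only accept characters that can occur in a chord symbol.
        "-c", f"tessedit_char_whitelist={CHORD_CHARS}",
    ]


print(["tesseract", "chord.png", "stdout", *tesseract_args("ChordName")])
```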

For the time being, Audiveris simply lets Tesseract label and recognize text items in one pass (the Text step). We'll probably need to rethink our communication with the OCR engine, something like this (untested):

UPDATE: I've just opened an appropriate discussion in the Tesseract user forum...

webfischi commented 6 years ago

Is it possible to integrate an editor that allows the user to mark textual items and select by hand whether they are text or chords, when the software itself can't decide? It would be a workaround, if it is not too hard to implement. One OCR program I tested even offered the possibility to add missing lyrics and chords by hand. Maybe adding a custom dictionary (lang.user-words) for Tesseract could help.
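The custom-dictionary idea could be tried along these lines. Tesseract's command line accepts `--user-words PATH`, a plain text file with one word per line; the sketch below (an assumption, not something Audiveris does) generates such a list of chord symbols from a few building blocks. It is only illustrative: it produces oddities such as `Ees` and misses forms such as `Es`. How strongly the list biases recognition depends on the Tesseract version and engine mode.

```python
from itertools import product

# Illustrative building blocks covering the spellings discussed in this thread.
ROOTS = ["C", "D", "E", "F", "G", "A", "B", "H"]
ACCIDENTALS = ["", "#", "b", "is", "es"]
SUFFIXES = ["", "m", "7", "m7", "maj7", "dim", "sus4"]


def chord_words() -> list[str]:
    """All root/accidental/suffix combinations, sorted, without duplicates."""
    return sorted({r + a + s for r, a, s in product(ROOTS, ACCIDENTALS, SUFFIXES)})


def write_user_words(path: str) -> int:
    """Write one chord symbol per line to `path`; return how many were written."""
    words = chord_words()
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
    return len(words)
```

After `write_user_words("chords.user-words")`, the file would be passed as `tesseract page.png out --user-words chords.user-words`.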

hbitteur commented 6 years ago

I agree with the "chicken and egg" situation: The TEXT step of Audiveris needs to detect upfront if a sentence is a lyric sentence, before processing it further (and very differently from the other roles).

We are about to release 5.1, which is roughly 5.0 plus a basic UI meant for the end user to correct the main OCR mistakes. Among the UI features, I am right now introducing the ability to select a glyph and ask to transcribe it to text. This glyph can be fairly big and can result from gathering several smaller glyphs on the fly. So, as @webfischi said, perhaps the user could interactively and directly ask for transcription of the compound glyph to "lyric text" explicitly.

Doing so, we would jump over the "chicken and egg" situation, and come up with a bunch of lyric lines made of lyric items. Always with the ability to undo the action line by line, if so desired. As well as the ability to assign a different role later.

Give me a few days for a first implementation, to be available in the "development" branch.

hbitteur commented 6 years ago

@webfischi Here is the first implementation. In the "Physicals" part of the shape palette, besides the "txt" icon, there is a new "lyrics" icon. The "txt" icon tells the program to OCR the selected glyph as text, while the "lyrics" icon tells the program to OCR the glyph as lyrics. So here is the best way to proceed: first remove the incorrect lines, then enclose the lines of a block of lyrics and double-click the "Lyrics" icon (or "Txt" for a non-lyric piece of text).

While this does not yet improve the standard TEXTS step, it does offer the user the ability to redo some OCR in a directed manner. I will push this to Github ("development" branch) in a few minutes.
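Part of what makes lyrics different from other text is syllabification: on export, MusicXML expects each lyric syllable to carry a `<syllabic>` value (`single`, `begin`, `middle` or `end`). A minimal sketch, assuming the OCR returns tokens with the hyphen attached to the syllable it follows (in print, hyphens are often separate glyphs, which would have to be merged first); it does not reflect how Audiveris handles lyrics internally:

```python
def syllabic_values(tokens: list[str]) -> list[tuple[str, str]]:
    """Map lyric tokens to (text, MusicXML <syllabic> value) pairs.

    A token ending with '-' is continued by the next token, so
    "Hal- le- lu- jah" yields begin, middle, middle, end.
    """
    result = []
    continues_from_prev = False
    for token in tokens:
        continues_to_next = token.endswith("-")
        text = token.rstrip("-")
        if continues_from_prev and continues_to_next:
            kind = "middle"
        elif continues_from_prev:
            kind = "end"
        elif continues_to_next:
            kind = "begin"
        else:
            kind = "single"
        result.append((text, kind))
        continues_from_prev = continues_to_next
    return result


print(syllabic_values(["Hal-", "le-", "lu-", "jah", "praise"]))
```

This is also where stray dashes belong: attached to syllables, not exported as separate words.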

With a few user actions, here is what I could export to Finale: [screenshot: Finale]

maximumspatium commented 6 years ago

You can modify a text role manually via UI available in the "development" branch.

I'm sorry, but I wasn't able to find out how to modify text roles manually. How would I, for example, change lyrics to chords?

webfischi commented 6 years ago

Me neither, but it's looking much better now. There should also be a hotkey to unassign, and a way to select more than one textual item and connect or disconnect them when the text is recognized incorrectly. There is also a problem with different font sizes within the lyrics.

hbitteur commented 6 years ago

@maximumspatium Changing the role is available at sentence level (a sentence is a list of words). The user cannot change the role of just a word, even if it's the only word of the sentence, so make sure that the sentence rather than the word is selected (use a lasso). Also, lyrics are so different from other text roles that you cannot just change the role from/to lyrics; you have to remove and recreate the text.
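The rule described here can be modeled as follows. This is a hypothetical sketch, not Audiveris's actual classes, and the role names are illustrative: the role lives on the sentence, words carry none, and any switch to or from `Lyrics` is refused.

```python
from dataclasses import dataclass

LYRICS = "Lyrics"


@dataclass
class Sentence:
    """Hypothetical model: the role belongs to the sentence; words have none."""
    words: list[str]
    role: str

    def can_change_role(self, new_role: str) -> bool:
        # Lyrics are built differently, so switching to or from them is refused.
        return new_role == self.role or LYRICS not in (self.role, new_role)

    def set_role(self, new_role: str) -> None:
        if not self.can_change_role(new_role):
            raise ValueError("remove the text and re-create it with the new role")
        self.role = new_role


s = Sentence(words=["Am"], role="Direction")
s.set_role("ChordName")  # allowed, even for a one-word sentence
```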

hbitteur commented 6 years ago

@webfischi The hotkey exists: it's the "DELETE" key (sorry, the documentation is being written and is not available yet). And you can select a bunch of items (inters or glyphs) with a lasso and/or augment your selection with left mouse button + CTRL key (on Windows/Linux; OPTION key on macOS?).

Font size for the lyrics: What do you suggest? To use the same font (determined at sentence level) for all the lyric items (words) in the same lyric line (sentence)? And also align words baseline?
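The suggestion in this question could be sketched as follows, assuming the OCR gives each word an estimated font size and a baseline y coordinate (the `Word` class is hypothetical): every word in a lyric line takes the line's median size and baseline.

```python
from dataclasses import dataclass, replace
from statistics import median


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR result for one lyric word."""
    text: str
    font_size: float  # estimated point size
    baseline: float   # y coordinate of the baseline, in pixels


def harmonize_line(words: list[Word]) -> list[Word]:
    """Give every word of a lyric line the line's median font size and baseline.

    The median is less sensitive than a mean to one badly estimated word
    (e.g. a word read as much bigger than its neighbors).
    """
    size = median(w.font_size for w in words)
    base = median(w.baseline for w in words)
    return [replace(w, font_size=size, baseline=base) for w in words]


line = [Word("la", 10, 100), Word("la", 10, 102), Word("LA", 24, 98)]
fixed = harmonize_line(line)
```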

hbitteur commented 6 years ago

To all: The user interface is a brand-new feature, available only in the "development" branch, not really documented yet, and not publicly released; it will become the official 5.1. Don't hesitate to post remarks, suggestions or improvement requests on this issue list now! Thanks in advance.

webfischi commented 6 years ago

@hbitteur Yes, exactly. Yesterday I tested a document with a gap between the 3rd and 4th verses, and the OCR got confused: verses 4 and 5 were printed bold and a lot bigger. The lasso still seems a bit imprecise: when I select a line and don't hit the text 100%, it won't be unassigned; if I select a block of text, only the text I crossed with my mouse cursor will be unassigned instead of all textual items within the box.

hbitteur commented 6 years ago

@webfischi I think this is normal behavior: only the selected text is concerned, not the other items. Also mind the fact that, to explicitly select items with a lasso, the items must be fully embraced by the lasso; a simple non-void intersection is not enough. Or you can point and click with the CTRL key down, but again only the selected items are selected, by definition.
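The selection rule just described, as a minimal sketch with hypothetical types and item names: a lasso keeps only the items it fully contains, and CTRL + left click adds one more item explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image pixels (hypothetical type)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box (shared edges count)."""
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and other.x2 <= self.x2 and other.y2 <= self.y2)


def lasso(area: Box, items: dict[str, Box]) -> set[str]:
    """Select only the items fully embraced by the lasso; overlap is not enough."""
    return {name for name, box in items.items() if area.contains(box)}


def ctrl_click(selection: set[str], name: str) -> set[str]:
    """CTRL + left click: add one explicitly picked item to the selection."""
    return selection | {name}


items = {"la": Box(10, 10, 30, 20), "lu": Box(90, 10, 120, 20)}
picked = lasso(Box(0, 0, 100, 50), items)  # "lu" sticks out of the lasso
picked = ctrl_click(picked, "lu")
```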

webfischi commented 6 years ago

That would make sense; CTRL normally only selects this way. Is it possible to integrate a SHIFT or CTRL + SHIFT function to select all textual items within the selected area when the first selected item was text?

hbitteur commented 6 years ago

If I understand your proposal correctly, you suggest that (with proper modifiers, such as CTRL, etc), the lasso selection would be defined as all intersected items, rather than all contained items.

OK, why not? I would like to make sure that this selection policy is a rather standard one, so that users don't get surprised. Do you have examples of it in other programs? Today, you can always select most of the items you want with a lasso, then incrementally augment your selection by picking up the missing ones via CTRL + left click. Have you tried it?

Regarding the modifiers, "CTRL + SHIFT + lasso rectangle" is already assigned: It allows to zoom in so that the selected rectangle occupies the whole window. We would have to use another combination for your "selection by intersection".
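For comparison, the proposed "selection by intersection" differs from the current rule only in the geometric test. In this sketch, the `by_intersection` flag stands in for whatever modifier might eventually be chosen, and boxes are hypothetical `(x1, y1, x2, y2)` tuples:

```python
# Hypothetical box format: (x1, y1, x2, y2) in image pixels.
Box = tuple[int, int, int, int]


def contains(outer: Box, inner: Box) -> bool:
    """Current rule: the item must lie entirely inside the lasso."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Box, b: Box) -> bool:
    """Proposed rule: any shared area is enough (merely touching edges is not)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def lasso_select(area: Box, items: dict[str, Box], by_intersection: bool) -> set[str]:
    """`by_intersection` stands for a yet-to-be-chosen modifier key."""
    keep = intersects if by_intersection else contains
    return {name for name, box in items.items() if keep(area, box)}


items = {"la": (10, 10, 30, 20), "lu": (90, 10, 120, 20), "far": (200, 200, 210, 210)}
```

With the lasso `(0, 0, 100, 50)`, the current rule picks only `la`, while the intersection rule also picks `lu`, which the lasso merely crosses.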

webfischi commented 6 years ago

When many words are recognized incorrectly, which happens a lot on low-quality images, and there are more than 3 verses within 20 bars, it's too time-consuming to select everything with CTRL. Windows Explorer uses SHIFT to select everything, as do office programs. Normally you lasso with the mouse to select folders and files in Windows Explorer, but you can also select the first item of the list and SHIFT + left click on the last item of the list you want to select. In our case, we would just make it possible to select all items with the lasso while holding SHIFT.

hbitteur commented 6 years ago

Yes, this is a way to select a sub-list within a list, which makes sense when the sub-list is obvious (such as within the list of files presented by Windows Explorer, or the list of lines and words as in Office Writer, where there is a clear sequence order).

The question is what is the sequence order for us in Audiveris when a bunch of items have been selected through say a lasso? Which one is the first, which one is the last, which ones are the intermediate ones?

In a music sheet, we deal with items scattered in 2 dimensions. We don't have the notion of lines as in Office Writer, except when we deal with lyrics lines (which requires that these lines have been recognized, but then it's a chicken and egg situation).
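One possible answer to the ordering question, offered only as an assumption: group items into rough rows by their top y coordinate, order them top-to-bottom and then left-to-right, and define a SHIFT range as everything between two items in that order. The `row_tolerance` value and the `(x, y)` input format are hypothetical, and this ordering knows nothing about staves or systems, so on a real page a "row" may mix lyrics with other items at the same height.

```python
def reading_order(items: dict[str, tuple[int, int]], row_tolerance: int = 10) -> list[str]:
    """Sort items (name -> (x, y) of top-left corner) top-to-bottom, then left-to-right.

    An item joins the current row if its y is within `row_tolerance` pixels
    of the row's first (topmost) item; otherwise it starts a new row.
    """
    by_y = sorted(items, key=lambda n: items[n][1])
    rows: list[list[str]] = []
    for name in by_y:
        if rows and items[name][1] - items[rows[-1][0]][1] <= row_tolerance:
            rows[-1].append(name)
        else:
            rows.append([name])
    return [n for row in rows for n in sorted(row, key=lambda n: items[n][0])]


def shift_range(order: list[str], anchor: str, target: str) -> list[str]:
    """Explorer-style range: everything between anchor and target, inclusive."""
    i, j = sorted((order.index(anchor), order.index(target)))
    return order[i:j + 1]


positions = {"A": (50, 12), "B": (10, 10), "C": (10, 40), "D": (60, 42)}
order = reading_order(positions)
```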

webfischi commented 6 years ago

It could be either the first item in range: if the first item was a piece of text, only text will be selected within the lasso, and if it was a note head, only note heads will be selected. Or the first item has to be selected manually, and then items of the same kind can be selected.
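The first variant of this proposal, as a hypothetical sketch: given the items hit by the lasso in the order they were hit, keep only those of the same kind as the first one. How "first" is determined is left to the UI, and the `(name, kind)` pairs and kind labels are illustrative:

```python
def select_same_kind(hit_in_order: list[tuple[str, str]]) -> list[str]:
    """Keep only the items of the same kind as the first one hit.

    Each entry is (name, kind), e.g. ("w12", "text") or ("h3", "head"),
    listed in the order the lasso or the clicks reached them.
    """
    if not hit_in_order:
        return []
    first_kind = hit_in_order[0][1]
    return [name for name, kind in hit_in_order if kind == first_kind]


print(select_same_kind([("w1", "text"), ("h1", "head"), ("w2", "text")]))
```

Compared with selection by intersection alone, this keeps note heads out of a lasso drawn across lyrics, which is the case raised in this thread.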