Measure voices and time slots (including implicit tuplets)

hbitteur commented 5 years ago

I'm opening a separate issue to allow the correct processing of a measure even if it contains "implicit" tuplets. This topic appears from time to time, see #268 (near the end of the various discussions). Some typical examples are Beethoven's Mondscheinsonate or Schubert's Ave Maria.

Since I'm currently redesigning the retrieval of measure voices and time slots within the RHYTHMS step, I would like to share information about known difficulties and potential solutions.

First, the engine assumes (with no certainty) that:

Time signature (such as 4/4) is known. Perhaps the signature symbol exists in the current sheet and was correctly recognized, perhaps signature was inferred from other measures.
All relevant symbols (notes and FRATs=flag/rest/augDot/tuplets) have been correctly recognized, together with beam groups and ties.

The engine will try to build the measure rhythm information out of this data, and flag the measure as "abnormal" (displayed in pink) if it can detect any inconsistent data, so that the user can manually fix it.

Second, if the tuplets are "implicit" (needed but not noted graphically), the measure data will appear as inconsistent. Under some conditions, we can make the engine try to infer the missing tuplets. To avoid collateral damages, we have to be very careful. Here are the conditions we should require to be all met before attempting tuplet inference:

The user has enabled the dedicated option "implicit tuplet" (disabled by default)
There is no explicit tuplet in the measure
The standard voice/slots algorithm has failed
There is at least one voice, starting from first slot, whose duration is exactly 3/2 of measure expected duration

Then this voice will be tentatively assigned a tuplet ratio of 2/3 and the voice/slots algorithm will be retried on the measure. If no inconsistency is detected, then the implicit tuplet will be confirmed for the voice (and exported accordingly to MusicXML). Otherwise the measure will remain in abnormal status.

Thanks in advance for your feedback.

maximumspatium commented 5 years ago

Hervé, thanks a lot for working on this complex problem and for opening a separate issue to share your design decisions/progress/implementations.

As far as I know, there is no serious research regarding rhythm recognition in the OMR domain so far (I'd like to be proven wrong!) - the existing OMR systems implement their own ad hoc methods with varying degrees of success. It's maybe worth a try to set up a recurrent neural network (or smth similar suited for time-series analysis) and to run it on a couple of scores to infer hidden patterns...

Before we dig into the details, I want to stress the need of a better voice handling in multi-voice settings (that is, in the case a staff contains several voices). The problem usually arises when ad hoc multi-voicing is used: The above pattern (on-the-fly multi-voicing) is quite common in the keyboard music. Our algorithm need to make robust decisions which notes belong to which voice. Otherwise, we won't be able to verify rhythmical consistency of such a measure.

Now comes a couple of related questions:

isn't clear to me. What is the purpose of this rule? What is if a measure contains both explicit and implicit tuplets?
could you summarize that "standard voice/slots algorithm" so we can analyze its strengths/weaknesses and eventually propose improvements to it?

hbitteur commented 5 years ago

@maximumspatium In the RHYTHMS step, the purpose of the voice/slots algorithm is to find out, within any given stack of measures, the proper time for each chord (in fact the proper time offset since the beginning of the measure). The duration of each chord is known (assuming type, dots, tuplets, etc are OK) but at which offset does it begin, that's the question.

We have two kinds of information to help solve this:

The voice the chord belongs to, because the chord starting time is the end of the previous chord in voice,
The vertical stack time slot the chord is aligned with.

As usual, these informations must be taken with a grain of salt:

How do we gather chords in voices?
How do we gather chords in slots?

The voices/slots algorithm in 5.1 version is fragile. It incrementally computes the time offset of the next time slot based on the earliest end time of the current active (non-terminated) chords. This can lead to wrong time values. You could read the code especially in SlotsBuilder class but don't waste your time, the code is already well out-of-date.

I'm redesigning the classes in rhythm package, but I'm in the middle of it, I can't really share any stable code yet. I use a notion of VoiceDistance to map candidate chords to candidate voices, and I'm confident that the precise example you gave of ad hoc multi-voicing will be OK. Then, since voice mapping is getting rather reliable, we can use it in par with slot mapping. I'll get back to this later with concrete examples.

Beside this "standard" algorithm, I'm experimenting heuristics to cope with scores which are correctly recognized but require implicit tuplets. The purpose of the rule is to allow the user to easily process such "correct" input scores without the need to manually drag n' drop a tuplet to each location where it is meant to be, because this can represent a lot (4 6-tuplets for each measure in Schubert Ave Maria in all systems...). And it is beginning to work: in Ave Maria, where the first measure contains explicit 6-tuplets and no tuplets on the following measures, the different measures are correctly processed.

I think it's reasonable to exclude the coexistence of explicit and implicit tuplets in the same measure. The reason is that often all the needed tuplets are explicitly noted in the first (introductory) measure, letting the user replicate logically the same pattern to the following measures. Therefore, if we find an explicit tuplet within a given measure, we can infer that the measure does not assume implicit tuplets.

hbitteur commented 5 years ago

I just renamed the issue as "Measure voices and time slots", because the discussion is not limited to implicit tuplets. If we have a solid approach for voices and time slots, then support for implicit tuplets gets easy.

hbitteur commented 5 years ago

I think the major flaw in current algorithm is its "slot-first" approach.

A slot is like a vertical line that embraces the whole system height, and gathers all chords which are played at the same time (the slot time offset). Although this is correct in theory, the notion of graphical vertical line is difficult to deal with in practice:

If the whole image is composed of a single giant system (like an orchestra score as in ActorPreludeSample) then the system height is so big that any defect in printed image will impact chords strict vertical alignment. So this may work with perfect synthetic images but not for all printed scores. To somehow alleviate potential image distortion (such as skew or warping), the current algorithm does not deal with chords absolute abscissa within the whole image but with chords abscissa offset since measure starting barline within the same staff.
How do we determine a chord abscissa? It's the center of the rectangular bounding box of the stem if any and all member notes (heads or rest). Then, in the computation of slot abscissa, we may have to include the boxes of "adjacent" chords like these or those Then the "width" of a slot can get pretty large which can go against the separation of slots in a crowded score.

Also, the current algorithm processes one slot after the other, and determines the slot time offset based on the earliest end of the still active chords of the previous slot(s). "Still active" means not yet finished. Things are generally OK but here is a counter-example: The engine has correctly detected 2 slots, one with the 3 chords on the left and one with the single chord on the right. Then, it has detected that the earliest end came from the 1/8th note, and simply added 1/8 to left slot time to determine right slot time. Bingo! In fact, we should consider that the 1/8th note is an isolated on-the-fly voice that finishes before the next slot begins.

If we were able to relie on a strong voice detection algorithm, then a "voice-first" approach would avoid such mistakes. In the example above, the two chords with upward stem belong to the same voice, different from the voice of the 1/8th chord, and thus these two quarters are separated by 1/4.

hbitteur commented 5 years ago

(Oops, I pressed the wrong button)

oleedd commented 5 years ago

There should be used clef comparison. Because time signature goes parallelly. For example: Time slots are usually grouped. If notes in a slot are not matched with time signature then there are tuplets. Also triols are in groups of 3 connected notes, sixlets are in groups of 6 connected notes. So it is easy to find them. Just need to check if they really are in a score using time signature and clef (or voice) comparison. Also tuplet signs are skipped if there are very many tuplets. So if only 1 or 2 - there will be corresponding signs. So generally no need to recognize single implicit tuplets.

hbitteur commented 5 years ago

@oleedd The devil is in the details... A triplet is often but not always a sequence of 3 connected notes, and the same applies for sixlets. We are not sure of the number of notes (their types may vary), nor of the group connection (we can have rest aside beamed heads).

So for the time being, the approach I'm working on is a bit conservative: either a measure contains one or several explicit tuplets or one or several full voices in measure assume implicit tuplets. In other words, let's ignore the possibility of local implicit tuplets. I think that makes sense.

oleedd commented 5 years ago

It is need to support only a sequence of 3/6 connected notes. Because: 1) it is the most common; 2) such groups will have a tuplet sign for sure because in other case a musician will be strongly confused; 3) OMR software can't work with all situations but should support the most common situations.

But anyway a tuplet is almost always a measure (the beginning and the end are match up). In other cases there will be tuplet signs.

hbitteur commented 5 years ago

I agree, this is the reason why the target for "implicit" tuplets is the whole measure

hbitteur commented 5 years ago

@oleedd You wrote "There should be used clef comparison" What does the clef has to do with the tuplets? Or perhaps, you meant "different staves", regardless of the related clef?

oleedd commented 5 years ago

I mean measure comparison in different clefs (like in piano - the treble and basso clefs). Measures in different clefs should be matched up (measures go parallelly in multi-clef scores).

hbitteur commented 5 years ago

So yes, we don't really care about the actual clefs, but all the staves in the same system share the same time slots.

oleedd commented 5 years ago

Also there may be cases with parallel triplet and sixlet:

hbitteur commented 5 years ago

After multiple days of trials, errors and despair I have come to the conclusion that this is yet another "chicken and egg" problem in OMR. Sigh!

In certain cases, time-driven mapping needs some voice information and voice-mapping needs some time information. To add upon this, measures with not-yet-detected implicit tuplets exhibit perfectly aligned chords with non-consistent time offsets, and this gets in the way of the mutual time and voice mappings.

What I have in mind today is to play with 2 kinds of slots during the building of a measure, narrow measure slots perhaps gathered in wider compound measure slots. Wide and narrow refer to abscissa offsets:

The wide slot is used for voice linking: there is no guarantee that all chords in such slot share the same time offset, but they cannot be in sequence, meaning two chords in this slot cannot belong to the same voice.
The narrow slot is used for time information: we can consider that all its chords are so well graphically aligned that they share the same time offset value. Two narrow slots that are members of the same wide slot, may or may not have the same time offset value.

We can thus progressively, wide slot by wide slot, build the measure mapping.

This is done on the assumption that all time durations are correct, which is not the case for implicit tuplets which have not been detected yet. For such cases, the "implicit tuplet" detection will occur at the end of measure mapping when we discover that a voice duration represents 3/2 times the measure expected value and that many other voices get extinct because their following chord lies, timewise, too far away. Then, all chords that belong to such long voices will be applied an implicit tuplet, and the measure building will be re-launched with this updated data.

In the end, all these temporary narrow / wide measure slots are discarded and the final sequence of stack slots is built by dispatching chords by time offset value.

If this strategy is OK, the rest is just an easy matter of coding... Ouf! :-)

hbitteur commented 5 years ago

At last, 875d68e995218f2460f3d450f587d4d249788b75 commit on "development" branch provides the new implementation of RHYTHMS step.

It provides:

New algorithm based on narrow slots gathered in wider slots as explained above
New evaluations of voice distance between two possible chords, one for standard layout (clearly separated staves) and a different one for luth-like layout (merged grand staff). More explanation on the latter later.
Support for implicit tuplets, according to a new user processing switch (accessible in Book Parameters menu).

Please try this new version on scores of yours and don't hesitate to post your feedback. This is just an on-going development, I have a bunch of related features to work upon, but please report found bugs if any.

oleedd commented 5 years ago

How to try? I don't see this new 5.2.0-alpha version here: https://github.com/Audiveris/audiveris/releases

hbitteur commented 5 years ago

@oleedd It is not released yet (because it is still in development status), this is the reason why you can't see it in github releases.

It is however available in the development branch. To access it, you have to pull the software and explicitly check out the development branch:

git clone https://github.com/Audiveris/audiveris.git someFolderName cd someFolderName git checkout development

and then build the software:

./gradlew clean build

oleedd commented 5 years ago

@hbitteur I have an error on the last step and I don't want to fix it. Please give the 32-bit exe file here in zip.

maximumspatium commented 5 years ago

I have an error on the last step and I don't want to fix it. Please give the 32-bit exe file here in zip.

We won't share executables here. Everything in this forum is about open-source software development so you'd better learn to build your software yourself. That's the price you pay for getting the software for free.

This guide describes in detail how to build Audiveris from source. Please take ten minutes and follow this guide step by step. Everything you need is a JDK (Java development kit) and the Git command tool. Installing them and run three simple terminal commands shouldn't be that hard.

You got an error because Gradle cannot find Java development kit (JDK). It's either missing so you should go ahead and install it or you need to add its path to the Windows PATH.

Happy hacking!

oleedd commented 5 years ago

I have "BUILD SUCCESSFUL": But after launching Audiveris.bat the next error: Error: Could not find or load main class Audiveris Seems that need to open another issue for this problem if it is hard to upload the exe (I already know how to build - your goal is reached).

hbitteur commented 5 years ago

You have to launch the application by something like:

gradlew.bat run

From the same folder where you built the application (via command: gradlew.bat build)

oleedd commented 5 years ago

How to make it smaller like the standard release - without the big .git folder and gradle?

oleedd commented 5 years ago

Only small quantity of measures with implicit sixlets was recognized, generally no implicit triplets. Pdf: http://library.newmusicusa.org/files/1090/11.pdf

hbitteur commented 5 years ago

Right now, the algorithm for implicit tuplets detection is based on full measure voices that last 3/2 times the expected measure duration. Here is an example from SchbAvMaSample (available in Audiveris examples folder). The "6" displayed in half-opacity are the implicit tuplets automatically inserted:

The detection can't work on the following measure (taken out of your Chopin example 11.pdf) because the voice changes at chord 5276 (near the red cross): The voice change happened because, among the new comers 5277 and 5276, the chord 5275 was found "closer" voicewise to 5277 than to 5276 (opposite stem direction). We could (and will) provide a means for the user to manually force voice continuation between 5275 and 5276, but for the time being the OMR engine is left alone to decide...

Your Chopin example is a nightmare but, if we succeed in processing it correctly with a refined algorithm, this will be a good sign for the algorithm. So let's try to understand what is happening and modify the algorithm accordingly.

Here below the measure is flagged as abnormal (and displayed in pink) because the first two chords 5262 and 5203 on lower part of staff were considered as belonging to the same voice. I'm afraid this is due to the "voice-first" approach of the new algorithm (the old one was "time-slot first", and would probably have avoided this mistake):

The following one shows the voice in green was a "full measure" voice, correctly processed with implicit tuplets. However, the measure is flagged as abnormal because of a time inconsistency at chords 5404 and 5402 (they are located middle of the measure, just above the pedal up mark, where the red cross is):

I need some help to understand the situation. We have two beam groups that started at time 1/4 (with chords 5397 and 5395 respectively).

The lower one (green voice with tuplet) is composed of 6 chords, each of 1/16 * 2/3 = 1/24. Its last chord (5402) thus starts at 1/2 - 1/24 = 11/24
The upper one (blue voice without tuplet) is composed of 2 chords (first one is 5397= a dotted 1/8, duration 3/16 and second one 5404 of duration 1/16). Its last chord (5404) starts at 1/4 + 3/16 = 7/16

These two last chords should share the same time slot (enforced by the sharing of a note head), and their time offsets are different (7/16 vs 11/24). How come??? Please help me.

oleedd commented 5 years ago

Some musicians think that such Chopin rhythm (like 7/16 and 11/24) is almost the same in so big tempo (207 bmp for 4 notes), so acceptably to play as a first 8-th with two dots (we sacrifice half of a 16-th). There almost no difference but to play it as written is very hard even in two hands (like a very fast grace note effect). So in this case they really should share the same time slot. It is not theoretically correct but such cases may happen. Sharing of a note head and the same distance (a note under a note) should has the biggest priority in rhythm analysis. This score has a more abnormal situation. Does mxl support it (this score was typed in Finale music editor)?

hbitteur commented 5 years ago

Important update: Regarding the last example in Chopin score, I came to the conclusion that the score is wrong and Audiveris is right. It may sound pretentious, but this is not the case! Here is the simple explanation (that I should have found earlier).

We have 2 voices that arrive at time offset 1/2. Last chord for each voice is an eighth (2 beams), hence with a normal duration of 1/16. The difference is that the lower voice is impacted by a tuplet, and thus the duration of its last eighth chord is actually not 1/16 but 1/16 * 2/3 = 1/24 Hence, the time offset of these two last chords are: 1/2 - 1/16 for the upper one and 1/2 - 1/24 for the lower one. So they can't share the same time offset!

And the fact that these 2 chords (5404 and 5402) share a common note is a misleading mistake of the printed score itself.

hbitteur commented 5 years ago

Let's have a look at the new example you presented:

Notice that the last chord of the lower voice is abscissawise slightly before the last chord of the upper voice. It's the same configuration (7/16 vs 11/24), but here no note head is physically shared and, using narrow slots, the current algorithm would not impose a common time offset value.

oleedd commented 5 years ago

But it was the best choice for the composer because making a 8-th with two dots and a sixlet in the upper voice would make very big chaos in the score. It would be very difficult to understand where are and where no sixlets because now the implicit sixlets are only groups of six 16-th notes. Also a 4-th can't be used as a sixlet. And the Chopin manner was kept (he didn't use two dots in the original). But if it was possible to print in a note editor when it is possible to export after recognition. Notice that rarely there are strange things which aren't completely correct in scores. Implicit tuplets are also not correct. This case is also strange because the 8-th rests are missed: But it was still recognized and the above fragment too. Just need to export it in that look as it is depicted. But again - in such cases sharing of a same note head or a same distance (a note under a note) should have the biggest priority in rhythm analysis.

maximumspatium commented 5 years ago

We're actually discussing three distinct issues:

implicit tuplets
illegal rhythmic alignment in multi-part setting
additional parts to point the melodic line

Let's look for a solution for the last two issues first.

The 3rd case can be interpreted as two-parts-setting of the equal overall duration where one of the parts has hidden rests:

That's the original: counter-melodie

The same version completed with hidden rests:

separate stems with beams Musescore

The solution is therefore to unify the duration of both groups and to insert hidden rests.

Regarding the 2nd case, there are two variants:

a) separate note heads Illegal rhythmic alignment

b) shared note head Invalid rhythmic alignment shared note

Both variants are invalid because this notation cannot be played exactly as written. The proposed solution is to interpret the upper voice in "b" as a voice with hidden rest: Rhythmic unification

The spatial alignment of the variant "a" could be either corrected as follows: Corrected rhythmic alignment

or solved just like the variant "b" (i.e. by unifying group duration and inserting a ghost rest).

In both cases, some UI should be provided so the user can tune the timing of notes.

@hbitteur and @oleedd Does it makes sense to you?

maximumspatium commented 5 years ago

Just for curiosity, I've looked into the old printed version of the original Chopin etude from the end of the 19th century. These (obviously) handmade printing plates reveal a significant variation in spatial alignment of the same rhythmic pattern:

Chopin Etude tuplet v_align1 Chopin Etude tuplet v_align2 Chopin Etude tuplet v_align3

This is therefore not a question of composer's intention but simply an imprecision in old printing technique...

maximumspatium commented 5 years ago

The voice change happened because, among the new comers 5277 and 5276, the chord 5275 was found "closer" voicewise to 5277 than to 5276 (opposite stem direction).

This behavior is fully correct in the given context.

We could (and will) provide a means for the user to manually force voice continuation between 5275 and 5276, but for the time being the OMR engine is left alone to decide...

We could do but the correct way to indicate that 5276-5283 is the continuation of 5270-5275 is to reverse stem direction of the latter and to place a quarter rest above it.

Here below the measure is flagged as abnormal (and displayed in pink) because the first two chords 5262 and 5263 on lower part of staff were considered as belonging to the same voice. I'm afraid this is due to the "voice-first" approach of the new algorithm (the old one was "time-slot first", and would probably have avoided this mistake)

That's happens because the duration of each note wasn't taken into account. 5262 and 5263 (shown in the picture below as "A" and "B" respectively) cannot belong to the same voice because the former (A) is sounding when 5263 (B) is struck. Your algorithm need to add some flag to each voice/time position indicating which previous note is still active and which is cancelled. The right voice mapping fo that bar is as follows:

Chopin-Komanetsky correct voice assignment bar4

The group "C" is of a special interest because it contains a grace note attached to a chord. Because Audiveris currently mistakes this grace note with a quaver (or eight note), the new voice mapping algorithm delivers here incorrect results because the quarter note in the lower voice preceding "C" and the misrecognized grace note overlap in time.

Considering that grace-quaver-mistake, the correct voice mapping should look like that:

Copin-Komanetsky bar4 correct voice assignemnt w:o grace

The group "C" now belong to the 3rd voice. Why? Because

the quaver "b" is struck while the chord "C-G" preceding it is still sounding
the last chord shares the same voice with the quaver it's tied to

oleedd commented 5 years ago

These (obviously) handmade printing plates reveal a significant variation in spatial alignment of the same rhythmic pattern

There are different distances between 16-th notes by reason of accidentals. Also the first example with ascending beams has not good spatial alignment - almost opposite the last note. The next two examples almost the same - only several pixels difference.

The solution is therefore to unify the duration of both groups and to insert hidden rests.

It is for sure.

About the shared note head/separate note heads issue. The composer wanted to take all the time before the last 16-th (without rests). The easiest and the most beautiful way is really in such incorrect notation but completely understandable for musicians. But why Finale music editor allowed to type so and can play it if it is incorrect? But the sense of recognition is to recreate a score in the original look if even it is incorrect, especially if note editors support such incorrect look. The MusicXML format is bad for this.

hbitteur commented 5 years ago

This discussion is really interesting, if not passionate! Last night contributions came from @maximumspatium (1:30) and @oleedd (2:30), Paris local time. If you want to sleep at night, I hope you guys are not in Europe right now ... :-)

Let's focus on the original goal which is to come up with a working algorithm. In that respect, the "composer intentions" are clearly out of reach. So let's consider only the printed score and try to infer the correct rhythm, without resorting to score modifications apart from implicit symbols.

More precisely, assuming all the measure symbols have been correctly recognized by the OMR engine (or manually corrected by the user), I think that the rhythm algorithm:

May add missing symbols that have been omitted:
- Implicit tuplets
- Implicit rests
May not modify existing symbols:
- Change in the direction of a stem
- Separation of shared note heads
- Translation of chords

The objective is to make the algorithm work in most cases. However, there are gray areas where the algorithm may go into a wrong direction (voice mapping is one example of this, because the "rules" for voice continuation are not carved in stone). In these cases, we have to provide the user with some interactive means to guide the algorithm (for example by forcing a voice mapping).

hbitteur commented 5 years ago

Regarding the case: Chopin-Komanetsky correct voice assignment bar4

There is no need to work further, because there were 2 mistakes:

The OMR engine had mistaken a grace note for a standard note, and grace notes are not involved in rhythm algorithm (so the "C" chord is not to be considered)
The current version of new rhythm algorithm (the one you pulled) needs to get full-measure voices in order to detect the need for implicit tuplet. For this, when two chords of the same slot disagree about their time offset, and this is the case for the 2nd slot, the algorithm needs to pick up the larger time value. This is so, because the implicit tuplet mode needs to keep the longest voices unbroken (they are the candidates for implicit tuplets). I'm now trying to modify the algorithm, picking up the shorter time value as in 5.1 version, but detecting tuplet-candidates on the fly. We'll see if this can work...

oleedd commented 5 years ago

May not modify existing symbols:

Separation of shared note heads

I manually added all missed tuplets a while ago before this feature was created. And this incorrect rhythm was differently opened (MusicXML) in note editors:

in Sibelius all shared note heads were separated and doubled. But not as in the Chopin's original, but a note near a note without distances like this:
in Finale many fragments were in the real (incorrect) score look (so it is still possible!). But also many shared note heads were separated and doubled like in Sibelius.

maximumspatium commented 5 years ago

The easiest and the most beautiful way is really in such incorrect notation but completely understandable for musicians.

Yes, that's exactly the problem with the western notation because it assumes implicit human interpretation. Every musician learns to interpret notation, i.e. to map the imprecise graphics into a real timing.

Corrected rhythmic alignment

The problem with the above notation is that no human player, even a very virtouso one, can play that exact timing in a reasonable fast tempo. That's why the editor has "hacked" the notation by changing the precise graphical layout to something that better reflect the real human performance. That's okay but it still doesn't make the hacked notation legal :)

But why Finale music editor allowed to type so and can play it if it is incorrect?

Because humans want it (it looks nicer so). The trick is that Finale uses the precise timing needed for playback in the background while allowing you to change (or humanize) the graphical representation. In other words, Finale links two distinct interpretations together while making all mapping details invisible for you :)

But the sense of recognition is to recreate a score in the original look if even it is incorrect, especially if note editors support such incorrect look.

I disagree here. The purpose of the music recognition isn't to recreate some score layout in the first place. For that purpose, you can simple make a copy of your score by means of a copy machine or scanner. This way you'll be able to precisely recreate existing scores. The purpose of the OMR is to convert graphical representation of some music to a symbolic format allowing you to do a lot more than simple copying - it includes transposing, arranging, analyzing, reproducing and republishing music. Moreover, (at least here in Germany and in France) exact copying of copyrighted editions is prohibited by law...

The MusicXML format is bad for this.

It's not. You cannot design all-in-one format suitable for every purpose. MusicXML has been designed to meet several requirements including sounding reproduction of musical notation. It therefore exposes some constraints on musical timing. That means that our system has to infer exact timing needed for playback from imprecise graphical layout created for humans and that's the task we're trying to solve here. If you need to produce an exact graphical copy of a score, you can export the recognized picture into PDF. This way, you'll loose the ability to manipulate the recognized music...

oleedd commented 5 years ago

OK. If not to recreate but to make maximum close to the original input - the best is when the number of elements is the same. For me MusicXML is very bad because it broke many of fully "good" green fragments (but its rigid timing limits). Also I have read that it has a very harsh reaction to mistakes. One more big reason - note editors sometimes open one file as it is generally different files. Then de facto there are many ambiguities in this format. It is need something like .omr format (without rhythm requirements) but with a filter for not-supported symbols.

hbitteur commented 5 years ago

@oleedd Let me precise what I mean by:

    May not modify existing symbols:
       Separation of shared note heads

A "shared" note head is a graphical note head linked to a stem in the upper right corner and to another stem in the bottom left corner.

Audiveris recognizes this configuration and creates two separate heads (one for each stem if I can say so). But these two chords exhibit the same time offset, precisely because they originate from the same graphical head symbol.

And this time property is something very strong. It is thus forbidden for the algorithm to "cut" such relation, by separating further the 2 chords, leading to different time offsets.

It's true that the two initial chords are already "separate". Sorry if my initial sentence could be misunderstood.

oleedd commented 5 years ago

The problem with the above notation is that no human player, even a very virtuoso one, can play that exact timing in a reasonable fast tempo.

Possible to play if the last note heads are separated like in the above example - there are many peaces with such (correct) rhythm in one hand too in the transcription from another issue. If note heads are shared it would be more difficult to play - but the foundation of 6 equal 16-th notes will be broken and it will sound very bad. More of that I and probably the composer don't like this grace note effect if to play in the correct rhythm because it has the effect of a non-synchronous chord. Rhythm is not one of the biggest problems for musicians. Even combined triplets and 4-lets may be played.

About "separation of shared note heads" - linking to this I just wanted to say that note editors open such rhythm in different ways. Even in Finale some note heads was kept shared and some was doubled. And it is really strange. But now I am not sure anymore - are shared note heads supported for export?

maximumspatium commented 5 years ago

But now I am not sure anymore - are shared note heads supported for export?

To my knowledge, MusicXML has no concept of shared noteheads. It's merely a graphical interpretation used in some notation editors. Often there is a switchable option for that somewhere...

oleedd commented 5 years ago

It's merely a graphical interpretation

But it is important because no sense to double the same note heads - it looks bad and foolishly and not possible to double a sound in playing. And sometimes a note is really a part of both the melody and accompaniment voices.

oleedd commented 5 years ago

Why on the last page no one implicit tuplet was recognized? There is no incorrect rhythm. And even one explicit triplet wasn't recognized. Page.pdf

hbitteur commented 5 years ago

I have just processed this last page (message above) with a new algorithm version and the result looks good, with the exception of the very long sequence of beamed cue notes at the end of the page (such long cue beams are still not supported).

I will commit the new version in the coming minutes.

Here are the actions I did:

Make sure the implicit tuplets mode is ON for this sheet (see menu Book | set Book Parameters)
Manually insert a "C" common time signature at the very beginning of the sheet (otherwise the implicit tuplet algorithm has no knowledge of the expected measure duration). Do this at end of HEADERS step preferably, before RHYTHMS step is launched.
Then finish transcription.
The measure at beginning of system 3 appears in pink. Manually assign the explicit tuplet 3 which was not recognized. Manually insert a tuplet 6 for each of the 3 first beamed groups (implicits were not detected in this case, impeded by the configuration at the end of the measure).

That's all for now.

hbitteur commented 5 years ago

Latest version pushed as commit 37e2f83021e5858d7ea52f7b98a73ec250543d07

Please make sure that a time signature is present at the beginning of the sheet. This is mandatory for this version to work. In the real life, there is a time signature in the current sheet, or in some previous sheet. I know that.

oleedd commented 5 years ago

But I have the first 3 staffs without tuplets but the next two with tuplets. I made full transcribing of all the 7 pages. And here is an error - there is a normal rhythm without tuplets - grace notes shouldn't be counted: Also the 3-th and 4-th pages are generally without tuplets.

oleedd commented 5 years ago

Manually insert a "C" common time signature at the very beginning of the sheet (otherwise the implicit tuplet algorithm has no knowledge of the expected measure duration).

But it should remember the last found time signature. In our case - from the beginning of the score.

After adding 4/4 some measures with tuplets became recognized. But many measures with correct voicing and without the disputed incorrect rhythm (but many tuplets with that rhythm were recognized - so it should work for all of them) - for example: (the 3-th page) (the 5-th page) (the 4-th page) (№1 of the 7 page)

Why these measures are pink if there is all right? Also the arpeggiato sign is sometimes recognized as a bar line.

hbitteur commented 5 years ago

But it should remember the last found time signature. In our case - from the beginning of the score.

True. But this forces the engine to transcribe all sheets in order, sequentially.

In some cases, you may be interested in one sheet only. Or, you want the engine to process several sheets in parallel. In these cases, the notion of "last found time sig" is not applicable, unfortunately.

hbitteur commented 5 years ago

And here is an error - there is a normal rhythm without tuplets - grace notes shouldn't be counted

Mind the fact that these cue notes have been mistaken for standard notes, because of their actual size. And the slashed flags not recognized.

To make the RHYTHMS step work correctly (or to just reprocess a given measure, via a right click in this measure), you must first manually remove these false chords.

hbitteur commented 5 years ago

After adding 4/4 some measures with tuplets became recognized. But many measures with correct voicing and without the disputed incorrect rhythm (but many tuplets with that rhythm were recognized - so it should work for all of them)

The new algorithm is able to infer implicit tuplets in some cases. Code is not obvious and is still being worked upon.

Typically we need a time reference to 1/ detect a timing inconsistency, and 2/ to check the ratio of difference with the reference. If this ratio is 3/2, then we can inject implicit tuplets starting from the last common synchronization slot and then retry the measure processing.

Right now, we can use two sources of timing reference:

Two voices in parallel that disagree at a given time time slot. The reference is the shorter voice. In some cases, the shorter voice would also need implicit tuplets (this happens in the Chopin example) and therefore some manual tuplet injection is needed to get a reliable reference.
One voice that occupies the whole measure, but lasts longer than the measure expected duration. The reference is the expected measure duration.

We could investigate other cases. For example, for a voice that ends after the expected measure end, we could inspect the voice sequence of chords and somehow detect where one or several implicit tuplets, applied on some chords of the voice, would correctly adjust the voice end time. A kind of back-propagation if you wish :-)

This may appear in some future version of the algorithm. This algorithm redesign has taken much longer than I thought initially. The version just committed (development branch) works is most cases. Let's keep it for a while and evaluate it in various configurations.

oleedd commented 5 years ago

the notion of "last found time sig" is not applicable

It is need to find the close time signature in the document of a working sheet (before it).

and somehow detect where one or several implicit tuplets

That is a problem because need to know how. Need an exact algorithm but not approximate.

Let's see how a musician finds implicit tuplets. Need to get close to it - and imitation of the human mind is the purpose of AI. 1) A musician sees groups of notes (they are separated). He know how many groups should be in a time signature (the lower number). 2) He counts beamed notes (sometimes the first is a rest) in each group and knows how many of them should be normally. 3) If there are 3/6 where should be 2/4 - tuplets are found.

Note: it is not even necessary to compare voices. It is much easier. For example, if the algorithm sees 4 groups (they are separated) at the "C" signature. Then need to count short beamed notes in each group. If 3 or 6 - it is a tuplet. That is all. It is better to fix excessive adding tuplets than insufficient. The problem is that your algorithm breaks in measures with non-standard voicing. Sorry but I don't believe that it is hard to implement with programming for about 95% of all cases. Without grouping it is much harder. Maybe you chose some difficult unreliable ways. Note: implicit tuplets are impossible without grouping. Also please answer why are the above examples red, using numbers of pictures. Interesting. I can't get it.

Audiveris / audiveris

Measure voices and time slots (including implicit tuplets) #279