Yikai-Liao / symusic

A cross-platform, note-level MIDI decoding library with lightning speed, based on minimidi.
https://yikai-liao.github.io/symusic/
MIT License

Feature for `ScoreTick`: resample the time division `ticks_per_quarter` #10

Open Natooz opened 6 months ago

Natooz commented 6 months ago

Each MIDI (and in turn ScoreTick) has a time division expressed in ticks per beat, which often ranges between 120 and 480 (multiples of 2, 3 and 4).

It could be useful for some users to provide a feature allowing them to resample a ScoreTick by changing the time signature. This would imply resampling the times of all the events of a MIDI, and the durations of the events concerned. For durations especially, a min_duration expressed in ticks could be very useful for cases where the resampling reduces the time division and turns some durations into 0. You can take a look at how it is done in MidiTok (the quantize_* methods), though there are probably better ways to do it; I was thinking of using numpy to batch the computations.
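
A minimal numpy sketch of what such a resampling could look like (the function name, signature and min_duration handling below are hypothetical, not an existing symusic or MidiTok API):

import numpy as np

def resample_ticks(times: np.ndarray, durations: np.ndarray,
                   old_tpq: int, new_tpq: int,
                   min_duration: int = 1) -> tuple[np.ndarray, np.ndarray]:
    # Scale every tick value by the resampling factor and round back to integers.
    factor = new_tpq / old_tpq
    new_times = np.round(times * factor).astype(np.int64)
    new_durations = np.round(durations * factor).astype(np.int64)
    # Clamp durations that collapsed below the threshold after rounding.
    new_durations = np.maximum(new_durations, min_duration)
    return new_times, new_durations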

This isn't an important feature; it can wait or be declined without much justification.

Yikai-Liao commented 6 months ago

Do you mean "by changing ticks per beat (or quarter)"? The time signature is not related to the time unit.

Natooz commented 6 months ago

From my understanding, the time division is expressed in ticks per beat, meaning that no matter what the current time signature is, one beat will always be the same number of ticks for a given MIDI. Since the beginning, I assumed the .ticks_per_quarter attribute was actually "ticks per beat", which is the case as it aligns exactly with the miditoolkit time values.

lzqlzzq commented 6 months ago

Here is the original Standard MIDI File Specification, which you can download from SMF_Spec. According to the specification, the division actually defines ticks per quarter note, not per beat.

Natooz commented 6 months ago

Thank you for the clarification! I am a bit embarrassed to have confused this with the beat the whole time. So indeed the number of ticks per beat depends on the current time signature denominator. That doesn't affect a feature to "resample" a MIDI; only, if we provide a minimum duration argument, it would have to be expressed in ticks per quarter too.

Natooz commented 6 months ago

Ideally, this could be implemented in a .resample(new_tpq) method, while .ticks_per_quarter could be set as a property with a setter calling .resample.

Yikai-Liao commented 6 months ago

I don't quite understand what the minimum duration is for.

Natooz commented 6 months ago

Let's say I have a MIDI with a time division of 12 ticks per quarter, with a single note with an onset time at tick 3 and offset at tick 4 (one tick duration). If I resample this MIDI to 4 ticks per quarter, the "resampling factor" will be 1/3. So the onset tick of the note will be rounded to tick 1, and the offset too, leaving a note with no duration. A "min_duration" feature would prevent such cases by enforcing this minimum value on note durations below the threshold after the process. This only concerns edge cases of extremely short notes, but it can still happen. In fact, there is a similar fail-safe check in MidiTok.

Yikai-Liao commented 6 months ago

For this kind of situation, I think we still need to throw a warning for those extremely short notes. But why is this a valid operation? It causes a lot of information loss, much more than the float precision loss problem in tempo.

Yikai-Liao commented 6 months ago

Maybe there should be a function that quantizes the start and end positions of notes to a standard grid (or a grid given by the user). Of course, in this kind of function, we should define a minimum duration length.

In contrast to the previously mentioned tpq scaling interface, this interface makes it clear that it will change the actual information stored in the score. Of course, after calling this interface, the common factor of all timestamps is made explicit, and we can then scale on top of that with no loss of information. However, the exact parameters of the interface I have in mind are still up for discussion.

Natooz commented 6 months ago

I don't know if throwing a warning is necessary, at least if the user intends to automatically set them to a minimum value.

Indeed, by aggressively downsampling we lose a lot of information. I bring up this use case as it is actually a required step in MidiTok, necessary to align the times and values of the events so they can be tokenized/discretized.

The "grid" approach is what is performed by MidiTok for versions <= 2.1.8, leaving the time division untouched. I am however not sure to 100% understand what you mean by "we can then scale on top of that with no loss of information". Aligning times like using a grid or by completely resampling the tpq time division would end up to the same information loss in the end.

I actually opened this discussion as I intended to change this process in MidiTok, by completely resampling the MIDI and changing its time division, in particular by batching the computations with numpy. It should speed up the preprocessing (I hope, I still have to measure it) and simplify a considerable portion of downstream code. I just implemented it in https://github.com/Natooz/MidiTok/pull/116; the protected "quantize" methods were just renamed "resample" to be more idiomatic. Anyway, I thought this could be a feature directly implemented within symusic, as it only impacts the MIDI object itself, and there might be future users interested in it, for example to resample a whole dataset?

Yikai-Liao commented 6 months ago

Okay, I understand the need.

But we still need a Pythonic interface, and I don't have any good ideas yet.

The reason is that in symusic we support three kinds of time unit, and only tick can be resampled (the other two are stored as floats). So if we define a resample method for all score types, the min duration would just be ignored for some of them (changing tpq directly, without changing other data).

On the other hand, converting a score from quarter or second to tick will need a min duration, while the other conversions won't have this parameter.

I feel like this is awkward. Or we could just add a check to make sure that None is passed as the parameter's value when it is not needed.

Well, maybe the resample method should always return a score in tick?

Yikai-Liao commented 6 months ago

resample has been added in #14, although I'm not sure whether it works as you expected.

Natooz commented 6 months ago

This is great, thank you for your reactivity!

What do you mean by that? I'm currently testing it, and it seems to do exactly what it is intended to do (resampling time values).

Yikai-Liao commented 6 months ago

> This is great, thank you for your reactivity!
>
> What do you mean by that? I'm currently testing it, and it seems to do exactly what it is intended to do (resampling time values).

I simply round (time * tpq) and (duration * tpq) and this may lead to a different result compared to other resampling methods.

for example, if start*tpq=1.4 and duration*tpq=1.4

after the round operation, we will get start=1 and duration=1, which means end=2

However, if we round end*tpq directly, we will get: round(end * tpq)=round(2.8)=3

I'm not sure which one is better.
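
Restating the two options from the example above as plain Python:

x_start, x_dur = 1.4, 1.4                # start * factor and duration * factor
# option 1: round start and duration independently
round(x_start), round(x_dur)             # -> (1, 1), so end = 1 + 1 = 2
# option 2: round start and end, then deduce the duration
round(x_start), round(x_start + x_dur)   # -> (1, 3) since round(2.8) = 3, so duration = 2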

By the way, I will release this version after dealing with this problem.

Natooz commented 6 months ago

Ok I see. In this case, rounding the end value, then deducing the new duration value might be a better solution, as it allows more accurate approximations (as `end` will always be larger).

Yikai-Liao commented 6 months ago

> Ok I see. In this case, rounding the end value, then deducing the new duration value might be a better solution, as it allows more accurate approximations (as `end` will always be larger).

Not really, end does not always tend to be larger. Similar to the former example, if we have start*tpq=1.6 and duration*tpq=1.6: rounding start and duration gives start=2 and duration=2, hence end=4, while rounding end*tpq directly gives round(3.2)=3, which is smaller this time.

Well, considering that tpq tends to be large (normally greater than 100), such ambiguity might not lead to any essential issue. Only if tpq is set to a small number, like 24, can it make a big difference.

Natooz commented 6 months ago

Indeed, but in the end in your example the "closest" end value to the "original" one is still 3 (rounded from 3.2). Most of the time shift / info loss is on the start tick. If we sum the time shifts from the original onset and offset positions, we get 0.4+0.8 when rounding the duration and 0.4+0.2 when rounding the end. However, the absolute final duration difference is indeed larger when rounding the end value. So I guess the question is: should we prioritize minimizing the before-after duration difference, or maximizing an F1-like score between the before-after onset and offset positions?

tpq down to 8 is common with MidiTok, as we need to "aggressively" discretize time values.

A figure showing the issue: top is the original note, middle is after resampling to 1/4 of a beat by rounding the end/offset, bottom by rounding the duration value.

Yikai-Liao commented 6 months ago

I find a paper about logic pro's advanced quantization strategy.

I'll dive into it afterward. I think it's time to release the naive quantization version first.

Natooz commented 6 months ago

👋

I'm adding a feature request to this issue as it relates to resampling: allowing a ScoreTick to be resampled based on a ticks/beat value instead of a ticks/quarter value.

The idea would be to resample the MIDI contents based on a fixed ticks/beat value, making the effective ticks/quarter resampling factor evolve through time if there are time signature denominator changes. I encountered this use case in https://github.com/Natooz/MidiTok/pull/124, where I'm trying to take the time signatures into account when determining the time tokens, which are expressed in beats (and not just quarters as previously). However, if we resample a whole MIDI with the same tpq value, we might end up with overly accurate positions when the time signature denominator is lower than 4 (i.e. longer beat, so higher ticks/beat), and lose accuracy when the denominator is higher than 4 (shorter beat, so lower ticks/beat).

Here is an example of what it would look like with a ticks_per_beat of 8:

[two screenshots: the original piece and the version resampled to 8 ticks per beat]

Top is the original, bottom is resampled at 8 ticks per beat. The first time signature is 4/2, then 4/4, then 4/8. Red arrows show the beat duration for each time signature. On the first section, we lose a lot of time information/precision, which is expected. I created the example just to illustrate; in reality such cases are not very common, and in practice we could have increased the ticks/beat value. On the last section (4/8 time signature) however, we keep most of the time accuracy, which we would have lost by resampling in ticks/quarter; the notes would then look like the ones in the 4/4 section.

I haven't looked at the symusic code handling the resampling yet, but I will shortly. I don't think this would require significant changes (but I might be wrong), as it simply updates the effective tpq resampling factor according to the current time signature denominator. In practice, the ScoreTick's new time division ticks_per_quarter could be set by the user (a tpq argument, as long as it is compatible with a tpb value) or automatically set based on the highest time signature denominator in the MIDI:

# lowest beat duration
denom_max = max(ts.denominator for ts in midi.time_signatures)
quarter_factor = denom_max / 4  # can be < 1 if only */x time sigs with x < 4
new_tpq = int(tpb * quarter_factor)
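
With the time signatures from the screenshots above (4/2, 4/4 and 4/8) and tpb = 8, this snippet would give:

denom_max = 8            # from the 4/8 section
quarter_factor = 8 / 4   # = 2.0
new_tpq = int(8 * 2.0)   # = 16 ticks/quarter, i.e. 8 ticks per eighth-note beat in 4/8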

I'm making the suggestion here, as doing it in Python would add a lot of processing time and kill the interest of using resample. I know the feature might only be applicable to MidiTok, so if you're OK with it, I'll try to look into the code and see if I can help, to not "abuse" your time.

Yikai-Liao commented 6 months ago

This example reminds me a little bit that low-resolution "scores" should be treated specially (sounds a lot like physics 😉). Unlike normal MIDI files with high resolution, where we don't need to care much about time information loss, in this case we always need to struggle with all kinds of information loss. That's why I say we need to treat them specially.

Also, the key difference between high-resolution and low-resolution scores is that:

Grids might be a useful concept (probably not in terms of specific implementations). We need to attach high-precision data onto a variety of different grids depending on the design of the downstream representation (which may be very flexible, because the essence of scientific research is to arrange and combine all situations 🤣).

I'm currently thinking that an ideal interface would allow the user to customize some grids in high-precision space, and then have the MIDI attach to them efficiently 🤔. All this behavior is supposed to happen in high-precision space, as a simulation of some kind of low-precision space designed by the users themselves. (Because for these customized representations, flexibility and extensibility are the most important and performance should be secondary. It's just like how we prefer coding on a CPU rather than an FPGA, in Python rather than C++ 🤷‍♂️.)

It can then be quantized into the low-precision space in more flexible ways. For example, if the grid is not equally spaced in high-precision space, when quantizing the user may want to make the points equally spaced, or they may want to keep the original distance ratios. I remember seeing both in various pianoroll grid designs.

Of course, all of the above is just an abstraction of similar needs. I'm not sure it's feasible, for either users or our implementation.

Natooz commented 6 months ago

Indeed, I can't think of a more flexible way than letting the user provide a list of "reference" ticks to snap the message times onto. But as you said, performance would be impacted, as I assume that for each time value to resample we would need to perform an argmin operation (or similar) to find the closest tick.

Also, this might be an "overkill" interface for users who simply want to resample the time division (with ticks/beat or not). So if we were to implement it, it might be best to do it in a separate method?

Yikai-Liao commented 6 months ago

Well yes. Besides those flexible general cases, we need to offer efficient implementations for simple cases.

I'd like to point out that, from an architecture design perspective, we can't write a special interface after considering just one situation. I would like to make this interface similar to the interface for the generic case mentioned above, which would be more elegant and natural.

Moreover, users also have a lower-level, more flexible option: they could use the coming NoteArr, get the four 1D numpy arrays (time, duration, pitch, velocity), and write their own customized quantization methods using numba (fast because of the JIT).
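
As a rough sketch of that low-level path (the function below is hypothetical user code, not a symusic API; it only assumes 1D integer arrays of tick times, like the NoteArr time array, and a sorted grid of reference ticks):

import numpy as np
from numba import njit

@njit(cache=True)
def snap_to_grid(times: np.ndarray, grid: np.ndarray) -> np.ndarray:
    # Snap each tick in `times` to the closest reference tick in `grid`
    # (`grid` is assumed sorted in ascending order).
    out = np.empty_like(times)
    for i in range(times.shape[0]):
        j = np.searchsorted(grid, times[i])
        if j == 0:
            out[i] = grid[0]
        elif j == grid.shape[0]:
            out[i] = grid[-1]
        else:
            left, right = grid[j - 1], grid[j]
            # pick whichever neighbouring grid point is nearer
            out[i] = left if times[i] - left <= right - times[i] else right
    return out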

Natooz commented 6 months ago

Noted. So what direction would you like to take from now on? Just implementing the "grid" method to cover all the cases users could have? Or also implementing the ticks/beat approach in the existing resample method?

Yikai-Liao commented 6 months ago

No, as I said, those are just conceptions to aid understanding, and there is no ready-made case study for the design of this interface.

We are just wading across the stream by feeling the way.

Maybe we could implement some interfaces to explore first, and note in comments or in the (currently non-existent) documentation that these interfaces are not yet stable.

Or we could put them in symusic.experimental, just like what numba did for its jitclass (though that feature has been experimental in numba for years; I hope we won't get into that kind of situation).

Yikai-Liao commented 6 months ago

After discussing with @lzqlzzq, I think that currently the best, or at least the easiest, way is NoteArr: just let users write their own mapping functions using numpy and numba. It's flexible without being inefficient.

And users are also able to construct a NoteArr from numpy arrays and convert it to a Track and then a Score.

This process is basically working; the question that still needs to be considered is whether there is a need to handle events other than Note, such as Pedal.

We can also add some numpy implementations of mapping functions on the symusic Python side of the code. This would make it easier for the community to create PRs implementing new functions.

Of course, if a function is widely used, it is easier to further implement a more efficient version in C++ (since the interface and semantics of the function are already stabilized).

Natooz commented 6 months ago

Ok!

> the question that still needs to be considered is whether there is a need to handle events other than Note, such as Pedal

Are you referring to the NoteArr feature, or the resampling with a "grid"? Btw, we should maybe find a name for this resampling method. What about naming the "grid" argument something like "reference_ticks", which explicitly describes what the array would contain?

Natooz commented 6 months ago

I'll leave https://github.com/Natooz/MidiTok/pull/124 open until this is implemented. Tell me if I can help in any way. This PR contains the last things I wanted to fix before releasing v3. I'll also work on the docs before releasing it, but other than that everything is done!

Yikai-Liao commented 6 months ago

The process I'm envisioning right now is: midi -> Score -> list[NoteArr] (get all the np arrays) -> np arrays (from all the customized conversions) -> back to NoteArr -> Score -> midi. The key point is that we use a struct of arrays (like NoteArr) to fit these needs, instead of an array of structs (like Score and Track). In this way, we can utilize numpy and numba in Python to write customized conversions.

> the question that still needs to be considered is whether there is a need to handle events other than Note, such as Pedal

So here I mean: do we also need to convert controls, pedals, markers, lyrics and so on to struct-of-arrays form from the original array of structs? If you need to convert notes based on this information, I think their struct-of-arrays representation is needed.

Oh, at least you would need the struct of arrays for time signatures and tempos.
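
As an illustration, a struct-of-arrays view of the time signatures could be as simple as the following (a hypothetical helper, not an existing API; ts.numerator is assumed by analogy with the ts.time and ts.denominator attributes used in the snippets above):

import numpy as np

def time_signatures_to_arrays(midi):
    # Turn the array-of-structs time signature list into parallel numpy arrays.
    times = np.array([ts.time for ts in midi.time_signatures], dtype=np.int64)
    numerators = np.array([ts.numerator for ts in midi.time_signatures], dtype=np.int32)
    denominators = np.array([ts.denominator for ts in midi.time_signatures], dtype=np.int32)
    return times, numerators, denominators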

Natooz commented 6 months ago

Indeed, if we provide such a feature for notes, it would be good to provide it for other types of messages/events too. If a user uses it for notes, they are very likely to use it for tempos, time signatures and others too.

So it would ultimately not be an "end-to-end" resampling method? By that I mean that after letting the user do their operations on the np arrays, the time division ticks_per_quarter of the Score object stays the same, so the arrays after modification are expected to still be expressed at the same time division?

I took a look at the resampling method, and after some reflection I think that implementing a tpb (ticks/beat) argument would not be too difficult, and above all much easier for users than asking them to compute reference ticks (and more efficient). I wrote the logic in Python so you can see how it would fit in the current method:

The .resample method, with an additional optional tpb argument.

def resample_midi(
    midi: Score,
    tpq: int | None = None,
    tpb: int | None = None,
) -> None:
    """Current `resample` method.

    :param midi:
    :param tpq: new ticks per quarter value
    :param tpb: ticks/beat value.
    """
    # We make sure the user gave at least one of the two arguments
    if tpq is None:
        if tpb is None:
            raise ValueError("You must provide either a `tpq` or `tpb` argument.")
        denom_max = max(ts.denominator for ts in midi.time_signatures)
        quarter_factor = denom_max / 4  # can be < 1 if only */x time sigs with x < 4
        tpq = int(tpb * quarter_factor)

        tpb_array = compute_ticks_per_beat_arr(tpb, midi)
    else:
        tpb_array = None

    # tracks, time signature and everything...
    # the point is to call a separate `convert_time_tpb` that itself would call
    # `convert_time` for different sections of ticks/beat values instead of just `tpq`
    for track in midi.tracks:
        if tpb is None:
            # CONVERT_TIME method
            convert_time(track.notes, tpq)
        else:
            convert_time_tpb(track.notes, tpb_array)

convert_time_tpb would look like:

def convert_time_tpb(vec: np.ndarray, tpb: np.ndarray) -> np.ndarray:
    """Resample the time values of ``vec`` section by section.

    :param vec: vector of tick times to resample/compress, assumed sorted in
        ascending order.
    :param tpb: array indicating the number of ticks per beat per time signature
        denominator section. The numbers of ticks per beat depend on the time
        signatures of the MIDI being parsed. The array has a shape ``(N,2)``, for ``N``
        changes of ticks per beat, the second dimension representing the end tick
        of each section and the number of ticks per beat respectively. For example,
        [[1000, 16],[3000, 8]] would mean that from tick 0 to 999 the ticks/beat value
        is 16, and from 1000 to 2999 the ticks/beat value is 8.
    """
    vec_sections = []
    vec_idx_start = 0
    # loop over the sections of ticks/beat values
    for tpb_end_tick, tpb_val in tpb:
        # the section boundary is a tick value, so find the corresponding index
        vec_idx_end = int(np.searchsorted(vec, tpb_end_tick))
        vec_sections.append(convert_time(vec[vec_idx_start:vec_idx_end], tpb_val))
        vec_idx_start = vec_idx_end
    return np.concatenate(vec_sections)

And compute_ticks_per_beat_arr, the method that computes the array of ticks/beat values, plus downstream utils methods:

def compute_ticks_per_beat_arr(tpb: int, midi: Score) -> np.ndarray:
    ticks_per_beat = [
        [
            midi.time_signatures[tsi + 1].time,
            compute_ticks_per_beat(midi.time_signatures[tsi].denominator, tpb),
        ]
        for tsi in range(len(midi.time_signatures) - 1)
    ]

    # Add the last one up to the max tick of the MIDI
    ticks_per_beat.append(
        [
            get_midi_max_tick(midi) + 1,
            compute_ticks_per_beat(midi.time_signatures[-1].denominator, tpb),
        ]
    )

    # Remove equal successive ones
    for i in range(len(ticks_per_beat) - 1, 0, -1):
        if ticks_per_beat[i][1] == ticks_per_beat[i - 1][1]:
            ticks_per_beat[i - 1][0] = ticks_per_beat[i][0]
            del ticks_per_beat[i]

    return np.array(ticks_per_beat)

def compute_ticks_per_beat(time_sig_denominator: int, time_division: int) -> int:
    r"""Computes the number of ticks that constitute a beat at a given time signature
    depending on the time division of a MIDI.

    :param time_sig_denominator: time signature denominator.
    :param time_division: MIDI time division in ticks/quarter.
    :return: number of ticks per beat at the given time signature.
    """
    if time_sig_denominator == 4:
        return time_division
    # factor to multiply the time_division depending on the denominator
    # if we have a */8 time sig, one beat is an eighth note so one beat is
    # `time_division * 0.5` ticks.
    time_div_factor = 4 / time_sig_denominator
    return int(time_division * time_div_factor)
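
For instance, with a 480 ticks/quarter time division this gives:

compute_ticks_per_beat(8, 480)  # -> 240, one beat is an eighth note
compute_ticks_per_beat(2, 480)  # -> 960, one beat is a half note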

def get_midi_max_tick(midi: Score) -> int:
    max_tick = 0

    # Parse track events
    if len(midi.tracks) > 0:
        event_type_attr = (
            ("notes", "end"),
            ("pedals", "end"),
            ("controls", "time"),
            ("pitch_bends", "time"),
        )
        for track in midi.tracks:
            for event_type, time_attr in event_type_attr:
                if len(getattr(track, event_type)) > 0:
                    max_tick = max(
                        max_tick,
                        max(
                            [
                                getattr(event, time_attr)
                                for event in getattr(track, event_type)
                            ]
                        ),
                    )

    # Parse global MIDI events
    for event_type in (
        "tempos",
        "time_signatures",
        "key_signatures",
        "lyrics",
    ):
        if len(getattr(midi, event_type)) > 0:
            max_tick = max(
                max_tick,
                max(event.time for event in getattr(midi, event_type)),
            )

    return max_tick

Btw, a method returning the maximum tick of a Score, such as get_midi_max_tick, could be implemented directly in Score.

What do you think about this approach?

lzqlzzq commented 6 months ago

tpb should be a List[int] because the tempo may vary. Sure, that will make the semantics for Tick ambiguous.

Natooz commented 5 months ago

> tpb should be a List[int] because the tempo may vary

If we were to resample a ScoreSecond, yes, but that's not necessary in ticks: the tempo is only used when playing, it doesn't impact the number of ticks/quarter or ticks/beat.

Yikai-Liao commented 5 months ago

@Natooz I have opened a new issue about the SoA interface. With it, you could even write your own implementation using numpy (or even numba) in MidiTok without losing much efficiency. So let's talk about its interface design first in #18; it will come in a few days once the interface is determined.