Find a way to select the most appropriate hitsound for a specific note

PladsElsker commented 2 years ago

That's gonna be pretty cool.

We basically need a sound comparator.

double disimilarity = SoundComparator.compare(sound1, sound 2);

To build such a function, I have an idea:

Given s1 and s2, some audio samples of the same length. We can apply a fourier transform on both audio separately (gives F1 and F2).

We are looking for a way to compute "disimilarity". Therefore, we want a difference. More precisely, we want to know the total difference given at any point on both functions.

The formula | F1(f) - F2(f) | represents the error function £(f) at a given frequency f.

Therefore, to know the total error on all the function domains, we need to take the integral of £(f) df.

The problem now is that not every function is going to have the same "mean volume". What I mean by this is that F1 could have a constant component that is WAY higher than the constant component of F2.

This is generally due to the volume of each sound being not normalized.

To normalize the volume, we can simply add constants to our F(f) functions. This gives the following equation for the new normalized error: £n(f) = | F(1) + k1 - (F(2) + k2) |

k1 and k2 can be computed by calculating the definite average integral of both F1(f) and F2(f) respectively.

So: k1 = (integral of F1(f) df between f1 and f2) / (f2 - f1), where f1 and f2 are just the lowest and highest frequencies in our discrete fourier transform.

Same thing for k2.

The other problem is that comparing different sound lengths will give different scalling of the result (the longer the sounds compared, the bigger the error value can become, even though both comparision should yield the same result).

To acomodate for that, we can divide the "definite integral of £n(f) df over f1, f2" by f2-f1 to get a consistent error value regardless of sound length.

And therefore: Comp(s1, s2) = (definite integral of £n(f) over f1, f2) / (f2 - f1)

And now, we know how to compare sound.

We can then compute the disimilarity value for every sound, and select the one with the value closest to the disimilarity value we are looking for (0 means no error, or exact same function, and the higher the disimilarity, the less "alike" the sounds are).

PladsElsker commented 2 years ago

Small problem with this implementation: it's comparing the exact same frequencies between the 2 wave functions.

It is a problem because now, if we were to compare 2 sine waves, and the frequencies don't match perfectly, it would think the functions are not alike at all.

If we were to compare, let's say, 200 hz sine wave with 400 hz sine wave, it would think both are as similar as 200 hz sine wave and 140 hz sine wave, but to my knowledge, the 140 hz one is more similar.

One way to fix that would be to sum each possible average of a window in the frequency data instead of summing individual values.

So like, if we have 5 samples, then we can average the elements 1-2 from the 1st sine, add that to the average of 1-2 in the 2nd, and store that in "result".

Then, average 2-3 in 1st, add to 2-3 in 2nd, add that in result, etc.

The width of the window could be a parameter passed to the function.

This solves the issue because each "window mean" considers multiple frequencies at a time.

PladsElsker commented 2 years ago

Actually, it's not really a "problem" per say, because the frequency comparator still works.

Like, it's not saying that 200 hz sine wave is closer to 400 hz than 120 hz is to 200 hz, it's just really picky about the frequencies compared.

The width of the mean window could therefore be interpreted as the "pickyness" of the algorithm.

Like, how much picky about the frequencies should the algorithm be?

PladsElsker commented 2 years ago

should we select the interval of the frequency window in the "period" domain?

Fft yields linear frequency spectrum.

If we were to select a time window with an interval around a specific frequency, for example, 100 hz +- 50 hz, then it would consider as much 50 hz or 150 hz @100 hz.

Problem with this is that the "pitch" of notes described as frequencies is an exponential function. 150 hz is less "high pitch" than 50 hz is "low pitch" if we both compare to 100 hz.

Therefore, we need to carefully select the upper bound and lower bound of our interval.

PladsElsker commented 2 years ago

another really cool improvement would be to multiply some sort of bell curve with its peak at index 0 for both function, add the absolute value of the difference of both "bell curved functions" in an accumulator, and then move the bell curve's peak to index 1 for both functions, and add again the abs of the dif for each element into the acumulator.

Then, the "pickyness" parameter would simply be used to describe the "wideness" of the bell curve.

PladsElsker commented 2 years ago

now, for performance, we could simply multiply by a cosine function, or by some sort of triangle function instead of a full fledged distribution function that's intense to compute.

PladsElsker commented 2 years ago

We don't need to subtract F1 and F2 by the means. If we calculated an fft, then the mean is actually present at index 0 in the fft, and then it's gone...

So yeah, the initial equation should just be: £(f) = | F1(f) - F2(f) | SC.compare(s1, s2) = (definite integral of £(f) between 0 and fmax) / fmax

where, in our discrete case, fmax is the length of the fft'ed data, and "the integral of..." is just a sum of all the elements.

Ozu-Beatmap-Toolset / ozu-cli

Find a way to select the most appropriate hitsound for a specific note #13