cannam / expressive-means

Expressive Means Vamp Plugins
GNU General Public License v2.0

Pitch Vibrato: debug options? #16

Closed FrithjofVollmer closed 1 year ago

FrithjofVollmer commented 1 year ago

Hi Chris,

found a short time window + sufficient Wifi on a German train to have a first look into the vibrato summary output. The layout looks great so far, thanks a lot! Regarding the analysis itself: do you think there may be some more options for debugging again (or would it take too much time)?

Take the first few bars of the Huberman example for instance: based on manual measurement, they classify as 4Fn> / N / 4Fm> / N / N ... – instead, the plugin currently returns:

Screenshot 2023-04-01 at 15 57 34

...so at first glance it seems to me that:

1) The detectors for duration (start/end of the vibrato) and rate – that is, the number of peaks recognised – are too insensitive so far (for these aspects, the "peak" debug helps a lot already, thanks!)
2) The range detector seems way too sensitive (almost all Huberman vibratos are analysed as "wide" [w] instead of "narrow" [n])
3) The development detector almost always returns "stable" [=] instead of "decreasing" [>] (this changes only with very large vibratos, as in the Roehn example)

Anyways, will have a deeper look into this on Tuesday! Thanks so far!

cannam commented 1 year ago

Did you change some parameters from the defaults for the example shown? When I run the extractor on Huberman it finds no vibratos in any of the first few notes at all! I'm obviously missing something.

FrithjofVollmer commented 1 year ago

Oh, yes – I switched the “Correlation threshold” parameter to 0.5 for exactly this reason, sorry! Changed that in the defaults.

Anyways, please consider this – I tested the pYin data of the Huberman example directly with Tilo’s app (https://vibratoanalyse.shinyapps.io/aktuell/); the upper example uses original pYin data, the lower one our own pitch track output:

Screenshot 2023-04-04 at 20 57 20 Screenshot 2023-04-04 at 20 57 46

…it found the vibratos even with a determination coefficient parameter setting of 75% (which is the lowest possible preset there, unfortunately). Plus, the numbers there (5.8 Hz / 36 cents) would translate correctly to the types determined by my manual analysis (slow, narrow vibratos instead of fast, wide ones). So maybe there is indeed a problem in the translation of Tilo's logic here? Just a guess, of course! Will look into that debug log file next week.

cannam commented 1 year ago

I've reviewed the vibrato code and added some tests.

First, there was indeed a stupid mistake to do with the width coding - narrow vibratos were returning the same "w" code as wide ones (at PitchVibrato.h line 153). This is fixed now.
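For clarity, the fixed coding could look something like this minimal sketch (the 40/60-cent thresholds are borrowed from later discussion in this thread and are assumptions here, not necessarily the plugin's actual values):

```cpp
#include <string>

// Hypothetical sketch of the range (width) coding bug described above: in the
// buggy version, the narrow branch returned the same "w" code as the wide one.
std::string rangeCode(double rangeCents)
{
    if (rangeCents < 40.0) return "n";  // narrow - the bug returned "w" here too
    if (rangeCents > 60.0) return "w";  // wide
    return "m";                         // medium
}
```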

I noticed that where Tilo's paper refers to a 35ms mean filter window for smoothing the pitch track, the R code uses a 70ms window - I guess his terminology is 35ms "either side" of each sample. So I have changed that and my code now uses a 70ms window as well. I'm not totally sure what effect that will have.
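The smoothing step in question can be sketched roughly as follows; treating the parameter as the full window length (so that 70 ms gives 35 ms either side), and the `hopsPerSecond` argument, are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of a mean (moving-average) filter over a pitch track. Whether "35 ms"
// means the full window or 35 ms either side of each sample is exactly the
// ambiguity discussed above; this version takes windowMs as the full length.
std::vector<double> meanFilter(const std::vector<double> &pitch,
                               double windowMs, double hopsPerSecond)
{
    std::size_t half =
        static_cast<std::size_t>(windowMs / 2000.0 * hopsPerSecond);
    std::vector<double> out(pitch.size());
    for (std::size_t i = 0; i < pitch.size(); ++i) {
        std::size_t lo = (i > half ? i - half : 0);
        std::size_t hi = std::min(pitch.size() - 1, i + half);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += pitch[j];
        out[i] = sum / double(hi - lo + 1);
    }
    return out;
}
```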

(My implementation is generally drawn from his paper, not from his code - I checked the code in cases where I wasn't sure what the paper meant, but it's totally possible that there are still cases where I was sure what the paper meant, but I got it wrong! However my goal was very much to implement what the paper said in a way that suited the plugin, not necessarily to match the R code's output exactly.)

I fully expect that the code still won't produce the results we hope for, and so I've added tests with a lot of clarifying debug output. These are run by the program built as build/tests - this was already in the repo (with articulation and glide tests) but is now also available in the Github Actions output as an artifact called "tests", in case that's how you're getting your updates. It's a command-line program to be run from a terminal window, nothing to do with SV.

It has two tests for PitchVibrato, both using actual pitch track data from Huberman. They are named for the time within the Huberman clip at which the relevant note begins - 0.812 sec and 11.4 sec. See test/TestPitchVibrato.cpp lines 87 and 120 for the respective definitions in terms of pitch data and onset times.

The tests simply run the extractor with the default parameters and compare its output with a hardcoded vibrato code ("4Fm>" for the first and "3Mm>" for the second - please do correct these if they are not what you would expect). But at the same time they print out a lot of information about what they're doing, cross-referenced with the paragraph numbers in Tilo's paper. So we may be able to see whether there are problems caused by (for example) rejecting peaks early on because they don't meet some range or rate criterion, or whether they have problems with the correlation calculation etc.

FrithjofVollmer commented 1 year ago

Dear Chris, back from the Easter break (hope you are well!) and starting with a very basic question: how do I run that test program from a Mac Terminal window? If I try to start it as a program, it denies access (see screenshot). Do I need to install a C++ compiler first or something...?

Screenshot 2023-04-13 at 20 12 53
FrithjofVollmer commented 1 year ago

...just a short update on the modifications made so far (only the plugin itself as described in https://github.com/cannam/expressive-means/issues/16#issuecomment-1499348122, not using the test script yet): yes, the results actually deviate even a bit more from those of the manual analysis now... So I'll probably have to have a look into the debug log as soon as I know how to start it :) – sorry for this probably stupid request. Maybe you could give me a hint on where to learn how to start the "tests" script on a Mac? (E.g., a website or YouTube tutorial – as mentioned, no coding experience at all on my side, very unfortunately...)

Thank you, Chris!

FrithjofVollmer commented 1 year ago

By the way, in case you wondered – 

(My implementation is generally drawn from his paper, not from his code - I checked the code in cases where I wasn't sure what the paper meant, but it's totally possible that there are still cases where I was sure what the paper meant, but I got it wrong! However my goal was very much to implement what the paper said in a way that suited the plugin, not necessarily to match the R code's output exactly.)

– Tilo gave us permission to copy every part of his logic and code we need! He actually was pretty excited that you are going to look over this :)

Screenshot 2023-04-14 at 20 45 24

...just in case it would make things easier to translate directly from his code in places. We should mention him as co-author of the vibrato plugin anyways, shouldn't we?

cannam commented 1 year ago

how do I run that test program from a Mac Terminal window? Since if I try to start it as a program, it denies the access (see screenshot). Do I need to install a cpp compiler first or something...?

I was going to write a quick reply answering this - you need to run chmod +x on the file first to add executable permission. So type chmod +x tests, giving the proper path for tests depending on where you have saved it.

And then when you run it you might just get a complaint that the file is untrusted, so you would then have to ctrl-click on it in Finder, choose "Open", then select the option to run it in spite of the warnings, in order to let MacOS know that it can trust it.

But after I had prepared this reply, I tried it myself on a Mac and it didn't work. The file produced by the Github action requires the testing framework library to be already installed on your machine at a specific location, and it won't be (it isn't on mine either).

So you would just see an error like

dyld[28786]: Library not loaded: /usr/local/opt/boost/lib/libboost_unit_test_framework-mt.dylib
  Referenced from: <4AA5460E-CFD2-35EC-AF25-6E0A9C05C9BA> /Users/particularprograms/Downloads/tests
  Reason: tried: '/usr/local/opt/boost/lib/libboost_unit_test_framework-mt.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/usr/local/opt/boost/lib/libboost_unit_test_framework-mt.dylib' (no such file), '/usr/local/opt/boost/lib/libboost_unit_test_framework-mt.dylib' (no such file), '/usr/local/lib/libboost_unit_test_framework-mt.dylib' (no such file), '/usr/lib/libboost_unit_test_framework-mt.dylib' (no such file, not in dyld cache)
Abort trap: 6

It's not easily circumvented on the user's side. I need to figure out how to get the tests built so that they don't have this external dependency. So unfortunately the answer is probably just to wait until I have found time to do that (which might be on Monday).

cannam commented 1 year ago

OK, I've modified the build and the artifact produced in the action should run now.

It is still a bit fiddly to run though - as I mentioned above, you will probably have to both set the file permissions to make the file executable:

$ chmod +x Downloads/tests

(I feel there ought to be a way to set the file as executable from Finder, but I couldn't find one)

and then also tell the system to trust the file (by ctrl-click in Finder + "open" + say to open anyway when prompted).

When you do that, Finder will open a new terminal window to run the program in, and I think by default it will leave the window open after the program finishes so you can see the output. Alternatively you can run it in your original terminal window as before:

$ ./Downloads/tests

(or equivalent location).

FrithjofVollmer commented 1 year ago

Dear Chris,

yes, that worked – thank you! So far, I couldn't find anything that looked faulty to me in the outputs (but it may take me some practice). So I proceeded with another test strategy and might have found something:

First, I used the Szeryng example instead of the Huberman one, since the former has a greater variety of vibrato types and instances. I then rechecked the "Expressive Means: Pitch Vibrato: Pitch Track" output against the results of my manual analyses. Here is the result based on Szeryng's first note (starting at 0.377):

Screenshot 2023-04-17 at 15 17 48

...it looks to me as if the smoothing function of our pitch track is the (or at least one) problem here again! It smooths the data so much that about half the range gets "lost". This of course affects the results of the vibrato duration (1, 2, 3, 4) and development (<, >, : and =) analyses as well. Checking the output of our pitch data against pYin's (blue) seems to confirm this suggestion, even though pYin seems to cause some range reduction as well. We might need some compensation for that.

Since additional smoothing doesn't seem crucial to me for this plugin (Tilo's logic does some equivalent things anyway, e.g., by using the correlation threshold), we might try the following:

  1. Feeding the plugin 'de-smoothed' pitch track data, equivalent to the pYin logic (your solution for Portamento discussed in https://github.com/cannam/expressive-means/issues/10#issuecomment-1482845385 will work well; I checked it against the data above)
  2. Adding x/7 Cents to each vibrato period, where x is the respective period's range (or 2*amplitude) calculated by the plugin. This leads to a sort of 'adjusted' range which is then used for the further analysis and classification processes. Using the Szeryng example above, the range of the third period calculated on the basis of the pYin data would come to: 57 Cents + (57/7 ≈ 8) = 65 Cents. We might call this step "smoothing error compensation" or something like that within the code?
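The proposed compensation step is small enough to sketch directly; the x/7 factor and the 57 → 65 cents worked example come straight from the suggestion above (the function name is mine):

```cpp
// Sketch of the proposed "smoothing error compensation": add x/7 cents to each
// vibrato period's measured range x before classification. With the Szeryng
// example above, 57 cents + 57/7 (about 8) comes to about 65 cents.
double compensatedRangeCents(double measuredRangeCents)
{
    return measuredRangeCents + measuredRangeCents / 7.0;
}
```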

What do you say? Besides that, the other detectors (duration, rate) actually work pretty well (or well enough) so far, so thank you, Chris!

Also, I've got a related SV-specific question: For manual analyses, I always used SV's "Frequencies" bin display option (as in the screenshot above) but never understood how these single lines in the spectrogram are actually determined. SV's documentation (https://www.sonicvisualiser.org/doc/reference/4.5/en/#spectrogram) is rather general on this:

If set to Frequencies, each peak's bin will be drawn with a single line at a position that corresponds to an estimate of the actual frequency present within the bin, rather than the theoretical frequency range of the bin. This instantaneous frequency estimate is arrived at by comparing the theoretical phase difference between consecutive time frames at the nominal bin frequency with the measured phase difference.

...is it somehow possible that the math behind this uses some sort of prediction model which may cause additional smoothing of vibrato ranges, since curve turning points are hard to predict by comparing a limited number of consecutive frames? So, for instance: If three consecutive Hann windows indicate frequency ranges of

  1. 85–87 Hz
  2. 86–88 Hz
  3. 85–87 Hz

...(that is, a hypothetical turning point), the "Frequencies" display's math would tend to determine "86 Hz" for the second frame instead of "87 Hz", right? This actually is an assumption I had when writing the book chapter, adding a guess (see p. 219):

Furthermore, SV’s “Frequency” bin option may slightly even out maxima and minima, so an additional tolerance of around 10 cents should be considered.

Unfortunately, I didn't know you back then to ask whether this could be true or not... Anyways, if it is true, we might change the additional rule mentioned above ("Adding x/7 Cents to each vibrato period, where x is the respective period's range") to "Adding (x/7)*2 Cents to each vibrato period" – I guess this might be more fruitful than a rather sweeping "+ 10 Cents" rule.

cannam commented 1 year ago

I mentioned smoothing a few comments further up, saying I had adjusted the window length to match Tilo's R code. The problem with removing the smoothing or introducing custom logic to adapt for it is that Tilo's code appears to perform exactly the same smoothing without any corresponding adaptation. So before unilaterally modifying our code, I'd like to understand why it isn't producing similar results to Tilo's.

Looking at the problem again now after a break, an obvious explanation might be that the peak identification process and time-correction interpolation are performed using the smoothed curve, but then the original unsmoothed curve is used for measurement and correlation (steps 7-10 in his paper). If the paper says this, then I missed it, but it would seem totally reasonable.

Looking at the R code, I am wondering whether the critical thing is the difference between arg.y and y. It looks as if y is the result of smoothing arg.y and is used in the following peak-finding step, but then most of the code below that refers to arg.y again. I had missed this distinction at first glance, not being particularly comfortable with R. So it looks as if that might be the cause?

FrithjofVollmer commented 1 year ago

Might well be (although I'm not that familiar with R either) – let's see what happens! However, it's important to add that I developed the vibrato classification without using Tilo's app, so it might well be that the results will keep deviating somewhat from my manual analysis.

(We might as well simply ask him if that helps: tilo.haehnel@posteo.de)

cannam commented 1 year ago

OK, I've reviewed the R code and updated the implementation accordingly. As far as I can see, the steps use the following pitch curves:

So that's what the code in PitchVibrato now does too. Note that the "Vibrato-only pitch track" output has been updated so that it returns unsmoothed values as well. (The "pitch track" output still returns smoothed values, for the moment.)

cannam commented 1 year ago

It looks to me as if this is still missing a number of vibrato elements in Szeryng. I'm going to set up a test or two using some of the note pitch data from this recording. I am thinking perhaps two tests:

FrithjofVollmer commented 1 year ago

Sounds good!

cannam commented 1 year ago

Also, I've got a related SV-specific question: For manual analyses, I always used SV's "Frequencies" bin display option (as in the screenshot above) but never actually understood how these single lines in the spectrogram are actually determined. [...]

Frequency measurement is a complicated business...

SV looks at the phase information for the bin and its successor in time, in order to see how far the wave for that partial appears to have actually advanced by the time the next bin centre is reached. This phase advance is compared with the nominal bin frequency to establish an actual instantaneous frequency. (It's the same method as used in a phase vocoder, if you want to look it up.)
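A minimal sketch of that estimate, under the usual phase-vocoder assumptions (phases in radians, a fixed hop between frames; the function name is mine):

```cpp
#include <cmath>

// Sketch of the phase-vocoder instantaneous-frequency estimate described
// above: compare the measured phase advance between consecutive frames with
// the advance expected at the nominal bin frequency, wrap the difference to
// (-pi, pi], and use it to correct the bin frequency.
double instantaneousFrequency(double phasePrev, double phaseCurr,
                              double binFreqHz, double hopSeconds)
{
    const double twoPi = 2.0 * 3.141592653589793;
    double expected = twoPi * binFreqHz * hopSeconds; // nominal phase advance
    double error = (phaseCurr - phasePrev) - expected;
    error -= twoPi * std::round(error / twoPi);       // wrap to (-pi, pi]
    return binFreqHz + error / (twoPi * hopSeconds);
}
```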

This result is (in the right conditions, see below) exactly correct if the partial at that moment is stable, i.e. a fixed frequency.

If the partial is continuous but mildly time-varying (as here) then the result is still correct in the sense that it will be an accurate measure of the frequency at some instant between this bin centre and the next.

However, both of those statements are subject to the usual limitations about time/frequency resolution in analysis.

A long analysis window (e.g. 8192 samples) gives better frequency resolution and better results for non-time-varying partials, but it is less able to discriminate frequency changes within the duration of the bin. Also, because this naive analysis doesn't track partials as they switch from one bin to another, a partial can end up being analysed in multiple frequency bins at once, ruining the measurement for all of them.

A short analysis window (e.g. 1024 samples) avoids this problem but may be too short for lower frequencies to be resolved correctly at all.

So basically yes - everything is a tradeoff, and in general it is fair to expect that extremes might be smoothed out a little.

If you're interested in experimenting and doing some calculations, you could try locating the vibrato peak and measuring frequency in the waveform by looking at the length of a single (complex, with overtones) cycle. Quite fiddly to do but can be an interesting way to check the results.

(Of course, never forget that fundamental frequency is not the same thing as auditory pitch!)

cannam commented 1 year ago

I found a bug that was causing the results to differ from Tilo's app - it was taking the min/max range used for normalisation in the correlation calculation from the unsmoothed curve, even though the points for the correlation were coming from the smoothed curve. So many of the correlations were unreasonably low.

With this fixed, it matches quite well to the app, at least for the Szeryng example I was looking into. Now we just need to make it match your own calculations as well!

I noticed that one common issue is that peak selection is influenced by pitch values from a neighbouring note, resulting in both false positive and false negative errors. For example the smoothed peak location toward the end of a note may be affected by distant pitch values creeping into the filter from the following note, or a glide may be wrongly identified as a vibrato element.

This can happen easily because we have been performing vibrato extraction on the entire pitch track before associating each element with an IOI. (I think Tilo's app works the same way.) So instead I've experimentally added an option (a toggle, currently on by default) to segment the pitch track into note durations based on our onset-offset calculator and perform vibrato extraction on each note individually before putting them back together again. See what you think - my impression is that this produces more stable results but my impression may be selective.
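The segmentation step could be sketched something like this (the extractor itself is elided; treating onsets as sorted hop indices into the pitch track is an assumption for illustration):

```cpp
#include <cstddef>
#include <vector>

// Sketch of segmented extraction: split the pitch track at onset hop indices
// so the vibrato extractor can be run on each note's pitches individually
// before the results are put back together.
std::vector<std::vector<double>>
segmentByOnsets(const std::vector<double> &pitch,
                const std::vector<std::size_t> &onsets)
{
    std::vector<std::vector<double>> notes;
    for (std::size_t i = 0; i < onsets.size(); ++i) {
        std::size_t start = onsets[i];
        std::size_t end = (i + 1 < onsets.size() ? onsets[i + 1] : pitch.size());
        if (start < end && end <= pitch.size()) {
            notes.emplace_back(pitch.begin() + start, pitch.begin() + end);
        }
    }
    return notes;
}
```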

There are still issues at the start and end of a note where elements often cannot be matched because there is not a full two-cycle duration to match with, and because of this some elements that are found without this option are missed when it is enabled - but I would argue that these only ever worked by accident, and that a proper solution for note start/end elements would look somewhat different.

Anyway, try it with and without this option, compare, and see what you think.

I've also experimentally added an option to control the window length for the smoothed pitch track that is now used for peak selection only - my impression is that Tilo's default of 70ms is pretty good, especially with segmented extraction enabled, but again see what you think.

Note that most of the automated tests are still failing, I'll review them again once we have feedback on the above.

cannam commented 1 year ago

Hm, the segmented extraction worked quite nicely for Szeryng but in the others it's worse. Anyway, appreciate feedback on these changes generally.

cannam commented 1 year ago

A related, somewhat audacious idea occurs to me:

I don't have time to try that right now, perhaps tomorrow if it seems worth a go?

However - when I looked again at the Huberman etc examples earlier, I wondered if perhaps we were still not matching the results from Tilo's app. The Szeryng example I was referring to did match his implementation, but if others still don't, that would be the first thing to look into before trying anything else new.

FrithjofVollmer commented 1 year ago

Dear Chris, sorry, I had to teach all day (and I will have to tomorrow as well), so will look into the changes on Thursday! Thanks for the work today, anyways, looks great so far.

FrithjofVollmer commented 1 year ago

Frequency measurement is a complicated business...

Thanks for the explanation, Chris!

[Segmented:] So instead I've experimentally added an option (a toggle, currently on by default) to segment the pitch track into note durations [...] [Flattened:] Within each IOI, calculate the median pitch value for that IOI and subtract it from all the pitches within that IOI

Cool, thanks! I've tried every option with a number of recordings now, and it occurs to me that the "flattened" setting produces fairly promising results! However, the overall performance – independent of which option is chosen – still seems somewhat arbitrary, especially when it comes to the detected range: sometimes, very narrow vibratos of <40 Cents are indicated as "wide" [>60 Cents]; in other instances it's the other way around; whereas they seem to be identified correctly in Tilo's app. So yes, I would agree – maybe it's worth checking for other discrepancies first before proceeding...

The correlation threshold works great now though!

cannam commented 1 year ago

Be sure to check the "Vibrato-only pitch track" output as well as the vibrato summaries. It's possible that the extractor is classifying a note transition or glide as part of a vibrato. This could have the effect of making it seem to work better, in terms of identifying vibrato lengths (if a vibrato continues to one end or the other of the note and this would not otherwise be detected), while at the same time giving it completely the wrong idea about vibrato range.

cannam commented 1 year ago

I have added yet another option for how to handle note boundaries, and I actually think perhaps this one ought to be the default: "Without Glides".

It runs the glide detector (from the Portamento plugin) and erases any discovered glides from the pitch track, then runs vibrato detection across the whole track as in the original method.
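In sketch form (using a zero sentinel for "no valid pitch at this hop" is my assumption; the real plugin presumably has its own representation of unvoiced hops):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of the "Without Glides" idea: erase hop ranges identified as glides
// from the pitch track before running vibrato detection on the whole track.
std::vector<double>
eraseGlides(std::vector<double> pitch,
            const std::vector<std::pair<std::size_t, std::size_t>> &glides)
{
    for (const auto &g : glides) {      // g = [first hop, last hop] inclusive
        for (std::size_t i = g.first; i <= g.second && i < pitch.size(); ++i) {
            pitch[i] = 0.0;             // mark hop as having no valid pitch
        }
    }
    return pitch;
}
```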

This has a very reasonable feel to me, so long as the glide detector is robust enough not to be falsely detecting parts of a vibrato! In my view the other two segmentation methods (segmented and flattened) have problems in principle that make them less correct even if the eventual results for specific notes might sometimes appear better.

Compared to the segmented mode, this mode has the advantage that it doesn't drop vibrato elements in repeated notes of the same pitch. Compared to the other modes it is a bit more resistant to glides.

I have also gone through the current set of four tests (in TestPitchVibrato I mean) and compared them against the output from Tilo's app and the visible pitch track. Their results all appear "correct" by those measures, but in the two cases where you provided a hand annotation, they don't match it. The comments I added to the test code explain why: 1, 2

FrithjofVollmer commented 1 year ago

Dear Chris,

very cool! Thanks for the comments on the tests as well. I think the main reason for the inconsistencies is that my manual measurement method back then was slightly different from how Tilo's logic actually works. So let's just leave it as it is now! However, a few questions or (rather) tuning suggestions:

1) Consider this one (Szeryng):

Screenshot 2023-04-27 at 16 37 34

...looks to me as if the glide detector is failing here, but couldn't we simply tell the vibrato plugin not to consider onset-crossing "vibrato-only" pitch data (that is, to consider the data in isolation per IOI)? (This would also solve the problem of vibrato durations indicated as >100% of the tone, which happens if the threshold is reduced further – e.g., the IOI starting at 0.377 is then indicated as 105% vibrated.)

2) In this one (Szeryng at 8.051, vibrato-only pitch track vs. summary)...

Screenshot 2023-04-27 at 16 57 14

...two questions occur to me: 1) with a section duration threshold set at 150 ms, do you have any idea why it shows an offset vibrato (type 2) instead of a continuous vibrato (type 4)? (Is the vibrato-only pitch track connected accordingly?) – maybe it would be more fruitful to let the "duration" aspect ("vibrated part of the tone") be determined by the start and end points of the vibrato-only data within each IOI instead of the vibrato elements' time stamps...? 2) This is an example where the "development" detector somehow still doesn't seem to work properly: the vibrato here clearly decreases (from 66 to 48 Cents in manual analysis). Do you have any idea why that could be?

Finally, consider this one – green is the pitch track, red the vibrato-only filtered track. For the latter, I reduced the correlation threshold to 0.05 (!) in order to capture the vibrato (which is supposed to be wide instead of medium) from the beginning:

Screenshot 2023-04-27 at 17 41 52

...I don't get why the correlation threshold doesn't seem to have any effect on the vibrato-only pitch track. In general, it seems too insensitive to me – am I overlooking something?

cannam commented 1 year ago
1. Consider this one (Szeryng):

...looks to me as if the glide detector is failing here, but couldn't we simply tell the vibrato plug in not to consider onset-crossing "vibrato-only" pitch data (that is, to consider the data isolated per IOI)?

This is essentially what the "Segmented" option for the Note Segmentation parameter does - it divides the pitch track up into IOI segments and runs the extractor on each individually. A problem with it is that sometimes a vibrato really does appear to span consecutive notes, e.g. at 6.379 (Szeryng).

The Segmented option doesn't remove glides (because I hadn't thought of that when I added it) and the Without Glides option doesn't segment into IOIs (because I thought removing the glides might be a sufficient alternative) but we certainly could do both. What do you think?

Another thing we could do to weed out some false positives would be to reduce the default value for the maximum permitted vibrato range. Or at least make this dependent on some instrument-class selection in the semantic version of the plugin. (Which we don't have yet)

2. In this one (Szeryng at 8.051, vibrato-only pitch track vs. summary)...

This one looks curious, I'll come back to it.

Finally, consider this one – green is the pitch track, red the vibrato-only filtered track. For the latter, I reduced the correlation threshold to 0.05 (!) in order to capture the vibrato (which is supposed to be wide instead of medium) from the beginning: ...I don't get why the correlation threshold doesn't seem to have any effect on the vibrato-only pitch track. In general, it seems to me too insensitive – do I overlook something?

In this case, the first vibrato element was being rejected before the correlation was ever calculated. That's because it didn't identify any peak at the start of it. As you know, the method from Tilo involves finding pairs of consecutive peaks that meet certain criteria and then checking the shape correlation for the curve surrounding them, so it has to find and accept the peaks first. And then the correlation is checked against a two-cycle region starting at the next local minimum before the first peak and ending at the next minimum after the second one.
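The shape-correlation test itself can be sketched as an ordinary Pearson correlation between the pitch curve over the two-cycle region and an ideal model curve (the model construction is elided here, and the function name is mine):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the shape-correlation check described above: Pearson correlation
// between the observed two-cycle pitch curve and a model curve. An element is
// accepted only if this exceeds the correlation threshold parameter.
double pearson(const std::vector<double> &a, const std::vector<double> &b)
{
    double n = double(a.size()), sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sa += a[i]; sb += b[i];
        saa += a[i] * a[i]; sbb += b[i] * b[i]; sab += a[i] * b[i];
    }
    double num = n * sab - sa * sb;
    double den = std::sqrt((n * saa - sa * sa) * (n * sbb - sb * sb));
    return den > 0 ? num / den : 0.0;
}
```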

Here the first peak should have been at the very first pitch value after the gap. But my code was refusing to select any peak without a valid pitch value on both sides of it - perhaps because the next step (interpolation) requires two valid neighbours. I don't think that restriction was actually reasonable, so I've changed it, which makes this element a bit easier to find, although it still requires quite a low correlation threshold because part of the two-cycle region tested for correlation is missing (i.e. there is nothing to the left of that first maximum).

Incidentally there may be an argument for extending the regions returned in the Vibrato-Only Pitch Track output and the corresponding time regions used for the duration classification. Currently, both of these consider the "vibrato region" to go from one peak to the next, whenever there is an element spanning two peaks. But the area actually used for correlation calculation spans from the neighbouring minima, not the peaks, and so is a little longer. It would be a little more complicated to use those timings and I'm not sure what the effect would be, but I can see some case for it.

cannam commented 1 year ago
2. In this one (Szeryng at 8.051, vibrato-only pitch track vs. summary)...

...two questions occur to me:

1. with a section duration threshold set at 150 ms, do you have any idea why it shows an offset vibrato (type 2) instead of a continuous vibrato (type 4)? (Is the vibrato-only pitch track connected accordingly?) – maybe it would be more fruitful to let the "duration" aspect ("vibrated part of the tone") be determined by the start and end points of the vibrato-only data within each IOI instead of the vibrato elements' time stamps...?

Not entirely sure about this - I got different results depending on whether I used Segmented or Without Glides mode, and when I looked into it I found this was because the timings weren't being corrected properly after segmentation. I've fixed that, so perhaps check again.

2. This is an example where the "development" detector somehow still doesn't seem to work accordingly: the vibrato here clearly decreases (from 66 to 48 Cents in manual analysis). Do you have any idea why that could be?

Again this probably depends on the mode. With Segmented mode, the first peak would not have been found because of the lack of data to its left (same problem as in your third example) so the biggest range would have been missed. I've now fixed that and am seeing ">" for both Segmented and Without Glides here now.

cannam commented 1 year ago

The Segmented option doesn't remove glides (because I hadn't thought of that when I added it) and the Without Glides option doesn't segment into IOIs (because I thought removing the glides might be a sufficient alternative) but we certainly could do both. What do you think?

I've now added another option for this parameter - "Without Glides And Segmented" - and made it the default option, so be careful! I realise the situation is getting pretty confusing but do see what you think.

I don't think we should include all these options in the end product, and certainly not with such crazily long descriptive labels...

FrithjofVollmer commented 1 year ago

Dear Chris,

thanks for everything! The "without glides" option still appears to me to be the best-working one; the "segmented" mode still seems to 'end' vibrato elements too early. So I went on with your second suggestion (confining the thresholds), which resulted in pretty reasonable findings. I'm going to set these new values as defaults in the code!

Still a few 'question marks' – please consider Beethoven-Szeryng again:

Screenshot 2023-05-03 at 10 34 30
  1. Apparently, the left vibrato instance (at 7.65) combines the vibrato-only data of the tone start and the tone end (since the pitch difference to the data preceding 7.65 is >180 cents), ignoring that there is a significant gap in between. It seems reasonable to me to exclude such instances, i.e., to say that significant data gaps (let's say, > 4 hops) between vibrato elements disqualify them from being considered one (connected) vibrato 'string'. In such cases, the longest uninterrupted string 'wins'. We may solve a number of other problems that way as well (e.g. IOI-crossing vibratos, as at 9.665, or the case of 10.501).
  2. Consider the right vibrato (it's the one at 8.051 again, discussed in https://github.com/cannam/expressive-means/issues/16#issuecomment-1531504402 and above): (1) It somehow still shows "=", even though the "without glides" option was chosen... (2) Why do you think it shows only "59%" vibrato duration (while still giving a "4" within the type code for an uninterrupted vibrato, as the vibrato-only data rightly indicates)?
  3. To avoid vibrato durations that exceed the actual tone durations (as mentioned above) without the segmented option, let's pragmatically cap the percentage at 100% (so that the left vibrato would show 100% instead of 134%)!
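Point 3 amounts to a one-line cap; as a rough sketch (function name and signature are hypothetical, not the plugin's actual code):

```cpp
#include <algorithm>

// Cap the reported vibrated fraction of a note at 100%, so that vibrato
// elements extending past the note's end (the 134% case above) cannot
// inflate the figure.
double vibratoDurationPercent(double vibratoDuration, double noteDuration)
{
    if (noteDuration <= 0.0) return 0.0;
    return std::min(100.0, 100.0 * vibratoDuration / noteDuration);
}
```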

Also, please consider this one (same recording, at 12.469):

Screenshot 2023-05-03 at 13 06 06

  1. I am very sorry, but it didn't occur to me until now that we actually do need the offset detector within vibrato, exactly for cases like this one! (Since the tone ends before the consecutive onset but is thoroughly vibrated, we still want a "4".) So could we reinstate the two parameters related to offset and connect them accordingly? (That is, vibrato duration would be held against tone duration, not IOI!) Thanks, and sorry again...!
  2. Probably the same issue as above, just for awareness: Interestingly, the summary output in this case already shows a "4" even though there is a significant vibrato-only data gap until next onset (also, consider the "106%" duration indication).

I don't think we should include all these options in the end product, and certainly not with such crazily long descriptive labels...

Yes, I would suggest offering only the 'best working' option (without glides), the 'segmented' option, and the 'none' option – what do you think? (It's for the "advanced" version of the plugin only, anyway.)

cannam commented 1 year ago

Re. 1 and 2, I checked the code and the way it associates vibrato elements with onsets is both probably not ideal and also not quite what I had remembered.

It simply takes, for any given onset, all those vibrato elements that start on or after that onset but before the following one. So the vibrato element (peak pair) that spans the 8.051 onset is actually associated only with the 7.65 onset, because it starts before 8.051, and not at all with the 8.051 onset. This is not what I would have said before I looked at the code! And I think it may explain both of these queries.
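The association rule described here could be sketched like this (hypothetical, simplified types – the plugin's real structures differ):

```cpp
#include <algorithm>
#include <vector>

// Old rule: a vibrato element belongs to the last onset at or before the
// element's *start* time. An element spanning an onset therefore attaches
// to the earlier note, never to the onset it plays through.
int owningOnset(double elementStart, const std::vector<double> &onsets)
{
    // onsets must be sorted; returns -1 if the element starts before all onsets
    auto it = std::upper_bound(onsets.begin(), onsets.end(), elementStart);
    return int(it - onsets.begin()) - 1;
}
```

So an element starting at, say, 7.9 with onsets at 7.65 and 8.051 is attributed to the 7.65 note even if it extends well past 8.051.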

Will take another quick look tomorrow I hope.

cannam commented 1 year ago

Updated the code. It now picks the longest contiguous chain of vibrato elements within the note duration (i.e. the chain with the longest overlap with the note duration, not the chain that is longest overall).

This won't make any difference to the vibrato-only pitch track output, which shows all elements regardless of whether they are associated with an onset or not. But it will affect the classifications.

saying that significant data gaps (let's say, > 4 hops) between vibrato elements disqualify them to be considered as one (connected) vibrato 'string'

The precise specification (in terms of hops) is unnecessary - vibrato elements are from one chosen peak to the next and the smoothing and selection criteria mean that peaks will not be chosen very close to one another in the first place. So either they are precisely contiguous or there will be a significant gap, and we can just choose the contiguous ones.
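The chaining logic as described could be sketched as follows (names and types hypothetical; the real implementation may differ):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Element { double start; double end; }; // one peak-to-peak vibrato element, seconds

// Pick the chain of precisely contiguous elements whose total overlap with
// the note duration [noteStart, noteEnd) is greatest -- not the chain that
// is longest overall.
std::vector<Element> bestChain(const std::vector<Element> &elements,
                               double noteStart, double noteEnd)
{
    std::vector<Element> best, current;
    double bestOverlap = 0.0, currentOverlap = 0.0;

    for (const auto &e : elements) { // assumed sorted by start time
        if (!current.empty() && std::fabs(e.start - current.back().end) > 1e-6) {
            current.clear();         // gap: the chain is broken
            currentOverlap = 0.0;
        }
        current.push_back(e);
        currentOverlap += std::max(0.0, std::min(e.end, noteEnd) -
                                        std::max(e.start, noteStart));
        if (currentOverlap > bestOverlap) {
            bestOverlap = currentOverlap;
            best = current;
        }
    }
    return best;
}
```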

FrithjofVollmer commented 1 year ago

Great, thanks! Dear Chris, I'm still struggling with the segmented / non-segmented problem, as the latter still produces a high number of false positives between changing pitches. I therefore checked and compared the "segmented" option's results in both vibrato-only and summary mode (Beethoven-Szeryng):

Screenshot 2023-05-04 at 19 47 19

...it looks to me like the problem lies in the translation to the summary output, not in the segmentation mode itself, as the vibrato-only track clearly indicates multiple connected elements. Could that be? If so, "segmented and without glides" may actually be the best option (and could be restored).

(Btw, just so I know: is the glide detection updated automatically when we change something in the portamento plugin, i.e., the default values?)

cannam commented 1 year ago

Ah yes... I broke the segmented mode when adding the vibrato chaining logic so it was returning no vibrato for every note (except sometimes the first). I've fixed that.

Also restored Without Glides And Segmented as the default option.

No, the vibrato plugin actually always uses the default glide parameters at the moment. The only way to change that would be to add the glide parameters to the vibrato plugin's interface as well as all of the rest of its existing parameters. Do we want to do that?

I think the articulation plugin also uses the default parameters for its glide detection (for sonorous articulations) - it doesn't expose any glide parameters.

cannam commented 1 year ago

Oh I think I may have misunderstood the question for the last bit - are you asking whether a change to the portamento code (i.e. in Portamento.cpp) affects glide detection for the other plugins? The answer is no, they have to set up their own glide parameters. I haven't got a nice way to automate that one.

I've just updated both Articulation and PitchVibrato so that the full set of glide parameter defaults are listed at the top of both files, and I've checked that they match the defaults in Portamento. They are still not exposed to the user of the plugin, but you can tweak them in the source files if you like.

FrithjofVollmer commented 1 year ago

Yes, that's a lot better! Last suggestion (promise!): as with the glide-linking problem, a number of false positives are still present in "segmented" (+ "segmented without glides") mode due to the somewhat lazy pitch track (see the one at 1.004, or 2.42, ...):

Screenshot 2023-05-05 at 22 20 45

Could we add to the "segmented" option a rule to only consider elements from 30 ms after an onset until 30 ms before the following onset? (This value sounds reasonable to me, since instrumentalists usually need some time to resettle the vibrato when approaching a new pitch...)

Thanks, Chris! Besides that, I suggest to set different mode defaults for the various instruments (see following commits).

FrithjofVollmer commented 1 year ago

(see edit in the new rule suggestion above – thanks!)

cannam commented 1 year ago

Could we add to the "segmented" option the rule to only consider elements in between 30 ms after onset until 30 ms before following onset? (This value sounds reasonable to me since instrumentalists usually need some time to resettle vibrato when approaching a new pitch...)

I can see the need, but the proposal doesn't feel correct to me? Ruling out any ability to identify a vibrato peak at the start of the note doesn't feel quite right.

There are certainly some cases where a vibrato starts right at the beginning, e.g. the unison in Szeryng where the vibrato plays through from the previous note. (And as a pretty crap cellist, I know that I will occasionally try to "come in" playing a vibrato, generally because I realised while placing my finger that I am hitting the position a little flat!)

Generally speaking the idea of responding too much to what instrumentalists usually do, when the goal is to identify what they actually did, seems a little off. Though I suppose I may also be getting into the semantics of what constitutes a "vibrato element" vs a glide.

What would seem very logical would be to do something like this but only in the case where the onset is a pitch-type onset - not least because we already artificially bring forward the onset position for those, so we'd just be undoing that. I haven't checked whether this specific example is a pitch onset or not.

Does all of this amount to a reasonable concern?

FrithjofVollmer commented 1 year ago

Mmh yes, I see these points. Maybe my concern becomes a bit more pressing when considering Huberman instead of Szeryng: most of the vibrato elements identified in "segmented" mode here are false positives, and among them, almost all are due to that 'pitch overlap' problem. At the same time, a slight majority of those instances is indeed associated with pitch-change onsets (I checked this against all our violin recordings), but it applies to notes with spectral-rise onsets as well.

That is, without compensation for the overlap problem (which at its core seems to be due to the natural reverb in recordings, I guess, as discussed in Articulation), we would risk falsely attributing vibratos to performances, in exchange for catching actual vibrato elements right from the start. With compensation, we would have a mode that avoids false vibratos, but perhaps at the expense of capturing the full development of factual vibratos (though they may still be identified as such if there is more than one cycle). From a historical perspective, I think the former option would be more problematic in the end, as the distinction of vibrated vs. non-vibrated seems to me more crucial than the distinction of, e.g., "continuous" vs. "increasing" vibratos.

Anyway, this surely is a trade-off decision, and since we already have two great modes for catching vibrato elements in their full lengths ("none" and "without glides"), I'd tend towards a "segmented" option which indeed (and maybe a bit exaggeratedly) segments these elements from each other. What do you think?

Let me add that I literally counted hops all night to check when the performers in our samples started their vibratos on average :), and in most cases they did not start before five hops after the onset (which rounds to 25 ms with our FFT defaults; so we could use 25 ms instead of 30 ms). I suggest this would apply to you and me trying a nice intonation-correcting vibrato on our instruments as well (as a crappy double bassist, I know too), since it most certainly takes more than 25 ms for us to perceive and then correct our first pitch-approaching attempts while fighting with large position changes... :)

cannam commented 1 year ago

Fair enough!

So to be clear, the idea as I understand it would be: in "Segmented" and "Without Glides And Segmented" modes, the first 25ms of pitch track after each onset would be effectively discarded and would not contribute to the analysis for any note.

it most certainly takes more than 25 ms for us to perceive and then correct our first pitch-approaching attempts

My premise was that you sometimes know before you've actually landed your finger that it's landing in the wrong place. But you're probably right even so!

FrithjofVollmer commented 1 year ago

In "Segmented" and "Without Glides And Segmented" modes (right? not only Segmented?), The first 25ms (edit: sorry, typed 30ms again at first) of pitch track after each onset would be effectively discarded and would not contribute to the analysis for any note.

Yes! Thanks!

My premise was that you sometimes know before you've actually landed your finger that it's landing in the wrong place.

Ah, sorry, yes – that's a valid point, I guess, which would in fact rule out my argument! Some kind of 'prophylactic vibrato' then :) Let's just hope that's something professionals wouldn't do too often...
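The rule agreed above could be sketched roughly as follows – a hypothetical illustration based on the five-hops-at-defaults observation, not the plugin's actual code:

```cpp
#include <vector>

// Mark as unvoiced the pitch-track hops falling within the first 25 ms
// after each onset, so they contribute nothing to the vibrato analysis
// (segmented modes only). At the ~5 ms default hop, 25 ms is five hops.
std::vector<double> maskEarlyPitch(std::vector<double> pitch, // Hz per hop, 0 = unvoiced
                                   const std::vector<int> &onsetHops,
                                   double hopSeconds = 0.005)
{
    int skipHops = int(0.025 / hopSeconds + 0.5);
    for (int onset : onsetHops) {
        for (int i = onset; i < onset + skipHops && i < int(pitch.size()); ++i) {
            pitch[i] = 0.0;
        }
    }
    return pitch;
}
```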

cannam commented 1 year ago

This last change is in the repository now, and also in the 0.1.1 release.

FrithjofVollmer commented 1 year ago

This works just great. Thanks a lot. Issue (finally) solved!