cannam / expressive-means

Expressive Means Vamp Plugins
GNU General Public License v2.0

Articulation: noisy onset detection #4

Closed FrithjofVollmer closed 1 year ago

FrithjofVollmer commented 1 year ago

Dear Chris, last but not least about the noisy onset detection: Thanks for implementing the logic here! You hard-coded the noise ratio parameters now, right? Would it still be conceivable to keep them as flexible parameters, at least for "affricative", "plosive", and "fricative"? Also, "affricatives" are not recognised at all right now, which seems to be due to my oversensitive preset suggestions.

Anyhow, the much bigger problem now is that as soon as an offset precedes an onset, onsets are regularly classified as "noisy" (so even almost noiseless onsets are classified as "plosive", see screenshot). So the spectral rise logic may not be the best solution anyway. I have to think about that again. If you don't have a better idea, I would suggest postponing this issue until next week (I'll have more time to reconsider then)... OK?

Type and index layer work fine, of course! Thank you!

[Screenshot: noisy onsets]
FrithjofVollmer commented 1 year ago

Dear Chris, is there a proven way to measure spectral density?

What we want is this: the subclass of Impulse / "noisiness" may be defined as the proportion of non-periodic sound elements (or the ratio of frequencies besides fundamentals and overtones) within a certain time span. "Sonorous" (type: s) shall then correspond to almost none, "fricative" (f) to a small noise portion over a relatively long term (= cloud-like in the frequency domain), "plosive" (p) to a high portion but over a short term (sudden, column-like), and "affricative" (a) to a long-term combination of both (sudden, dense cloud; see figure attached).

I suggest this may be assessed by the number of FFT windows simultaneously active during the respective time span. If there is no proven logic so far, I would like to suggest the following:

[Parameters:]

[Logic:] If within a range of 100Hz to 4kHz,

...what do you think?

FrithjofVollmer commented 1 year ago

Actually, if it reduces the effort for you, the Index factors [ax, px, …] may stay hard-coded as they are right now…!

It’s really just the percentage parameters [p] and [f] which should be flexible in order to determine the right settings per recording.

cannam commented 1 year ago

I think I have actually mis-coded the noisy onset classification, specifically the part that treats the first and second halves of the window. I'm looking over this now.

cannam commented 1 year ago

I've pushed an update that follows similar logic to before, but with more sensible handling of the basic arithmetic involved. That is, it is still the suggestion from email, based on looking at regions of the noisy onset detection function - it does not implement anything from this issue.

Take a look at that and see whether it has improved things, and can work well enough either as it stands or with some minor adjustments - or whether we still need to reconsider along the lines of this issue. The suggestion in your second comment here is interesting but a little daunting, so it would be nice to find that the simpler approach could work after all.

FrithjofVollmer commented 1 year ago

I'm afraid the problem is still apparent... I suspect it's a problem of the logic rather than the code: I didn't consider offset periods before onsets when I made the suggestion to use the transient onset detector to detect noisy impulses. As a result, even relatively 'noiseless' onsets (= "sonorous", s) are classified as "affricative" (a, see screenshot) – it should have been obvious; I am sorry for not having had that in mind back then...

Could we give my new suggestion a try? Or maybe you have a better idea? I guess aiming at spectral density might be most promising.

[Screenshot: noisy onsets]

cannam commented 1 year ago

There are at least two sources of ambiguity in

at least [p] FFT windows within [o5]/2 (half the time span) [...] exceed [L]

  1. Does "FFT windows" refer to time slices (this is what is usually meant by windows) or did you intend "FFT bins" (i.e. frequency components)? If the former, then [p] and [f] presumably refer to a proportion of time hops rather than a proportion of bins within each time hop. But it isn't clear in this case how the single value [L] can refer to a whole hop (i.e. all bins), so I am guessing that this is not what was intended, and that you meant FFT bins.
  2. Assuming that this refers to FFT bins, the treatment of time windows / slices / hops is still relevant, because the duration [o5]/2 consists of several hops and there is more than one possible interpretation of what it means for "at least [p] FFT bins" within a whole sequence of hops to be above a threshold.

The way I started out interpreting the above, for the half span, was to look for at least [o5]/2 consecutive hops anywhere within the time range all of which have at least [p] bins exceeding [L]. That felt intuitively reasonable to me.

But that doesn't seem as if it will work at all for the full span - I ended up checking that every single hop within the full span has at least [f] bins above [L]. That is surely too strict, and will fail if an onset is detected a little early or if the noise window duration is not quite long enough.

Two other interpretations that come to mind are:

Let me know what you'd prefer to try, or if you have another intuitive interpretation in mind that I haven't thought of.

For the sake of implementing something to look at, I've written and pushed the interpretation I mention above (at least [o5]/2 consecutive hops have ratios above [p]; all [o5] hops have ratios above [f]). You can find the logic in Articulation::classifyOnsetNoise.

For evaluation I made the "noise level" value in the Summary output show the relative duration of hops having ratio > p, i.e. the value that must exceed 50% to classify something as plosive or affricative. The fact that this never seems to get anywhere near 50% suggests that even my "intuitively reasonable" check for the half span was far too strict to be useful.
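For reference, here is a minimal sketch of that interpretation (a paraphrase only, not the actual Articulation::classifyOnsetNoise code; the names and the mapping onto the four types are illustrative):

```cpp
#include <algorithm>
#include <vector>

enum class NoiseType { Sonorous, Fricative, Plosive, Affricative };

// "ratios" holds, for each of the o5 hops following the onset, the fraction of
// FFT bins (within the analysed frequency range) whose level exceeds the floor [L].
NoiseType classifySketch(const std::vector<double> &ratios, double p, double f)
{
    if (ratios.empty()) return NoiseType::Sonorous;
    int o5 = int(ratios.size());

    // (a) "sudden, column-like": at least o5/2 consecutive hops with ratio > p
    int run = 0, longestRun = 0;
    for (double r : ratios) {
        run = (r > p) ? run + 1 : 0;
        longestRun = std::max(longestRun, run);
    }
    bool sudden = (longestRun >= o5 / 2);

    // (b) "long-term, cloud-like": every one of the o5 hops has ratio > f
    bool sustained = std::all_of(ratios.begin(), ratios.end(),
                                 [f](double r) { return r > f; });

    if (sudden && sustained) return NoiseType::Affricative;
    if (sudden)              return NoiseType::Plosive;
    if (sustained)           return NoiseType::Fricative;
    return NoiseType::Sonorous;
}
```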

FrithjofVollmer commented 1 year ago

Dear Chris, your interpretation was as I envisaged it – and this actually seems to be working! The issue was my threshold suggestions: switching to [p]=20% and [f]=8%, the results, at least based on the Rose recording, do not look bad at all at first glance!

I will continue checking this with the other recordings from Monday on, but just so you know...

FrithjofVollmer commented 1 year ago

Dear Chris, I was wondering whether the rule

If within a range of 100Hz to 4kHz,

is considered within the code – I didn't find it in that part of the logic (Articulation::classifyOnsetNoise), and this would explain why the thresholds need to be so low in order to detect anything. If it is indeed missing, it may be worth implementing in order to have greater flexibility in setting the range accordingly (i.e., considering older recordings with a musical bandwidth of only 5 to 12 kHz and rumbling artefacts below 0.1 kHz)...

Also, I have to think about how to compensate for noisy recordings (low SNRs) in an elegant way (see screenshots 1 and 2, using the same settings – the former a cleaner, the latter a noisier recording). Do you know of an existing logic that detects and compensates for the technical noise component?

Otherwise, I'm thinking of suggesting some sort of "Technical Noise / Surface Noise Coefficient" which has to be set by the user on an ordinal scale, combining [p], [f], and [L] as a hard-coded ratio and, this way, reducing the number of parameters. For instance, the user would choose from a scale from 1 (almost noiseless) to 10 (extremely noisy), which sets the [L] sensitivity and then applies the [p] to [f] ratio... But let's wait and see how the connected [L] parameter works :)

[Screenshots: clean recording; noisy recording]
cannam commented 1 year ago

The [L] parameter should be wired up now (as reported in the other issue).

The range 100-4000 Hz is used identically for all of the spectral rise/fall calculations - these values are set as parameter defaults in SpectralLevelRise.h (lines 45 and 46) and never changed. It's quite convenient to be using the same range for every spectral calculation - obviously it wouldn't be all that hard to use multiple ranges but this does bring some simplicity.
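For illustration, this is roughly how the band restriction and the floor interact when computing a per-hop ratio – a hypothetical sketch only, not the actual SpectralLevelRise implementation:

```cpp
#include <cmath>
#include <vector>

// Fraction of FFT bins within [minHz, maxHz] whose magnitude exceeds the floor,
// for a single hop of magnitude data.
double bandRatioAboveFloor(const std::vector<double> &magnitudes,
                           double sampleRate, int fftSize,
                           double minHz = 100.0, double maxHz = 4000.0,
                           double floorDb = -70.0)
{
    int minBin = int(std::ceil(minHz * fftSize / sampleRate));
    int maxBin = int(std::floor(maxHz * fftSize / sampleRate));
    double floorMag = std::pow(10.0, floorDb / 20.0);

    int above = 0, total = 0;
    for (int i = minBin; i <= maxBin && i < int(magnitudes.size()); ++i) {
        ++total;
        if (magnitudes[i] > floorMag) ++above;
    }
    return total > 0 ? double(above) / total : 0.0;
}
```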

FrithjofVollmer commented 1 year ago

Dear Chris,

during the past days, I experimented with various recordings and SNRs to find out about this issue. I found that down to an SNR of roughly 20 dB (that is, recordings with little to moderate background noise), there is an inversely proportional relation between SNR on the one hand, and [p] and [f] on the other (with [p] always being 2*[f], interestingly enough!). However, as soon as the SNR falls below 20 dB, the interval between [p] and [f] decreases in a roughly -ln(x) fashion. Moreover, [p] and [f] both strongly depend on the degree of reverb, on whether or not the audio level is normalised (towards 0 dBFS), on whether or not the preceding IOI contains a lungo (L) note duration and a glide, as well as on the pitch range involved (that is, the Szigeti Bach – played on a violin – needs higher [p] and [f] thresholds than the Rose Bach, played on a viola).

In fact, this is way more complex than I thought it would be. However, I think the following amendments should handle the problem – and I would like to apologise in advance: this may look a bit overwhelming at first glance…

I suggest the following:

  1. Within the „Processing“ area of the plugin settings, we add a „Normalise audio“ tick box on top (above the „audio frames“ and „increment“ settings) which is activated by default. It causes a standard alignment of the audio’s maximum level towards 0 dBFS, based on raw power, prior to analysis. 
(—> This will help for a number of other issues as well, such as false-negative on- and offsets caused by too low levels.)

  2. For the noise type output logic only, the floor parameter [L] is replaced by a hard-coded -70 dB. (However, [L] stays a parameter for the offset detection function!)

  3. We define two new parameters [w1] and [w2] forming the „Onset / Offset / Noise: Spectral detection range", which provides the edges for SpectralLevelRise.h (lines 45 and 46), making the range a parameter. Preset is 196–4,000 Hz [w1–w2].

  4. We define a new bundle [q] named "Sound quality (degree of surface noise)" (for these bundles see the upcoming issue on redesigning the parameter settings window) which on an ordinal scale provides six exemplary pairs of raw values for the parameters [p] and [f] in the following order:

                [q wheel, allowing for six positions]     [basic value return for p and f]
    [q1]         1 [= clean, SNR >60 dB]                   p=22%, f=11%
    [q2]         2                                         p=26%, f=13%      **preset**   
    [q3]         3                                         p=32%, f=16%
    [q4]         4                                         p=34%, f=20%
    [q5]         5                                         p=36%, f=27%
    [q6]         6 [= extremely noisy, SNR <6dB]           p=53%, f=47%
  5. We define a new parameter [r] named "Reverb duration factor", which returns factors for scaling the raw [p] and [f] values given in step 4. Preset is "1.5".

  6. We define a new bundle [u] named "Reverb duration" which provides four exemplary [r] settings:

                [preset name]                            reverb factors for [f] & [p]   
    [u1]         small studio (<150 ms)                   *1                
    [u2]         large studio (c. 150–600 ms)             *1.5         **preset**               
    [u3]         concert hall (c. 600–1,500 ms)           *2.25
    [u4]         church (>1,500 ms)                       *3.375                                            
  7. Within the „Plugin parameters“ window (when prompting the plugin), we add a tick box next to each of the „Impulse noise ratio: Plosive" [p] and „Fricative" [f] parameters (between the string and the wheel would be a good place, I guess). These boxes are deactivated by default.

  8. Determine [p] and [f] parameter raw values (see the sketch after this list):

    1) Deactivated tick boxes (step 7) result in [p] and [f] being determined by multiplying the values given by steps 4 to 6: 

    [basic values resulting from q] * [r factor resulting from u]

    Examples: 
    (1) [q] is set to „2“, [u] is set to „large studio“:
        [p] = [q2] * [u2] = 26% * 1.5 = 39%
        [f] =  [q2] * [u2] = 13% * 1.5 = 19.5%
    
    (2) [q] is set to „4“, [u] is set to „concert hall“:
        [p] = [q4] * [u3] = 34% * 2.25 = 76.5% 
        [f] =  [q4] * [u3] = 20% * 2.25 = 45%

    2) Activating one or both tick boxes will bypass the values given by steps 4 to 6 for the respective [p] or [f] parameter and allows the user to choose customised values instead.

    Examples: 
    (1) [p] and [f] boxes ticked: Both parameter values are given by the user. 
    
    (2) [p] box ticked, [f] box not ticked, [q] is set to „4“, [u] is set to „concert hall“:
        [p] = given by the user
        [f] = [q4] * [u3] = 20% * 2.25 = 45%
  9. Refine [f] parameter values per IOI – compensate for note overlaps: We introduce a new parameter [v] named "Overlap compensation factor", which further scales [f] only. It is prompted by an "overlap compensation" tick box (activated by default) beneath the "reverb duration" parameter settings [r]; deactivating the tick box bypasses this step. Before the noise ratio analysis, a note duration analysis (that is: onset, IOI, and offset time) is run. If the preceding note is identified as „lungo“ (L), the [f] value (either resulting from steps 4 & 5 or given as a custom value) is further scaled by the following overlap compensation factor [v] preset (otherwise it is multiplied by 1):

    [v] = 1.6 

    Examples: (1) [p] and [f] are determined to be [p] = 39%, [f] = 19.5%. A "lungo" (L) note precedes the respective onset. --> final parameter values are [p] = 39%, [f] = 19.5% * 1.6 = 31.2% for this specific IOI.

    (2) [p] and [f] are determined to be [p] = 39%, [f] = 19.5%. Four IOIs contain a 1) secco (S), 2) a lungo (L), 3) an elastico (E), and 4) a lungo (L) note. --> final parameters for IOIs nos. 2 to 4 are:
        IOI 2: [p] = 39%, [f] = 19.5%
        IOI 3: [p] = 39%, [f] = 19.5% * 1.6 = 31.2%
        IOI 4: [p] = 39%, [f] = 19.5%

  10. Start noise ratio analysis.

  11. Filter results for glide incidences: 
We implement the glide detection function of the portamento plugin (I will look into it tomorrow!) and introduce a new rule: if a preceding IOI contains both a note duration of „lungo“ (L) and a glide, the onset noise type is labelled „sonorous“ (s) regardless of the actual analysis result.
(—> This way, we avoid false-„affricative“ onset labels caused by the broadband reverb of the glide…)
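To make steps 4 to 9 explicit, here is a minimal sketch of how the effective [p] and [f] values for one IOI would combine (illustrative only – names and structure are not meant as actual code, and the custom-value tick boxes from steps 7 and 8 are left out):

```cpp
#include <map>

struct NoiseThresholds { double p; double f; };   // as fractions, e.g. 0.26 = 26%

// Step 4: sound quality bundle [q] -> raw (p, f) pair
NoiseThresholds rawFromQuality(int q) {
    static const std::map<int, NoiseThresholds> table {
        {1, {0.22, 0.11}}, {2, {0.26, 0.13}}, {3, {0.32, 0.16}},
        {4, {0.34, 0.20}}, {5, {0.36, 0.27}}, {6, {0.53, 0.47}}
    };
    return table.at(q);
}

// Steps 5-6: reverb duration bundle [u] -> factor [r]
double reverbFactor(int u) {
    static const double factors[] = { 1.0, 1.5, 2.25, 3.375 };   // u1..u4
    return factors[u - 1];
}

// Steps 8-9: effective thresholds for one IOI
NoiseThresholds effectiveThresholds(int q, int u,
                                    bool precededByLungo,        // step 9
                                    double overlapFactor = 1.6)  // [v]
{
    NoiseThresholds t = rawFromQuality(q);
    double r = reverbFactor(u);
    t.p *= r;
    t.f *= r;
    if (precededByLungo) t.f *= overlapFactor;
    return t;
}

// Example from step 8: q = 4, u = 3 ("concert hall")
//   p = 0.34 * 2.25 = 0.765, f = 0.20 * 2.25 = 0.45
```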

…good grief, I admit this became an enormous request! However, I spent my last four days checking this against numerous audio files and parameter settings, so once it’s settled, it should work :)

I also admit that the total number of parameters starts to become a bit unwieldy (especially for our future inexperienced users). In a new issue (tomorrow), I will propose a new idea for the „Parameter plugin“ menu layout which may make up for this!

Thanks a lot and all the best, Frithjof

FrithjofVollmer commented 1 year ago

GitHub seems to interpret my tabulators as code :) so here are steps 4 to 6 as a screenshot, just in case:

[Screenshot 2023-02-22: parameter scheme for steps 4 to 6]

(Please note that the scheme has changed slightly since yesterday (Feb 21st) – I made a few edits regarding mainly the overlap compensation [v]! This one is the final version now – thanks!)

cannam commented 1 year ago

An interesting investigation - but as you might imagine, I have some concerns about the way this is going!

First, there are several things identified here (and of course in the neighbouring issue #9) that are actually feature requests for the host (Sonic Visualiser) and the protocol (the Vamp API) rather than the plugin. Anything that involves bundling parameters or modifying the "Processing" section of the parameter dialog (which is entirely generated by the host) is an SV or API change rather than a plugin change. There is nothing in the Vamp API that gives the plugin any way to describe bundles of parameters or to provide presets in any way other than as a single preset that controls all parameters. The code for the parameter dialog in SV is not something that has recently been worked on and there is risk in any change there of quietly breaking the behaviour in other ways or for other plugins.

Second, of course this is generally a lot of additional work being proposed - and of course the noisy onset classification has already consumed a bit more work than was expected and really allowed for. There are practical limits on time and budget to consider, and this is only one feature among quite a number of plugin outputs.

Third, I am concerned that this is a bit like manually over-fitting a model with a large number of parameters. You have a number of test cases and have established that with certain parameters, you can probably match the "ground truth" for these onset types. But I am concerned that each time a new case comes up that the existing parameters turn out not to fit, the temptation is to introduce a new parameter. For the user, the result may be that they can get the plugin to identify all onsets correctly, but only by doing almost as much work tuning parameters and re-running the plugin as it would have taken to identify them manually. Or, it may be that the plugin doesn't work out-of-the-box but the user can't understand why not because the parameters are so complicated!

Can a trained human, looking at a spectrogram, identify the type of a noisy onset, without having to be told in advance how much reverb was present or whether the instrument was a violin or viola? If so, then it should be possible to do an acceptable job automatically in what is after all intended to be an "assistive" tool rather than an "authoritative" one.

(It may well be that the best answer would be to train a small neural network on various examples with annotation and data augmentation - this is essentially 4-way classification from noisy images with small but reliable training data, which is probably not difficult for a convolutional network. But I am not proposing that we do that now either - maybe something to consider later!)

My feeling is that it is probably better to use a simpler model that works "only some of the time" right now, than to introduce a lot more complexity at this moment. If it isn't ideal, it can be reviewed in a "version 2" later on. But if there is something fundamental that proves an obstacle to doing this simply and also affects other aspects of feature extraction - and perhaps background noise removal is an example - then it will be worth spending some time on that as a general pre-processing step at some point.

Happy to talk this over in a call at some point if you like!

FrithjofVollmer commented 1 year ago

Dear Chris,

yes, I see all your points... The thing is that the impulse noise detection feature actually is the most essential within the whole Articulation plugin, for it has by far the greatest impact on the Index. I had that feeling that it might become much more complex than I thought, but I'm afraid without good detectors here the whole plugin doesn't make a lot of sense... Anyhow, please give me another chance to explain my thoughts on this a bit better:

First, there are several things identified here (and of course in the neighbouring issue https://github.com/cannam/expressive-means/issues/9) that are actually feature requests for the host (Sonic Visualiser) and the protocol (the Vamp API) rather than the plugin. Anything that involves bundling parameters or modifying the "Processing" section of the parameter dialog (which is entirely generated by the host) is an SV or API change rather than a plugin change. There is nothing in the Vamp API that gives the plugin any way to describe bundles of parameters or to provide presets in any way other than as a single preset that controls all parameters. The code for the parameter dialog in SV is not something that has recently been worked on and there is risk in any change there of quietly breaking the behaviour in other ways or for other plugins.

Maybe my use of the word "parameter bundle" was rather unfortunate: what I meant was that each "bundle" is essentially interpreted as one parameter of its own, each of its options feeding a number of hard-coded values to the outputs. If set to "Custom", these values may uniformly be coded as "0". Each output then uses an accumulation of values; that is, one part comes from the "bundle" parameter (one of the hard-coded values), the other from the single parameter. To make sense of this, I had the idea of "bypassing" tick boxes in #9: is it possible to add such boxes to the plug-in's dialog window without changing SV and the API?

If yes, the idea would be that one addend in the equation is always zero, determined by the "Custom..." option (for the "bundle") or the tick box (for the single parameter), respectively. For instance, if I choose "Moderate" note durations (100 ms) and at the same time the "Minimum onset interval" parameter says "120 ms" but is deactivated, it comes to 100 + 0 = 100 ms. If I set the former to "Custom" and activate the latter by ticking it, the calculation would be 0 + 120 = 120 instead. (I understand the API wouldn't allow for automatically ticking the associated single parameters in order to activate them as soon as the "bundle" parameter is set to "Custom...", right?)
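A minimal sketch of that arithmetic (hypothetical names, just to illustrate the "one addend is always zero" idea):

```cpp
// Hypothetical illustration only - not proposed plugin code.
double effectiveValue(bool customTicked,   // the per-parameter tick box
                      double bundleValue,  // e.g. "Moderate" note durations = 100 ms
                      double customValue)  // e.g. "Minimum onset interval" = 120 ms
{
    // Box not ticked: the single parameter contributes 0 and the bundle's
    // hard-coded value is used -> 100 + 0 = 100 ms.
    // Bundle set to "Custom" (0) and box ticked: the user's value is used
    // instead -> 0 + 120 = 120 ms.
    double bundlePart = customTicked ? 0.0 : bundleValue;
    double customPart = customTicked ? customValue : 0.0;
    return bundlePart + customPart;
}
```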

This idea is only for the case that changing SV or the API would be too risky. However, if there is a way to do so without possibly harming existing plugins' functionality, a small API alteration (e.g. within the next SV update) would probably be the more elegant way... On the additional resources necessary for this option, please see below!

Second, of course this is generally a lot of additional work being proposed - and of course the noisy onset classification has already consumed a bit more work than was expected and really allowed for. There are practical limits on time and budget to consider, and this is only one feature among quite a number of plugin outputs.

Yes, that's right and I totally see this point. Since this step and the one proposed in #9 are so important for the usability of the plugin, I would like to propose a budget increase – aimed at these two steps specifically – which I am willing to cover privately (that is, added to the "Articulation" budget and sent immediately), since the additional work necessary for this clearly exceeds our original agreement. Could an additional 500 Euro make up for it (at least to some extent)?

Besides budget, please don't worry about the time frame here in Stuttgart – as said, end of May or beginning of June would be totally fine (that is when our next network conference happens; it would be great to present the plugins there to potential users), so please take all the time you need for this!

Third, I am concerned that this is a bit like manually over-fitting a model with a large number of parameters. You have a number of test cases and have established that with certain parameters, you can probably match the "ground truth" for these onset types. But I am concerned that each time a new case comes up that the existing parameters turn out not to fit, the temptation is to introduce a new parameter.

Yes, that was what I was thinking when we started the work on these plugins with only a handful of recordings (all featuring violins in only two work excerpts, the Bach and the Beethoven)... Anyway, I widened the scope significantly during the last week (experimenting with various instruments, works, recording technologies, noise ratios, reverbs, ... – I even used examples from opera singing and drum kit!). The logic as proposed above worked with almost all of them, so I am confident these suggestions are actually a step towards more generality rather than more constriction!

For the user, the result may be that they can get the plugin to identify all onsets correctly, but only by doing almost as much work tuning parameters and re-running the plugin as it would have taken to identify them manually. Or, it may be that the plugin doesn't work out-of-the-box but the user can't understand why not because the parameters are so complicated! Can a trained human, looking at a spectrogram, identify the type of a noisy onset, without having to be told in advance how much reverb was present or whether the instrument was a violin or viola? If so, then it should be possible to do an acceptable job automatically in what is after all intended to be an "assistive" tool rather than an "authoritative" one.

Yes, that's true! This is why #9 sounds so promising to me – in fact, most researchers (who, at least in our network, are deeply rooted in the humanities rather than in signal processing) would probably be overwhelmed by even five parameters (or fewer) as soon as just one of them requires choosing between various dB or ms thresholds...

The "bundles" in #9 therefore propose descriptive settings rather than concrete values: Everyone can easily guess whether his audio represents a violin or a singer (plus its relative range), whether it's a rather clean or noisy recording that was recorded in a dry studio rather than in a church, or how "fast" or "slow" the notes are (on a relative basis, and by means of a metronome this one could even easily figured out) – what I'm trying to say is that I think this solution actually reduces the effort drastically that has to be put into the plugin before getting any results...

What do you think? Do you see any chance for making these ideas work...?

cannam commented 1 year ago

There is no way to have a parameter whose value toggles the availability of other parameters, I'm afraid - which I think is what you are describing (a tick box that, when ticked, makes a number of other parameters become inactive).

The Vamp parameter mechanism is fairly simple as you know, and one of its simplicities is that all of the parameters are always displayed and always active, and there is no connection between them in the UI other than the order in which they are displayed.

So it is certainly possible to have a tick box and to label it in some meaningful way, but unfortunately it's not possible to show the true effect of ticking it in the UI.

In other bad news, there is also really no such thing as a "small" alteration to the API itself - the requirement for backward compatibility of existing plugin binaries means that any change to the API has to be very carefully managed and tested, and even if the change itself was fundamentally trivial, it would be a quite significant piece of work to guarantee that it couldn't break anything. The API has been changed once before, from v1 to v2 in 2008, and all 12 subsequent releases of the Vamp plugin SDK have used exactly the same API.

Of course changing the API is not impossible, but the effort/reward balance is strongly in favour of working around limitations where possible.

If the goal is to offer "semantically meaningful simple interface" vs "all the dials complex interface", then a possibility is simply to have two plugins, with identical insides but different parameter interfaces. One plugin could have the simple/semantic parameter set that offers only dropdown-type options, and the other could expose all of the actual underlying parameters. The relationship between "semantic parameters" and "underlying parameters" could be made clear in accompanying documentation for anyone who was keen to understand the workings.

That would still be adding quite a bit of work, but it's all in the plugins then, and the "simple" version of the plugin would be even simpler and cleaner.

That might also give a hint about what to do with the debug outputs perhaps? (i.e. have them only in the advanced version of the plugin.)

Perhaps this is a point worth emphasising in general: it is quite easy to have lots and lots of related plugins in a library - there is really very little limitation there. That can be a useful way to work around other limitations in the API and it's always worth considering.
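For concreteness, a rough sketch of the two-plugin structure (hypothetical class names; the Vamp::Plugin inheritance and all the usual plugin boilerplate are left out):

```cpp
#include <map>
#include <string>

// Shared implementation: all the actual feature extraction, driven by the full
// set of low-level parameters ([p], [f], [L], ...).
class ArticulationCore {
public:
    void setLowLevelParameter(const std::string &name, float value) {
        m_params[name] = value;
    }
    // ... process(), feature outputs, etc. ...
private:
    std::map<std::string, float> m_params;
};

// "Advanced" plugin: exposes every underlying parameter and passes it straight
// through to the core.
class ArticulationAdvanced {
public:
    void setParameter(const std::string &id, float value) {
        m_core.setLowLevelParameter(id, value);
    }
private:
    ArticulationCore m_core;
};

// "Semantic" plugin: exposes only descriptive, dropdown-style parameters and
// translates them into the core's low-level settings internally.
class ArticulationSemantic {
public:
    void setParameter(const std::string &id, float value) {
        if (id == "soundquality") {
            // look up the [p]/[f] pair for this quality level (cf. the [q]
            // table proposed earlier) and hand it to the core
            m_core.setLowLevelParameter("plosiveRatio", plosiveFor(int(value)));
            m_core.setLowLevelParameter("fricativeRatio", fricativeFor(int(value)));
        }
        // ... likewise for reverb duration, note durations, etc. ...
    }
private:
    static float plosiveFor(int q)   { const float t[] = {22, 26, 32, 34, 36, 53}; return t[q - 1] / 100.f; }
    static float fricativeFor(int q) { const float t[] = {11, 13, 16, 20, 27, 47}; return t[q - 1] / 100.f; }
    ArticulationCore m_core;
};
```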

FrithjofVollmer commented 1 year ago

Dear Chris,

If the goal is to offer "semantically meaningful simple interface" vs "all the dials complex interface", then a possibility is simply to have two plugins, with identical insides but different parameter interfaces. One plugin could have the simple/semantic parameter set that offers only dropdown-type options, and the other could expose all of the actual underlying parameters. The relationship between "semantic parameters" and "underlying parameters" could be made clear in accompanying documentation for anyone who was keen to understand the workings.

...sounds like we have a solution! Thank you! This way, even the less technically inclined user can make at least some use of the plugin, which is what we wanted. Also, yes – let's do this "semantic" version for the onset & the respective summary outputs only. Whoever wants to make use of the single-aspect and debug outputs has to dig into the documentation anyway :) – I am going to propose an updated parameter preset overview in #9; please also see the suggestions for naming!

Besides that, what do you think about implementing the new noise detection logic – do you see a chance for it, given the proposal above?

(In other good news: I got notified on Friday that all signatures necessary for the payment have been processed; so it should arrive at your account any time soon, hopefully...)

FrithjofVollmer commented 1 year ago

(Sorry, just realised that

let's do this "semantic" version for the onset & the respective summary outputs only

doesn't make sense since all of the outputs would be given for this version of the plugin anyways, right? Please ignore...)

cannam commented 1 year ago

I couldn't quite parse this:

let's do this "semantic" version for the onset & the respective summary outputs only

doesn't make sense since all of the outputs would be given for this version of the plugin anyways, right? Please ignore...)

The two plugins could have quite separate sets of outputs (although of course they would calculate the same features internally - they will probably be the same C++ class in fact) so it would be no problem making the semantic plugin have only the higher-level outputs. I hope this is consistent with what you meant!

Question about the logic itself: The range w1-w2 referred to - is this supposed to be a global setting, i.e. once the parameters are set and the plugin is processing, we then use the same frequency extents in every situation where the spectral rise/flux calculation occurs? (As is the case now, with the hardcoded range.) Obviously it is simpler if so.

FrithjofVollmer commented 1 year ago

The two plugins could have quite separate sets of outputs (although of course they would calculate the same features internally - they will probably be the same C++ class in fact) so it would be no problem making the semantic plugin have only the higher-level outputs. I hope this is consistent with what you meant!

Yes, it is! After our last discussion on this (i.e., your mail explaining the architecture of Vamp plugins) I thought that every feature calculation ends up in an output :) – thanks for clarifying!

Question about the logic itself: The range w1-w2 referred to - is this supposed to be a global setting, i.e. once the parameters are set and the plugin is processing, we then use the same frequency extents in every situation where the spectral rise/flux calculation occurs? (As is the case now, with the hardcoded range.) Obviously it is simpler if so.

Yes, that would be my suggestion! I assume this improves the performance of our splendid spectral drop detector even further, since it focuses a bit better on the actual musical content (i.e. when it comes to low pitches of bass instruments with less overtone projection), would you agree?

cannam commented 1 year ago

A further question. Consider the following text

If a preceding IOI contains both a note duration of „lungo“ (L) and a glide, onset noise ratio is labelled „sonorous“ (s) regardless of actual analysis result

Is the idea here to identify a glide that leads into the current onset, i.e. that occurs at some point after the previous onset?

If we run the glide detector (as written for the portamento plugin) first, then it will have already identified all the glides it can find within the recording, with each glide tagged with its "nearest" onset. That onset might be before, during, or after the glide - all we know is that there is not another onset anywhere nearer.

So if we are looking for any glide that might suggest a smooth start to the current note, then presumably what we are looking for is a glide tagged with the current onset and starting before it. Does that seem right?
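In code, the check I have in mind would look roughly like this (hypothetical types and names, not the actual portamento data structures):

```cpp
#include <map>

struct GlideExtent {
    int onsetStep;  // hop index of the onset this glide was tagged with
    int startStep;  // hop index where the glide begins
    int endStep;    // hop index where the glide ends
};

// glides: one entry per detected glide, keyed by its tagged onset
bool glideLeadsIntoOnset(const std::multimap<int, GlideExtent> &glides,
                         int currentOnsetStep)
{
    auto range = glides.equal_range(currentOnsetStep);
    for (auto it = range.first; it != range.second; ++it) {
        // a glide tagged with this onset and starting before it suggests a
        // smooth ("sonorous") start to the note
        if (it->second.startStep < currentOnsetStep) return true;
    }
    return false;
}
```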

FrithjofVollmer commented 1 year ago

Yes, it does! Thanks, that makes it a lot easier of course...

cannam commented 1 year ago

OK, the glide-detector has been wired up and such notes should now be identified as sonorous.

Two caveats: (i) it isn't well tested! and (ii) we don't currently have any of the adjustable parameters from the portamento plugin hooked up in the articulation plugin, so it always uses the default glide thresholds.

That apart, I think we now have the whole of the onset noise type logic described above? I might well have overlooked something, let me know!

FrithjofVollmer commented 1 year ago

Yes, it looks like everything is working well now – thank you so much for the enormous extra work on this, Chris!

I think we may close this issue for now (and reopen if needed regarding the glide defaults; but for now, even this one is working pretty well...!).