AlexHarker opened 2 years ago
Here are two pics of a short file that has been analysed for mfccs and novelty.
The first represents the current latency reporting, which to me is noticeably lagged.
This picture reflects what I think the latency reporting should perhaps be:
Another note - the latencies of novelty slice and novelty feature are reported as the same, but novelty slice has a peak finder that requires a one-hop lookahead, so they should probably (to my mind) be different.
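To put numbers on that relationship, here is a minimal sketch (the example settings, and the assumption that the feature is delayed by exactly one analysis window, are mine):

(
var win = 1024, hop = 512;             // example analysis settings
var featureLatency = win;              // assumption: the curve is delayed by one analysis window
var sliceLatency = win + hop;          // plus one hop for the peak-finder lookahead
[featureLatency, sliceLatency].postln; // -> [ 1024, 1536 ]
)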
OK - I've also now done some testing on the novelty slice object and I'm convinced it is wrong - if I compensate externally for the window size I get much better matches to segments.
I'd be happy to do a PR for the novelty objects with what I think they should be if @weefuzzy or @tremblap (or anyone else) would want to look at it.
[Detailed context - I'm chopping vocal samples against mfccs, and with the compensation I have a sample where the second slice is a coherent short fricative consonant, whereas without compensation it sounds like a more arbitrary chunk of sound - that makes sense if the output is misaligned.]
I'm totally prepared to believe it's off by a hop or so. I think we spent some time in lockdown looking at this, and G burrowed into it some. Things are made harder for long-suffering client authors at the moment by needing to account for the latency of the windowing process themselves, which is something I plan to fix one day. IAC, thanks for looking, and a PR to prod would be most welcome.
Well - it's off by a window size for slicing and a window size + hop for feature.
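For reference, the external compensation I mention amounts to something like this in SC terms (a sketch: the hard-coded window size, and ~slices as a stand-in for the indices buffer, are mine):

(
var win = 1024;                              // must match the analysis windowSize
~slices.loadToFloatArray(action: { |idx|
    (idx - win).max(0).asInteger.postln;     // shift every slice point back by one window
});
)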
I have something here I am just checking through and I'll put it on a branch and PR.
The onset objects also need a look, but I haven't had a chance to delve deep enough to know what I think makes sense there.
In both the realtime and non-realtime wrapped cases? It's quite possible there are multiple sources of error...
I haven't checked the realtime case in any explicit manner, but what I've corrected in the PR is just the latency reporting, based on what seems to make sense at the RT level, and it seems to improve the NRT results. I think I've got that right for the novelty stuff, but I'd need to look again at the onset objects (if I were guessing I'd say that they compensate by the hop where it should be the window, but that's a hunch, and without having looked through in more detail it seems unwise to just assume and do a PR).
The initial point of comparison is my object that takes descriptors in buffers and then computes the novelty feature on them; from there I worked back to the novelty slicer.
I should say that I also checked the NRT wrapper compensations, and whilst they are confusing (to me) in terms of how things are expressed, I do believe them to be correct.
Possibly of interest to @tremblap to look at.
Here are two screenshots of the output of fluid.bufonsetnoveltyfeature~. These are aligned with the source buffer, although there are some complexities due to padding (both of these were taken with the default half window added), which means the exact alignment may stretch across the buffers. I still don't have it clear in my mind how timing should relate to the padding, but I think that's somewhat a separate issue to the basic latency one.
This is the current dev build
This is a custom build in which I set the latency to the window size and not the hop size.
To my mind the second of these is better aligned with events in the buffer. I have set the window size to 16384 here to make the difference more obvious. There are still potential arguments to be made about where exactly to place onsets in relation to frames, but I've also done audio testing, and to my mind the latter version hits audible onsets, whereas the dev build sounds a bit arbitrary. Happy to provide test patches for this or the slicer (although I need to investigate that more) to demonstrate the difference.
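For anyone who wants to reproduce the comparison numerically rather than by eye, here is a sketch of how a frame index maps to a sample position under the two latency models (the frame arithmetic is my reading of the situation, not taken from the source, and the hop is assumed; this ignores the padding question above):

(
var win = 16384, hop = 8192;                 // window from the screenshots; hop assumed
var frameTime = { |frame, latency| (frame * hop) - latency };
frameTime.(4, hop).postln;                   // dev build: latency = one hop       -> 24576
frameTime.(4, win).postln;                   // custom build: latency = one window -> 16384
)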
In my case it might be relevant that I happen to be doing peak interpolation, so it's just occurred to me that my onsets can occur between/within frames when I audio test.
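For clarity, by peak interpolation I mean the standard three-point parabolic fit, roughly like this (a sketch of my own, not what the library does):

(
// Given the novelty values either side of a peak frame, return the
// fractional frame offset of the estimated true maximum.
~interpPeak = { |yPrev, yPeak, yNext|
    var denom = yPrev - (2 * yPeak) + yNext;
    if(denom == 0) { 0.0 } { 0.5 * (yPrev - yNext) / denom };
};
~interpPeak.(0.2, 0.9, 0.5).postln;          // -> roughly 0.14 of a frame late
)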
OK - I've spent some quality time with this, and it is strangely not better - I have bad memories of investigating this already. Here is my favourite test file for novelty: alternating from a single sine to pink noise at exactly 44100 samples in:
x = {Select.ar(LFPulse.ar(0.5,mul: 0.5, add: 0.5),[PinkNoise.ar(0.1),SinOsc.ar(110,mul: 0.2)])}.asBuffer(2)
x.play
Then I run a combinatorial sweep of settings, for pleasure, on both versions:
(
~slices = Buffer.new(s);
Routine {
    var wins = [4096, 1024, 128];
    var thresh = [0.45, 0.35, 0.01];
    var ols = [2, 4];
    var filts = [1, 5, 9];
    var kerns = [3, 21, 31];
    wins.do { |win, i|
        ols.do { |ol|
            filts.do { |filt|
                kerns.do { |kern|
                    FluidBufNoveltySlice.processBlocking(s, x, threshold: thresh[i], indices: ~slices, algorithm: 1, kernelSize: kern, filterSize: filt, windowSize: win, hopSize: win / ol);
                    s.sync;
                    "win % ol % filt % kern % = ".format(win, ol, filt, kern).post;
                    ~slices.loadToFloatArray(action: { |slice| slice.asInteger.postln });
                }
            }
        }
    }
}.play
)
and I get different, inadequate values between versions :) I'm sure @weefuzzy will have opinions, as one who also investigated this and gazed for a long time at novelty curves.
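Since the ground truth here is a single transition at exactly 44100 samples, one could score each combination by the distance of its nearest slice point, roughly like this (a hypothetical helper, not part of the test above):

(
// Distance of the nearest detected slice from the known transition.
~score = { |slices, truth = 44100|
    slices.collect { |pos| (pos - truth).abs }.minItem;
};
// usage, in place of the raw postln in the loop above:
// ~slices.loadToFloatArray(action: { |slice| ~score.(slice).postln });
)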
One question: why is the latency different in the PR between the slicer and the curve? I was under the impression that we were aligned...
anyway, send along thoughts and test files and we can continue this adventurous discussion.
It's a bit vague to talk about "different inadequate values" - the slicer can only find slices on hops, so there is an inevitable inaccuracy in the found slices - the questions for me are:
On the second of these I am fairly confident that my work in this thread identified a conceptual error in the original value chosen. I can run through that on a call or similar, but having described it in written form above in a variety of possibly hard-to-follow ways, I don't think giving that another go is likely to be useful, so I'd prefer a realtime discussion.
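To illustrate the hop quantisation point with the test file above (the arithmetic is mine):

(
var hop = 512, truth = 44100;
var nearestHop = (truth / hop).round * hop;  // the best any hop-quantised slicer can do
(nearestHop - truth).abs.postln;             // -> 68 samples of unavoidable error
)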
"why is the latency different between the slicer and the curve?" (I presume here you mean feature - not sure if the naming convention has now changed or that is historical naming)
That is addressed above:
"the novelty slice has a peak finder that requires a one hop lookahead, so they should probably (to my mind) be different"
The hedging in that quote relates to the fact that we would need to agree on a model of how the timings of these control data objects should relate to the input stream, and it is possible that that model differs between me and others.
In addressing #148 I have run into some issues regarding the alignment between my results (with two objects) and the original novelty feature object:
I am not yet 100% sure that the latency model in these objects is incorrect, but I suspect that it may be. My reading for an onset/novelty value or similar is that it should be tagged at the start of the window it compares, but right now I think it is being reported at the end of that window. For Mel bands or mfccs the timing is set to the start of the frame. I can't easily force my results to align for consistency, as I don't know the window settings by the time the data reaches my object, but it seems to me that computing mfccs externally and then running a novelty feature on them should align with doing it all in one object.
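As a sketch of the two tagging models I'm contrasting (my reading of the problem, not the library's code):

(
var win = 1024, hop = 512;
var tagAtStart = { |frame| frame * hop };        // matches how mel/mfcc frames are timed
var tagAtEnd = { |frame| (frame * hop) + win };  // what I think is currently happening
[tagAtStart.(3), tagAtEnd.(3)].postln;           // -> [ 1536, 2560 ]
)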