amiaopensource / astataudit

Summarizing audio metrics via ffmpeg and bwfmetaedit
MIT License

Factor out amplitude in the Correlation (via phase) filter reporting. #10

Open Soundmatters opened 1 year ago

Soundmatters commented 1 year ago

Factor out amplitude in the Correlation (via phase) filter reporting. As it stands now, level offsets between channels impact the reporting of ffmpeg’s phase filter. A test file is available here: https://drive.google.com/drive/folders/1PieIpN5w_IvTzfaRYoiPmJJCXlHyZXx8?usp=share_link

dericed commented 1 year ago

didn't this happen in https://github.com/amiaopensource/astataudit/commit/4bc596da05eb7cb1285207ed7381526e3c4b6b89?

Soundmatters commented 1 year ago

didn't this happen in 4bc596d?

No. That solved the problem of DC offset in the signal skewing the phase filter data. This issue is that amplitude offsets between channels skew the phase filter data.

Soundmatters commented 1 year ago

In the folder of test files (linked above) there are two pngs of two 2-channel mono recordings that illustrate the level/correlation issue. They are the same program, but one version was made with a channel offset of .6 dB and the other has a channel offset of 5 dB:
•REC0078_T2869 WITH ONLY point06 DB LEVEL OFFSET.wav.astatsaudit ; OVERALL CORRELATION VALUE = + .96
•REC0078_T2869 WITH 5 DB LEVEL OFFSET.wav.astatsaudit ; OVERALL CORRELATION VALUE = + .82
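A minimal sketch of why a level offset alone can pull a phase-style reading below +1 while a normalized cross-correlation stays at +1. It assumes the phase measure is a per-sample ratio of the form 2·L·R / (L² + R²), averaged over the samples; that form is an assumption made for illustration and is not taken from this thread. Under that assumption, if channel 2 is just channel 1 scaled by a gain g, every sample yields 2g / (1 + g²), which is about 0.85 for a 5 dB offset. The function names are hypothetical.

```python
import math

def phase_style(left, right):
    """Mean of the per-sample ratio 2*L*R / (L^2 + R^2) (assumed form)."""
    total = 0.0
    for l, r in zip(left, right):
        denom = l * l + r * r
        # Assumption: a completely silent sample pair counts as fully correlated.
        total += 1.0 if denom == 0.0 else 2.0 * l * r / denom
    return total / len(left)

def normalized_xcorr(left, right):
    """sum(L*R) / sqrt(sum(L^2) * sum(R^2)); a constant gain on one channel cancels."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

RATE = 48000
# One second of a 1 kHz sine as channel 1.
left = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]

def with_offset(db):
    """Channel 2: the same signal, lowered by `db` decibels."""
    gain = 10 ** (-db / 20)
    return [gain * s for s in left]

for db in (0.0, 0.6, 5.0):
    right = with_offset(db)
    print(f"{db:>4} dB offset: phase-style={phase_style(left, right):.4f}  "
          f"normalized={normalized_xcorr(left, right):.4f}")
```

The two channels here are perfectly in phase in every case; only the level differs, so any drop in the first column comes from amplitude alone.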

Soundmatters commented 1 year ago

Here is an additional file which may illustrate the issue better: https://drive.google.com/drive/folders/1j0sdy_byuBzaHmrtsrKV7XdUZkm1TY0k?usp=sharing

dericed commented 1 year ago

Hi @Soundmatters, I've tested this a few ways, having loudnorm and ebur128 put a rolling normalization before the phasemeter test, but the results are messy compared to the axcorrelation graph, which we had removed. The sample you shared seems well correlated but has amplitude differences, so I took one channel and offset some of the samples for a few minutes to force a loss of correlation. I eventually added the axcorrelation back in to test it against other methods. From there I wanted to check back to the version of astataudit before axcorrelation was removed, and found it was different from my current work in progress.

Here's the graph just before axcorrelate was removed in https://github.com/amiaopensource/astataudit/commit/eaf656a7546c9c2486c38c58d802177904098608. This is on your sample with a section of audio in a single channel offset to force a correlation issue.

astataudit_phase_test_file_scoot wav astatsaudit

Perhaps this was due to the work of the preceding commits, but I see the axcorrelation graph in this commit was problematic. It shows the correlation but the x-axis is halved.

In my work in progress it looks like: astataudit_phase_test_file_scoot wav astatsaudit2

So here the axcorrelation aligns well with the phase graph. They both show the issue but the phase graph factors in amplitude whereas the axcorrelation one doesn't.

So this all has me trying to remember why the axcorrelation graph was dropped as I haven't found anything better than it for plotting phase correlation without factoring in amplitude. Beyond being more accurate it also is much faster than adding in a rolling normalization step before the phasemeter analysis.

dericed commented 1 year ago

Hi @Soundmatters, I think you replied via email rather than at https://github.com/amiaopensource/astataudit/issues/10, so the image attachments didn't come through.

Soundmatters commented 1 year ago

So the way I remember it, we dropped axcorrelation in the reporting because (aside from analysis of audio files with an amplitude offset between channels) aphasemeter was providing more accurate information; the x-axis issue that you note may very well have been part of that problem. Here are a few examples of unreliable reporting that seem unrelated to the x-axis issue, though:

1) axcorrelation was reporting values greater than 1; many instances of this in the first 5 minutes of this example
2) in the opening minute in this example, there is a 30-second Dolby A tone (uncorrelated by design) followed by a 30-second 1 k sine wave (which is well-aligned/in-phase in this example); aphasemeter reports accurately, axcorrelation does not

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit

And here is another example of very different reporting. The aphasemeter data here tracks very closely with some commercial software that I use

REC0078_T3423 wav astatsaudit

Soundmatters commented 1 year ago

Please note that the first image that I posted yesterday was not correct; it has been replaced with the correct image.

dericed commented 1 year ago

Hey @Soundmatters, with my latest branch here is a graph of the output, including a revised contextualization of axcorrelate so you can compare the before and after.

1 axcorrelation was reporting values greater than 1; many instances of this in the first 5 minutes of this example

In the new one there are no values over 1. From a skim, I see 0.999989 but no 1's.

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit1

in the opening minute in this example, there is a 30-second Dolby A tone (uncorrelated by design) followed by a 30-second 1 k sine wave (which is well-aligned/in-phase in this example); aphasemeter reports accurately, axcorrelation does not

I reran this process with a -m 2 to get the first 2 mins in their own graph. Here's that: WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1

In the new version the aphasemeter values and axcorrelate values are roughly similar. Let me know what you think; if it seems okay, I can merge it into a new release for your testing.

Soundmatters commented 1 year ago

Fantastic!

If you don’t think that it would slow down astataudit’s processing time too much, could we leave both the “Correlation” graphic and this updated “Normalized Cross Correlation” in for the time being? It would give us the chance to compare their analysis over large sets of data. And if you could add the same color scale to the “Normalized Cross Correlation” graphic, that might help with the comparison, though not at all essential if you are out of time. Thanks.

dericed commented 1 year ago

Hi @Soundmatters, here's the 2 minute sample with the color patterns matched.

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1

But you're right, there is a notable speed difference. I ran this on WNYC-NSDS-1987-05-22-32942.6 0027 Show Music From France.wav and compared the last release to my current draft: it's 5:04 for the current draft and 0:33 for the last release. The axcorrelate filter offers a fast and a slow algorithm, and I was using the slow one; with the fast one it takes 1:12 and the graph looks like this. The difference between fast and slow is detailed at https://ffmpeg.org/ffmpeg-filters.html#axcorrelate.

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1

I should note that axcorrelation isn't the only addition between the last release and the current draft; the spectrum was added as well.

Soundmatters commented 1 year ago

The default/slow setting certainly seems more accurate.

I’m not sure how helpful this would be, but if dropping some of the other filter graphics would lighten the processing load, I’d say: 1) keep axcorrelate in the default/slow mode, 2) drop zero crossings for now, 3) drop the spectral analysis for now.

The correlation reporting is so important and useful that its accuracy is a primary concern for us.

Maybe, down the road, zero crossings and spectral analysis could be added as options.

dericed commented 1 year ago

Hey @Soundmatters, this is a bit of an investment for future work, but I refactored the way the graph is constructed to separate each analyzer (they were all tangled together before). This should make it a lot easier for me to scale it to add future analyzers.

So with this I can turn them on/off and benchmark. So if I only use one at a time:

astats (without reset) which is for dcoffset 28 seconds

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astats

astats (with a reset every frame) 30 seconds

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsreset

aphasemeter 52 seconds

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav aphasemeter

axcorrelate (slow) 9:42

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav axcorrelate slow

axcorrelate (fast) 29 seconds

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav axcorrelate fast

showspectrumpic 34 seconds

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav showspectrumpic

all (with fast axcorrelation) 3:13

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav all fast

all (with slow axcorrelation) 9:07

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav all slow

These were just quick single run tests but obviously something was off as the solo run of slow axcorrelation was slower than that with all the others.

Soundmatters commented 1 year ago

I'm wondering about the last graphic in the previous post, "all (with slow axcorrelation) 9:07". Is that correct? The Normalized Cross Correlation data looks inaccurate (see the uncorrelated Dolby tone in the first 30 seconds displaying as almost +1; other data also looks like it is displayed at half its value).

dericed commented 1 year ago

I'm having trouble understanding the comment. You suggest the last graphic is incorrect, but is there one here that is correct?

Soundmatters commented 1 year ago

The Cross Correlation analysis in the two-minute examples that you posted seems accurate. I’m looking at the first 30 seconds of Dolby tone (which should be uncorrelated) followed by 30 seconds of a 1 k sine wave (which should be correlated).

In the latest graphic that you posted, “all (with slow axcorrelation) 9:07”, that Dolby tone looks almost perfectly correlated in the Cross Correlation graphic; we’d expect it to be almost 0, not almost +1.

dericed commented 1 year ago

Here's the sample set again but just with 2 minutes.

axcorrelate only, slow

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1 axcorrelate slow

axcorrelate only, fast

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1 axcorrelate fast

all filters, axcorrelate slow

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1 all slow

all filters, axcorrelate fast

WNYC-NSDS-1987-05-22-32942 6 0027 Show Music From France wav astatsaudit 1 all fast

Does this still demonstrate the issue you were mentioning?

richardpl commented 1 year ago

Note that I added a 'best' algorithm to the axcorrelate filter. It should be more correct than the 'fast' algorithm at a similar speed, but it may give some wrong results compared to the 'slow' one, especially when used with the float sample format. In any case, you should always use the double-precision floating-point sample format with this filter, because of the limited precision of floats, by placing aformat=dblp before axcorrelate; in that case it will almost always be more correct for 16-32 bit inputs, even with bigger window sizes.
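A hedged sketch of what that advice could look like as an ffmpeg filtergraph, built here as a Python string so the pieces are easy to vary. It is not astataudit's actual graph. It assumes a stereo input that is converted to planar double with aformat, split into two streams with channelsplit (axcorrelate takes two input streams), and then correlated. The helper names and the `[L]`, `[R]`, `[corr]` pad labels are made up for the example, and whether `best` is accepted depends on the ffmpeg build in use.

```python
ALGOS = {"slow", "fast", "best"}  # 'best' per the comment above; availability depends on the build

def axcorrelate_graph(size=1024, algo="slow"):
    """Return a filter_complex string: double precision, split stereo, correlate."""
    if algo not in ALGOS:
        raise ValueError(f"unknown algo: {algo!r}")
    return (
        "aformat=sample_fmts=dblp,"                   # convert to planar double first
        "channelsplit=channel_layout=stereo[L][R];"   # one stream per channel
        f"[L][R]axcorrelate=size={size}:algo={algo}[corr]"
    )

def ffmpeg_args(path, size=1024, algo="slow"):
    """Argument list that runs the graph and discards the output (analysis only)."""
    return [
        "ffmpeg", "-i", path,
        "-filter_complex", axcorrelate_graph(size, algo),
        "-map", "[corr]",
        "-f", "null", "-",
    ]

print(" ".join(ffmpeg_args("input.wav")))
```

Passing the list to `subprocess.run` would execute it; to plot anything, the `[corr]` stream would still need to be fed into a measuring filter such as astats.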

Soundmatters commented 10 months ago

The speed has improved a great deal, but I wasn't able to get any improved aphasemeter or axcorrelate filter reporting with the recent development build.

I don’t have the skills to do this but, if someone is willing to experiment, it might be worth pushing axcorrelate’s “size” parameter to see if we can get more accurate reporting that way. The range it allows is 2 to 131072. If I’m reading the script correctly, the filter size is set at 1024 now. I think that it would be worth experimenting with the upper end of the range (like 32768 or larger). If we can improve the accuracy that way, it might be worth sacrificing some of astataudit’s improved processing speed.
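For intuition on what a larger window might buy, here is a small sketch that measures block-wise normalized cross-correlation of two independent noise channels at several window sizes. It is a statistical illustration only: it uses non-overlapping blocks in plain Python and does not reproduce axcorrelate's algorithm, so it cannot say whether a larger `size` would fix the reporting discussed here. The function name and the chosen sizes are arbitrary.

```python
import math
import random

def block_xcorr(left, right, size):
    """Normalized cross-correlation of each non-overlapping block of `size` samples."""
    values = []
    for start in range(0, len(left) - size + 1, size):
        l = left[start:start + size]
        r = right[start:start + size]
        num = sum(a * b for a, b in zip(l, r))
        den = math.sqrt(sum(a * a for a in l) * sum(b * b for b in r))
        values.append(num / den if den else 0.0)
    return values

rng = random.Random(0)  # fixed seed so runs are repeatable
N = 1 << 15
left = [rng.gauss(0.0, 1.0) for _ in range(N)]
right = [rng.gauss(0.0, 1.0) for _ in range(N)]  # independent of left: true correlation is 0

spread = {}
for size in (64, 1024, 8192):
    values = block_xcorr(left, right, size)
    # Average distance from the true value (0) across all blocks.
    spread[size] = sum(abs(v) for v in values) / len(values)
    print(f"size={size:>5}: mean |correlation| = {spread[size]:.4f}")
```

With short blocks, uncorrelated material scatters further from 0 from block to block; longer blocks average that scatter down, at the cost of coarser time resolution.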

Soundmatters commented 10 months ago

I created a new test file, found here, which I think illustrates the aphasemeter (called Correlation on the png) and axcorrelate (called Normalized Cross Correlation on the png) reporting issues better than other files that I made in the past. Here is the layout of the test file’s audio data:

Section 1
Timeline: 00:00 - 03:18 ; Nine sine waves each at ~ +1 phase, -18 dBFS (+/- .5 dBFS)
Timeline: 03:18 - 03:42 ; One Dolby A tone ~ 0 phase, -18 dBFS (+/- .5 dBFS)
Timeline: 03:42 - 10:00 ; music with stereo soundfield ~ +0.1 to 0.7 phase, variable amplitude
Timeline: 10:00 - 11:00 ; silence

Section 2
Timeline: 11:00 - 21:00 ; a repeat of the content from 00:00 - 10:00, but with a -5 dBFS offset in channel 2 only, the amplitude of channel 1 remains unchanged from Section 1
Timeline: 21:00 - 22:00 ; silence

Section 3
Timeline: 22:00 - 52:00 ; pink noise, +1 phase throughout ; channel 1 is at ~ -8dBFS throughout, channel 2 starts with 2 minutes at ~ -8 dBFS followed by variable amplitude offsets as low as ~ -21 dBFS.
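For anyone wanting a much smaller file in the same spirit, the sketch below generates a short stereo WAV with an in-phase sine, a gap of silence, and the same sine again with channel 2 lowered by 5 dB. It is a simplified stand-in and not the test file described above: durations are seconds rather than minutes, there is no Dolby tone, music, or pink noise, the levels are treated as peak dBFS, and the "-5 dBFS offset" is read as "5 dB lower". All names are hypothetical.

```python
import io
import math
import struct
import wave

RATE = 48000  # sample rate in Hz (arbitrary choice for the sketch)

def dbfs_to_amplitude(dbfs):
    """Peak amplitude in 16-bit integer units for a level in dBFS."""
    return 32767 * 10 ** (dbfs / 20)

def stereo_sine(freq_hz, seconds, left_dbfs, right_dbfs):
    """Interleaved 16-bit stereo frames: the same sine in both channels, separate levels."""
    a_left = dbfs_to_amplitude(left_dbfs)
    a_right = dbfs_to_amplitude(right_dbfs)
    frames = bytearray()
    for n in range(int(seconds * RATE)):
        s = math.sin(2 * math.pi * freq_hz * n / RATE)
        frames += struct.pack("<hh", round(a_left * s), round(a_right * s))
    return bytes(frames)

def silence(seconds):
    """Digital silence: 4 zero bytes per stereo 16-bit frame."""
    return bytes(4 * int(seconds * RATE))

audio = (
    stereo_sine(1000, 1.0, -18, -18)    # matched levels
    + silence(0.5)
    + stereo_sine(1000, 1.0, -18, -23)  # channel 2 lowered by 5 dB
)

buffer = io.BytesIO()  # swap for a filename to write to disk
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(RATE)
    wav.writeframes(audio)

print(f"{len(audio) // 4} frames, {len(buffer.getvalue())} bytes including header")
```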

I've included the astataudit reports for the full file, with the audio file itself, in the link above. I’ve attached a detail of the png report here; it seems like the clearest illustration so far of the problem.

DETAIL_REC0078_T2869 WITH 5 DB LEVEL OFFSET wav astatsaudit

richardpl commented 10 months ago

Just run the axcorrelate filter on that .wav file with direct showwaves filter output, and in the Dolby A tone section its amplitude goes up and down (because it's measuring per sample). Perhaps this graph picks max values instead of mean ones within each timeline window; that would be the main reason why the results are incorrect.

dericed commented 10 months ago

Hi @Soundmatters, yes @richardpl's clue helped me here. I had been plotting the max level value of the axcorrelated output.

Here is the current state of the output of the aphasemeter filter alongside the current axcorrelate output which relies on the max value:

20231218_astataudit_test wav astatsaudit Max_level

And here is that same data but plotting the DC Offset of the axcorrelate output, rather than the Max level.

20231218_astataudit_test wav astatsaudit DC_offset

And, as I was curious, here's the min level.

20231218_astataudit_test wav astatsaudit Min_level

And what it looks like when all 3 are plotted together: 20231218_astataudit_test wav astatsaudit

Does switching the plot of the axcorrelated data from the max value to the dc offset resolve the issue for you @Soundmatters?
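To make the max-versus-mean point concrete, here is a small sketch with a made-up per-sample correlation trace: a stretch that swings between about -1 and +1 (standing in for the uncorrelated tone) followed by a stretch that sits at +1. Plotting the per-window maximum shows the swinging stretch as almost +1, while the per-window mean (which is what a DC offset statistic amounts to) shows it near 0. The trace and the window size are invented for illustration and are not taken from the test file.

```python
import math

def window_stats(values, size):
    """(max, mean, min) for each non-overlapping window of `size` samples."""
    stats = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        stats.append((max(window), sum(window) / len(window), min(window)))
    return stats

# Hypothetical trace: per-sample values oscillating around 0, then steady at +1.
swinging = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
steady = [1.0] * 1000
trace = swinging + steady

stats = window_stats(trace, 500)
for index, (hi, mean, lo) in enumerate(stats):
    print(f"window {index}: max={hi:+.3f}  mean={mean:+.3f}  min={lo:+.3f}")
```

Only the mean separates the two stretches cleanly; the max reads close to +1 for both.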

Soundmatters commented 10 months ago

It certainly seems resolved from this example. Thank you @dericed, and thanks to @richardpl for your insight.

Very interesting and helpful to see the max., mean, and min. values plotted together. Thanks for adding that example.

richardpl commented 8 months ago

Note that I made big changes to the Librempeg version of the axcorrelate filter. The math behind the slow and best modes should give more correct results than before, and should also be faster, because unnecessary float divisions (which slowed calculations and hurt precision) have been removed.