Open AnClark opened 10 months ago
This is a Linux `perf stat` capture taken while testing with REAPER (VST3 edition). I left it idle for a fairly long time.
Here are some screenshots:
Hey @AnClark thank you for the detailed report.
This will take some dedicated time to figure out. I haven't seen this issue before and I can't directly think of what could cause it.
It might be related to the framework we use. Have you used any other DPF based plugins before that show a similar load increase when transport is stopped?
Hmm, so from your perf inspection it seems that the biquad filters take a lot of time on your machine.
I quickly tried this with the v1.0 release in REAPER on my AMD Ryzen 5 (quite a bit more performant than your ancient C2D), and I don't see any such discrepancy:
Idle:
Active:
Occasionally I see a tiny "jump", when stopping, to 0.03% but it quickly goes down to 0.02% again. Not sure how else I could reproduce.
I am considering enabling SSE4.1 for all plugins this year, which should give a near 4x performance increase. This instruction set is supported on the C2D at least. Maybe we can do some preliminary tests to see if this improves things a bit for you.
Btw it seems your perf.data file is incompatible with my system, so I cannot read the output myself.
I'm guessing those visual stats are also an extra feature of that version, which I don't seem to have.
> Occasionally I see a tiny "jump", when stopping, to 0.03% but it quickly goes down to 0.02% again.
> Not sure how else I could reproduce.
Here's another way you can reproduce the issue:
WSTD EQ still consumes more CPU after I stop playback.
It's strange that the biquad filter works perfectly during processing. Only after I stop the transport does the filter begin to consume CPU.
Is it possible that some inappropriate samples were being processed by the DSP, which made it misbehave?
> It might be related to the framework we use. Have you used any other DPF based plugins before that show a similar load increase when transport is stopped?
Yes. I'm porting some LV2 plugins to DPF. Both of the following plugins used to have a similar issue:
Both of them have a Moog-style filter. When I stop the transport, the filter increases the CPU load tremendously. I haven't figured out why it happens yet, so I just made a workaround: bypass the filters when the oscillators are not sending samples to them.
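For illustration, a workaround of this kind could look roughly like the sketch below. It is not the code from either ported plugin: `isSilent`, `OnePoleLowpass`, `processOrBypass`, and the silence threshold are all hypothetical names and values chosen for the example, and a simple one-pole filter stands in for the Moog-style filter.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: true if every sample is at or below the threshold.
inline bool isSilent(const float* buffer, std::size_t count, float threshold = 1.0e-9f) {
    for (std::size_t i = 0; i < count; ++i) {
        if (std::fabs(buffer[i]) > threshold) {
            return false;
        }
    }
    return true;
}

// Stand-in filter (assumption): a one-pole low-pass with a single state value.
struct OnePoleLowpass {
    float z = 0.0f;  // filter state
    float a = 0.5f;  // smoothing coefficient

    float tick(float x) {
        z += a * (x - z);
        return z;
    }

    void reset() { z = 0.0f; }
};

// Runs the filter in place, or skips it when the input block is silent.
// Returns true if the filter actually ran.
inline bool processOrBypass(OnePoleLowpass& filter, float* buffer, std::size_t count) {
    if (isSilent(buffer, count)) {
        // Clearing the state keeps tiny decaying values out of the feedback path.
        filter.reset();
        return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
        buffer[i] = filter.tick(buffer[i]);
    }
    return true;
}
```

Note that resetting the state on the first silent block cuts off any remaining filter tail, so a workaround like this trades accuracy for CPU time.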
> Here's another way you can reproduce the issue:
> 1. Add a new track, and load WSTD EQ;
> 2. Create a new MIDI item, and add JSFX "White Noise Generator" to Take FX;
> 3. Switch on repeat (activate the "Toggle Repeat" button);
> 4. Play.
I tried following these instructions. I have a MIDI section with `JS: White noise Generator`, then `VST3: WSTD EQ`. Playing this selection on repeat, whether it is playing or stopped, doesn't get beyond 0.03%.
I’ve checked `__hv_biquad_f()` in the generated code.
It seems that newer CPUs like your Ryzen 5 use the code paths optimized with AVX or SSE4.1, while my ancient C2D only supports SSE and SSE2, so it falls back to this simple implementation:
const float y = bIn*bX0 + o->xm1*bX1 + o->xm2*bX2 - o->ym1*bY1 - o->ym2*bY2;
o->xm2 = o->xm1; o->xm1 = bIn;
o->ym2 = o->ym1; o->ym1 = y;
*bOut = y;
However, it's still strange: this code performs quite well while the transport is running, but CPU load increases when the transport stops.
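A hedged illustration of one possible cause: once the input goes silent, the feedback terms `ym1` and `ym2` keep decaying and can end up as subnormal (denormal) floats, which some CPUs process much more slowly than normal values. The sketch below reuses the recurrence quoted above. The struct, the function names, and the coefficient values are assumptions chosen for the demo and are not taken from the generated code.

```cpp
#include <cmath>
#include <cstddef>

// State of one biquad, named after the fields used in the quoted snippet.
struct BiquadState {
    float xm1 = 0.0f, xm2 = 0.0f;
    float ym1 = 0.0f, ym2 = 0.0f;
};

// Same recurrence as the scalar fallback quoted above.
inline float biquadTick(BiquadState& o, float bIn,
                        float bX0, float bX1, float bX2,
                        float bY1, float bY2) {
    const float y = bIn*bX0 + o.xm1*bX1 + o.xm2*bX2 - o.ym1*bY1 - o.ym2*bY2;
    o.xm2 = o.xm1; o.xm1 = bIn;
    o.ym2 = o.ym1; o.ym1 = y;
    return y;
}

// Feeds a single impulse followed by silence and counts how many of the
// outputs produced during the silence are subnormal floats.
inline std::size_t countSubnormalOutputs(std::size_t silentSamples) {
    // Demo coefficients (assumption): a stable filter with poles near 0.9.
    const float bX0 = 1.0f, bX1 = 0.0f, bX2 = 0.0f;
    const float bY1 = -1.8f, bY2 = 0.81f;

    BiquadState state;
    biquadTick(state, 1.0f, bX0, bX1, bX2, bY1, bY2);  // the impulse

    std::size_t count = 0;
    for (std::size_t i = 0; i < silentSamples; ++i) {
        const float y = biquadTick(state, 0.0f, bX0, bX1, bX2, bY1, bY2);
        if (std::fpclassify(y) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

With default floating-point settings (no flush-to-zero and no fast-math flags), calling `countSubnormalOutputs(4096)` should report a non-zero count, because the filter tail has to pass through the subnormal range on its way toward zero.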
As I said we do not build with SIMD optimizations yet (only on ARM).
Your CPU should support SSE4.1 which I might enable later this year. C2D is about 15 years old now.
You could try this optimization by adding `-msse4.1` to the `CXXFLAGS` in the `plugin/source/Makefile`.
I have a newer ThinkPad X201 Tablet. It has a Core 1st Gen processor (Core i7 L 640).
I enabled `-msse4.1` and tested again. Even though the SIMD instructions reduced CPU usage by 1.0% on idle, the problem still exists.
It sounds like the problem has something to do with the algorithm.
For reference, here's a Moog-style filter from RaffoSynth, which has the same problem as I described:
// does the same as the asm version
void equalizer(float* buffer, float* prev_vals, uint32_t sample_count, float psuma0, float psuma2, float psuma3, float ssuma0, float ssuma1, float ssuma2, float ssuma3, float factorSuma2){
float psuma1 = psuma0 *2;
for (int i = 0; i < sample_count; i++) {
//low-pass filter
float temp = buffer[i];
buffer[i] *= psuma0; //psuma0 == factorsuma1
buffer[i] += psuma0 * prev_vals[0] + psuma1 * prev_vals[1]
+ psuma2 * prev_vals[2] + psuma3* prev_vals[3];
prev_vals[0] = prev_vals[1];
prev_vals[1] = temp;
// peaking EQ (resonance)
float temp2 = buffer[i];
buffer[i] *= factorSuma2;
buffer[i] += ssuma0 * prev_vals[2] + ssuma1 * prev_vals[3]
+ ssuma2 * prev_vals[4] + ssuma3 * prev_vals[5];
prev_vals[2] = prev_vals[3];
prev_vals[3] = temp;
prev_vals[4] = prev_vals[5];
prev_vals[5] = buffer[i];
}
}
I got a hint from FalkTX on what could be going on. Can you perhaps try the following?
To the top of `WSTD_EQ/plugin/source/HeavyDPF_WSTD_EQ.cpp` add:
#include "extra/ScopedDenormalDisable.hpp"
And in the run function set the following:
const ScopedDenormalDisable sdd;
const TimePosition& timePos(getTimePosition());
Rebuild and try again.
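To make the idea behind a scoped denormal guard concrete, here is a minimal sketch of how such a guard can work on x86 with SSE. It is an assumption-laden illustration and not DPF's implementation of `ScopedDenormalDisable`: the class name `ScopedFlushDenormalsSketch` and the helper functions are hypothetical, and on targets without SSE the sketch simply does nothing.

```cpp
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr
#define SKETCH_HAVE_MXCSR 1
#else
#define SKETCH_HAVE_MXCSR 0
#endif

// Hypothetical RAII guard: enables flush-to-zero (FTZ) and denormals-are-zero
// (DAZ) in the SSE control register for the lifetime of the object, then
// restores the previous register value.
class ScopedFlushDenormalsSketch {
public:
    ScopedFlushDenormalsSketch() noexcept {
#if SKETCH_HAVE_MXCSR
        saved_ = _mm_getcsr();
        _mm_setcsr(saved_ | kFlushToZero | kDenormalsAreZero);
#endif
    }

    ~ScopedFlushDenormalsSketch() noexcept {
#if SKETCH_HAVE_MXCSR
        _mm_setcsr(saved_);
#endif
    }

    ScopedFlushDenormalsSketch(const ScopedFlushDenormalsSketch&) = delete;
    ScopedFlushDenormalsSketch& operator=(const ScopedFlushDenormalsSketch&) = delete;

private:
    static constexpr unsigned int kFlushToZero = 0x8000u;       // MXCSR bit 15
    static constexpr unsigned int kDenormalsAreZero = 0x0040u;  // MXCSR bit 6
    unsigned int saved_ = 0;
};

// Helpers for inspecting the current mode (also hypothetical).
constexpr bool mxcsrAvailable() noexcept { return SKETCH_HAVE_MXCSR != 0; }

inline bool flushToZeroActive() noexcept {
#if SKETCH_HAVE_MXCSR
    return (_mm_getcsr() & 0x8000u) != 0;
#else
    return false;
#endif
}
```

Declaring such a guard as the first statement of the audio callback limits the changed floating-point mode to that call on the audio thread, so the host's own settings are restored when the function returns.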
@dromer OK. I'll try it tonight (BJT) and report back.
@AnClark you can try this build when it finishes: https://github.com/Wasted-Audio/wstd-eq/actions/runs/7431093334
> I got a hint from FalkTX on what could be going on. Can you perhaps try the following?
> To the top of `WSTD_EQ/plugin/source/HeavyDPF_WSTD_EQ.cpp` add `#include "extra/ScopedDenormalDisable.hpp"`
> And in the run function set the following:
> `const ScopedDenormalDisable sdd; const TimePosition& timePos(getTimePosition());`
> Rebuild and try again.
Great! After adding those lines and building with the `-O3` CXX flag, the problem is resolved. CPU usage is now about 0.6% on idle.
> @AnClark you can try this build when it finishes: https://github.com/Wasted-Audio/wstd-eq/actions/runs/7431093334
I've also tested your build.
Your build performs better than mine. CPU usage does not exceed 0.5% on idle. So disabling denormal numbers really works.
Cool! Thank you for confirming. I guess on older systems such as yours this really makes a difference. On my machines I couldn't spot any significant change.
Now comes the question on how to best apply this, as setting this option can potentially break things as well ..
My pleasure!
It would be better if there were some documentation for `ScopedDenormalDisable`. This is the first time I've come across this API. I wonder whether it has been proven stable by FalkTX and the contributors.
You could also do more tests on other platforms, including Apple Silicon. None of my machines is newer than a 5th-gen Core i5.
I do not own any Windows or MacOS machines, so doing "proper" testing on those is not possible. What I generally do is pass builds to friends and ask them to report if it works :shrug:
Btw the only documentation for this class is in the code: https://github.com/DISTRHO/DPF/blob/main/distrho/extra/ScopedDenormalDisable.hpp
I've found a solution: add a new entry to the HVCC JSON metadata (e.g. `dpf.enable_denormal_number_fix`, or some better name) to control whether this fix is enabled. That way we can apply the fix only to WSTD EQ and leave other products unaffected.
What's more, we could also provide two builds of WSTD EQ starting with the next release: one that applies this fix, and one that stays as-is.
I don't see any reason to provide two completely separate builds of the same plugin, that doesn't make any sense. Either such a patch will be in place, or it won't.
Having it as a configurable option in the json is a nice idea, so it won't be put there automatically for all DPF builds. I'd like to know more about the implications of the patch and how it could disrupt plugin and host behavior before moving forward with a permanent solution.
Maybe I can help test on Windows (as well as Wine). I have a Hewlett-Packard Pavilion with Windows 11 and Msys2 installed (though it uses an i7-5500U).
What's more, if WSTD and HVCC had unit tests (or benchmark tests), that would also help a lot.
HVCC does have some testing in place (although not everything works), but that's a discussion for a different project :)
So how could we do tests? Maybe we can make a roadmap for testing plugins (maybe not limited to WSTD EQ). For example, specify test cases and target DAWs.
Hi Wasted Audio Team,
I've encountered a strange issue when using WSTD EQ in REAPER for Linux. While the plugin is processing audio, CPU usage is below 1.0% on average. However, when I click "Stop" in REAPER, CPU usage jumps to 7.0%.
See the following screenshots:
My system environment: