anira-project / anira

an architecture for neural network inference in real-time audio applications
https://doi.org/10.1109/IS262782.2024.10704099
Apache License 2.0

100% CPU use even on NONE backend. #4

Closed mchagneux closed 1 month ago

mchagneux commented 2 months ago

Hi,

I've built the juce example plugin and tried different backends. My machine is not really new (i7-7820HQ) but I'm still a bit surprised by the CPU usage of the example plugin (practically 100% and the OS freezes after a few seconds of use). Surprisingly this is also the case for the NONE backend. Is this expected behaviour that the NONE backend also consumes CPU ? Just checking to see if I missed something or if it's just my machine that can't handle basic ML workloads on CPU.

Mathis.

faressc commented 2 months ago

Hey,

thanks for reaching out again. No, this is not normal at all. The NONE backend should run fine on almost all systems. Could you build and run the benchmarks? The advanced-benchmark in particular should give us some insight into the performance. Delete the build folder, configure cmake and rebuild:

cmake . -B build -DCMAKE_BUILD_TYPE=Release -DANIRA_WITH_BENCHMARK=ON -DANIRA_WITH_EXAMPLES=ON
cmake --build build --config Release

Then run the benchmark with:

ctest -R Benchmark.Advanced -VV --timeout 100000

You don't need to run the full benchmark, as that could take quite some time. The beginning should help us. Then it would be great if you could post the results here.

Cheers, Fares

mchagneux commented 2 months ago

Unfortunately the tests are not found, here's what I get:

E:\audio_dev\Playground\3rd_party\anira>ctest -R Benchmark.Advanced -VV --timeout 100000
UpdateCTestConfiguration  from :E:/audio_dev/Playground/3rd_party/anira/DartConfiguration.tcl
UpdateCTestConfiguration  from :E:/audio_dev/Playground/3rd_party/anira/DartConfiguration.tcl
Test project E:/audio_dev/Playground/3rd_party/anira
Constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
No tests were found!!!

Might this have to do with the fact that I'm using Ninja on Windows?

mchagneux commented 2 months ago

I was able to start the advanced-benchmark.exe, and let it run for the first model on all 4 backends. Here are the results:

Running main() from E:\audio_dev\Playground\3rd_party\anira\build_deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Benchmark
[ RUN ] Benchmark.Advanced
2024-09-10T16:44:41+02:00
Run on (8 X 2904 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)


Model: steerable-nafx-dynamic.pt | Backend: libtorch | Sample Rate: 44100 Hz | Buffer Size: 64 = 1.4512 ms


Benchmark   Time   CPU   Iterations

ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   39.7 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   36.8 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   36.4 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   34.8 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   35.9 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   35.2 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   36.7 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   38.9 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   35.7 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time   37.8 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_mean   36.8 ms   0.125 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_median   36.6 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_stddev   1.59 ms   0.219 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_cv   4.31 %   174.80 %   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_min   34.8 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_max   39.7 ms   0.625 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/0/iterations:50/repeats:10/manual_time_percentile   38.9 ms   0.312 ms   10


Model: steerable-nafx-libtorch-dynamic.onnx | Backend: onnx | Sample Rate: 44100 Hz | Buffer Size: 64 = 1.4512 ms

ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   33.7 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   34.4 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   31.8 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   34.2 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   31.1 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   30.4 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   34.4 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   29.8 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   31.9 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time   31.3 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_mean   32.3 ms   0.312 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_median   31.9 ms   0.312 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_stddev   1.72 ms   0.208 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_cv   5.34 %   66.67 %   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_min   29.8 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_max   34.4 ms   0.625 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/1/iterations:50/repeats:10/manual_time_percentile   34.4 ms   0.625 ms   10


Model: steerable-nafx-dynamic.tflite | Backend: tflite | Sample Rate: 44100 Hz | Buffer Size: 64 = 1.4512 ms

ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   107 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   104 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   92.1 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   94.7 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   91.1 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   90.3 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   101 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   97.9 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   92.3 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time   86.8 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_mean   95.7 ms   0.219 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_median   93.5 ms   0.156 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_stddev   6.52 ms   0.257 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_cv   6.81 %   117.61 %   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_min   86.8 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_max   107 ms   0.625 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/2/iterations:50/repeats:10/manual_time_percentile   104 ms   0.625 ms   10


Model: no_model | Backend: none | Sample Rate: 44100 Hz | Buffer Size: 64 = 1.4512 ms

ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.055 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.055 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.053 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time   0.054 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_mean   0.054 ms   0.312 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_median   0.054 ms   0.312 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_stddev   0.000 ms   0.255 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_cv   0.78 %   81.65 %   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_min   0.053 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_max   0.055 ms   0.625 ms   10
ProcessBlockFixture/BM_ADVANCED/64/0/3/iterations:50/repeats:10/manual_time_percentile   0.055 ms   0.625 ms   10

mchagneux commented 2 months ago

As a side remark, the benchmark wasn't maxing out my CPU while running; usage reached around 30-40% on average.

faressc commented 2 months ago

Thanks! With the small buffer size of 64, your system is definitely too slow to run inference on the steerable-nafx model with any of the three backends. A buffer of 64 samples at a sample rate of 44.1 kHz gives us 1.4512 ms for processing, but your system takes 32.3 ms on average with the fastest backend, onnx. The none backend, however, is fast enough: its mean time is 0.054 ms (0.312 ms in the CPU column).
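
As a rough illustration of the deadline arithmetic in this comment: the time available per block is the buffer size divided by the sample rate. The C++ sketch below is not part of anira; the function names `bufferDurationMs` and `meetsDeadline` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Duration of one audio buffer in milliseconds (hypothetical helper).
double bufferDurationMs(int bufferSize, double sampleRate) {
    return 1000.0 * bufferSize / sampleRate;
}

// True when a measured processing time fits within one buffer period
// (hypothetical helper; ignores any latency the host or anira may add).
bool meetsDeadline(double processingMs, int bufferSize, double sampleRate) {
    return processingMs < bufferDurationMs(bufferSize, sampleRate);
}
```

With the numbers from the benchmark, `bufferDurationMs(64, 44100.0)` is about 1.4512 ms, so the 32.3 ms onnx mean misses the deadline while the 0.054 ms mean of the none backend meets it.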

But since this is not the model we use in the juce plugin and the buffer size is very small, could you change the following lines in defineAdvancedBenchmark.cpp:

std::vector<int> bufferSizes = {64, 128, 256, 512, 1024, 2048, 4096, 8192};
std::vector<anira::InferenceBackend> inferenceBackends = {anira::LIBTORCH, anira::ONNX, anira::TFLITE, anira::NONE};
std::vector<AdvancedInferenceConfigs> advancedInferenceConfigs = {cnnAdvancedConfigs, hybridNNAdvancedConfigs, statefulRNNAdvancedConfigs};

to

std::vector<int> bufferSizes = {2048};
std::vector<anira::InferenceBackend> inferenceBackends = {anira::LIBTORCH, anira::ONNX, anira::TFLITE, anira::NONE};
std::vector<AdvancedInferenceConfigs> advancedInferenceConfigs = {hybridNNAdvancedConfigs};

and report back the benchmarks.

Also what buffer size are you using in the juce plugin?

mchagneux commented 2 months ago

Here's the benchmark for GuitarLSTM with a buffer size of 2048. This is the size I was using when running the plugin in JUCE AudioPluginHost with ASIO drivers.

Run on (8 X 2904 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)


Model: GuitarLSTM-dynamic.pt | Backend: libtorch | Sample Rate: 44100 Hz | Buffer Size: 2048 = 46.4399 ms


Benchmark   Time   CPU   Iterations

ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   37.8 ms   2.50 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   38.7 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   40.2 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   40.2 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   41.5 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   42.0 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   38.6 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   39.7 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   40.8 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time   40.0 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_mean   40.0 ms   1.00 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_median   40.1 ms   0.781 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_stddev   1.31 ms   0.719 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_cv   3.27 %   71.87 %   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_min   37.8 ms   0.312 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_max   42.0 ms   2.50 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/0/iterations:50/repeats:10/manual_time_percentile   41.5 ms   1.56 ms   10


Model: GuitarLSTM-libtorch-dynamic.onnx | Backend: onnx | Sample Rate: 44100 Hz | Buffer Size: 2048 = 46.4399 ms


ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.6 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.6 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   20.9 ms   3.12 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.2 ms   1.88 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   20.9 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.2 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.3 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   21.5 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   22.1 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time   20.3 ms   2.81 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_mean   21.2 ms   1.72 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_median   21.3 ms   1.56 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_stddev   0.485 ms   0.725 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_cv   2.28 %   42.21 %   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_min   20.3 ms   0.938 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_max   22.1 ms   3.12 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/1/iterations:50/repeats:10/manual_time_percentile   21.6 ms   2.81 ms   10


Model: GuitarLSTM-2048.tflite | Backend: tflite | Sample Rate: 44100 Hz | Buffer Size: 2048 = 46.4399 ms


ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   38.6 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   39.4 ms   0.312 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   35.7 ms   1.88 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   41.4 ms   0.000 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   38.4 ms   1.88 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   39.4 ms   2.19 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   42.0 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   38.4 ms   3.44 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   39.7 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time   37.4 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_mean   39.0 ms   1.41 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_median   39.0 ms   1.25 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_stddev   1.82 ms   0.991 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_cv   4.67 %   70.47 %   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_min   35.7 ms   0.000 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_max   42.0 ms   3.44 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/2/iterations:50/repeats:10/manual_time_percentile   41.4 ms   2.19 ms   10


Model: no_model | Backend: none | Sample Rate: 44100 Hz | Buffer Size: 2048 = 46.4399 ms


ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.21 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.18 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.20 ms   0.625 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.18 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.18 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.19 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.30 ms   2.19 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.27 ms   1.25 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.27 ms   1.56 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time   1.19 ms   0.938 ms   50
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_mean   1.22 ms   1.34 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_median   1.19 ms   1.25 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_stddev   0.045 ms   0.418 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_cv   3.70 %   31.10 %   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_min   1.18 ms   0.625 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_max   1.30 ms   2.19 ms   10
ProcessBlockFixture/BM_ADVANCED/2048/0/3/iterations:50/repeats:10/manual_time_percentile   1.27 ms   1.56 ms   10
[ OK ] Benchmark.Advanced (134973 ms)
[----------] 1 test from Benchmark (134973 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (134977 ms total)
[ PASSED ] 1 test.

faressc commented 2 months ago

Seems like with 2048 samples you should be able to run the plugin, especially with onnx and the bypass engine. What buffer size did you try?
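
To make the headroom at 2048 samples concrete, one can express each backend's mean time as a fraction of the 46.4399 ms buffer period. The C++ sketch below only restates the reported means; `BackendResult`, `realTimeLoad` and `guitarLstm2048` are hypothetical names for this illustration and do not exist in anira.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row of the 2048-sample GuitarLSTM benchmark (hypothetical struct).
struct BackendResult {
    std::string name;
    double meanMs;  // manual_time_mean as reported in the benchmark output
};

// Fraction of the buffer period used by processing; values below 1.0
// mean the block finishes before the next one is due.
double realTimeLoad(double meanMs, int bufferSize, double sampleRate) {
    const double bufferMs = 1000.0 * bufferSize / sampleRate;
    return meanMs / bufferMs;
}

// Means copied from the benchmark posted in this thread.
std::vector<BackendResult> guitarLstm2048() {
    return {{"libtorch", 40.0}, {"onnx", 21.2}, {"tflite", 39.0}, {"none", 1.22}};
}
```

Under these figures onnx uses roughly 46% of the buffer period, libtorch and tflite about 86% and 84%, and the none backend under 3%, which is why onnx and the bypass engine are singled out.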

mchagneux commented 2 months ago

I was using a buffer size of 2048 with the ASIO driver. I think the problem I'm seeing isn't directly related to audio processing. Windows 10 starts freezing once I open the juce-example-plugin and let it run for a few seconds, so it looks like the message thread is being blocked somehow, or some problem with a lock somewhere. I don't remember having this problem on Linux but I had to switch to Windows. Could it be related to the ANIRA_WITH_SEMAPHORE option set to ON ?

faressc commented 2 months ago

If it's a Windows-specific issue, @vackva might be of better help. It would surprise me, but yes, you could try setting the ANIRA_WITH_SEMAPHORE option to OFF. It would also be helpful if you could start the plugin in debug mode and trace back where the plugin crashes.

mchagneux commented 2 months ago

Ok, will try all this in the coming days!

faressc commented 2 months ago

Cool, thanks!

mchagneux commented 2 months ago

To clarify, there is no crash, just an overall lag of the OS once I start a plugin that involves anira. So this isn't really traceable through Debug. I'll try to profile the program with something like Perfetto, there might be some problem with the message thread on Windows for some reason.

mchagneux commented 2 months ago

Hi again,

Curiously there doesn't seem to be any problem with the nn-inference-template plugin (which comes with anira 0.1.0 if I'm correct). I didn't have the time to investigate further. Will keep you updated if I find the source of the problem.

faressc commented 1 month ago

Hi there, Thanks for checking - I am very curious to find out more! You might also try updating the anira version of the nn-inference-template and see if that causes the same crashes. Anyway, thanks for your debugging already. Unfortunately, I cannot help with debugging since I do not have a Windows system.

mchagneux commented 1 month ago

The problem seems to be directly related to the semaphore. I pulled the latest version of the anira submodule in nn-inference-template, and the problem was still there unless I set ANIRA_WITH_SEMAPHORE to OFF. In that case the CPU usage is very reasonable (sub 20%).

faressc commented 1 month ago

Do I understand correctly that anira v0.1.0 does not have the problem (in both nn-inference-template and the built-in juce-plugin-example)?

mchagneux commented 1 month ago

Yes, from what I've seen:

mchagneux commented 1 month ago

I didn't test 0.1.0 with juce-plugin-example, only in nn-inference-template.

mchagneux commented 1 month ago

Actually the problem appears / disappears simply by swapping the anira.dll file from the different versions in the executable folder.

faressc commented 1 month ago

Ok, thank you! We will investigate this further in the next few days and let you know when we think we have found a fix. In the meantime it might be best for you to use anira v0.1.2 with ANIRA_WITH_SEMAPHORE set to OFF. Just for the record, anira v0.1.0 uses semaphores by default, so it must be the implementation of the option without semaphores that caused the bug in the semaphore version...

vackva commented 1 month ago

I just confirmed the issue reported by @mchagneux with the MSVC compiler and generator. I traced the bug to commit 30abde5.

vackva commented 1 month ago

I resolved the issue by reverting timeForExit to the original value of 1 millisecond in InferenceThread.cpp:

void InferenceThread::run() {
    // std::chrono::microseconds timeForExit(500); // new value since 0.1.1
    std::chrono::milliseconds timeForExit(1); // value used until 0.1.0
    ...
}

It appears that using std::chrono::microseconds is causing issues when multiple threads are active. However, when limiting the number of threads to 5 (out of 24 available), std::chrono::microseconds worked without issues.
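
For readers unfamiliar with the pattern, here is a minimal C++ sketch of a worker thread that blocks with a timeout so it can notice an exit request. It is not anira's InferenceThread: the class `TimedWaitWorker` and its members are hypothetical, and it uses `std::condition_variable::wait_for` as a stand-in for the semaphore. The point it illustrates is only that every idle worker wakes once per timeout, so a 500 microsecond timeout allows up to roughly 2000 idle wake-ups per second per thread, against roughly 1000 for 1 millisecond.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical worker for illustration; this is not anira's InferenceThread.
class TimedWaitWorker {
public:
    explicit TimedWaitWorker(std::chrono::microseconds timeForExit)
        : timeForExit_(timeForExit), thread_([this] { run(); }) {}

    ~TimedWaitWorker() { stop(); }

    // Queue one unit of work and wake the worker.
    void submit() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
        }
        cv_.notify_one();
    }

    // Ask the worker to exit and wait for it to finish.
    void stop() {
        shouldExit_.store(true);
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    int processed() const { return processed_.load(); }
    long idleWakeups() const { return idleWakeups_.load(); }

private:
    void run() {
        while (!shouldExit_.load()) {
            std::unique_lock<std::mutex> lock(mutex_);
            // Block until work arrives, exit is requested, or the timeout expires.
            const bool signalled = cv_.wait_for(lock, timeForExit_, [this] {
                return pending_ > 0 || shouldExit_.load();
            });
            if (pending_ > 0) {
                --pending_;
                lock.unlock();
                ++processed_;    // stand-in for running one inference
            } else if (!signalled) {
                ++idleWakeups_;  // timed out with nothing to do
            }
        }
    }

    const std::chrono::microseconds timeForExit_;
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
    std::atomic<bool> shouldExit_{false};
    std::atomic<int> processed_{0};
    std::atomic<long> idleWakeups_{0};
    std::thread thread_;  // declared last so the other members exist before run() starts
};
```

Constructing several such workers with `std::chrono::microseconds(500)` and with `std::chrono::microseconds(1000)` and comparing `idleWakeups()` after a fixed interval gives a feel for the extra scheduling work; whether this alone explains the Windows behaviour reported here is not established by the sketch.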

mchagneux commented 1 month ago

Great, thanks! Will check this out once it's merged into main.

faressc commented 1 month ago

Hi Mathis, We have updated our entire thread synchronization and scheduling architecture with several improvements. Although timeForExit is now only 50 microseconds, everything should run much smoother. It would be great if you could test this again. We have backported the changes to tag v0.1.2, so you can either try the juce-example-plugin with this tag or the latest main. The nn-inference-template will be updated soon. Thanks for reporting the issue and testing - this was truly helpful! Cheers!

faressc commented 1 month ago

The nn-inference-template has now been updated as well. So I consider this issue solved. Feel free to reopen if you still encounter problems.

mchagneux commented 1 month ago

Just tried it and it's working perfectly now, thanks for the fix!