Neutone / neutone_sdk

Add benchmarks for latency and speed #48

Closed bogdanteleaga closed 1 year ago

bogdanteleaga commented 1 year ago

This PR is for issue https://github.com/QosmoInc/neutone_sdk/issues/40.

The idea is to sweep a range of sample rates and buffer sizes and print the RTF and latencies at each combination.

I'll also add some documentation later, but for now I'm putting it up to get some thoughts from @christhetree on how it looks right now. One odd thing: to accept a variable number of arguments with click, I found the multiple=True flag, but the way it works is that you need to pass the option multiple times, e.g. --sample_rate 44100 --sample_rate 48000.
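
For reference, here is a minimal sketch of how the click side of this can look with multiple=True. The command and option names mirror the examples that follow, but the body is illustrative only and not the actual code from this PR.

import click

@click.group()
def cli() -> None:
    pass

@cli.command(name="benchmark-speed")
@click.option("--model_file", required=True, type=str)
@click.option("--sample_rate", multiple=True, default=(48000,), type=int)
@click.option("--buffer_size", multiple=True, default=(128, 256, 512, 1024, 2048), type=int)
@click.option("--number", default=30, type=int)
def benchmark_speed(model_file: str, sample_rate: tuple, buffer_size: tuple, number: int) -> None:
    # multiple=True collects repeated --sample_rate / --buffer_size flags into tuples,
    # so every sample rate / buffer size combination can be swept below.
    for sr in sample_rate:
        for bs in buffer_size:
            ...  # load model_file, run `number` buffers, report duration and 1/RTF

if __name__ == "__main__":
    cli()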

Example outputs:

$ python -m neutone_sdk.benchmark benchmark-speed --model_file choir.nm --number 30 --sample_rate 48000 --sample_rate 44100
Running benchmark for buffer sizes (128, 256, 512, 1024, 2048) and sample rates (48000, 44100)
Sample rate:  48000 | Buffer size:    128 | duration:   0.07±0.01 | 1/RTF:   1.14
Sample rate:  48000 | Buffer size:    256 | duration:   0.14±0.02 | 1/RTF:   1.14
Sample rate:  48000 | Buffer size:    512 | duration:   0.27±0.02 | 1/RTF:   1.17
Sample rate:  48000 | Buffer size:   1024 | duration:   0.56±0.02 | 1/RTF:   1.13
Sample rate:  48000 | Buffer size:   2048 | duration:   1.10±0.02 | 1/RTF:   1.16
Sample rate:  44100 | Buffer size:    128 | duration:   0.07±0.00 | 1/RTF:   1.18
Sample rate:  44100 | Buffer size:    256 | duration:   0.15±0.01 | 1/RTF:   1.16
Sample rate:  44100 | Buffer size:    512 | duration:   0.30±0.01 | 1/RTF:   1.16
Sample rate:  44100 | Buffer size:   1024 | duration:   0.61±0.02 | 1/RTF:   1.13
Sample rate:  44100 | Buffer size:   2048 | duration:   1.32±0.09 | 1/RTF:   1.06
$ python -m neutone_sdk.benchmark benchmark-latency --model_file choir.nm --sample_rate 48000 --sample_rate 44100
Native buffer sizes: [2048], Native sample rates: [48000]
Model choir.nm has the following delays for each sample rate / buffer size combination (lowest first):
Sample rate:  48000 | Buffer size:   2048 | Total delay:   2176 | (Buffering delay:      0 | Model delay:   2176)
Sample rate:  48000 | Buffer size:   1024 | Total delay:   3200 | (Buffering delay:   1024 | Model delay:   2176)
Sample rate:  48000 | Buffer size:    512 | Total delay:   3712 | (Buffering delay:   1536 | Model delay:   2176)
Sample rate:  44100 | Buffer size:    128 | Total delay:   3909 | (Buffering delay:   1920 | Model delay:   1989)
Sample rate:  48000 | Buffer size:    256 | Total delay:   3968 | (Buffering delay:   1792 | Model delay:   2176)
Sample rate:  44100 | Buffer size:    256 | Total delay:   4044 | (Buffering delay:   2048 | Model delay:   1996)
Sample rate:  44100 | Buffer size:    512 | Total delay:   4044 | (Buffering delay:   2048 | Model delay:   1996)
Sample rate:  44100 | Buffer size:   1024 | Total delay:   4046 | (Buffering delay:   2048 | Model delay:   1998)
Sample rate:  44100 | Buffer size:   2048 | Total delay:   4046 | (Buffering delay:   2048 | Model delay:   1998)
Sample rate:  48000 | Buffer size:    128 | Total delay:   4096 | (Buffering delay:   1920 | Model delay:   2176)
The recommended sample rate / buffer size combination is sample rate 48000, buffer size 2048
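
As a quick way to read these tables (an inference from the printed numbers, not code from the PR): 1/RTF appears to be the amount of audio processed divided by the wall-clock duration, and Total delay is simply the sum of the Buffering delay and Model delay columns. A small sanity check against the 48000 Hz rows:

number, buffer_size, sample_rate = 30, 2048, 48000    # matches the --number 30 run above
audio_seconds = number * buffer_size / sample_rate    # 1.28 s of audio processed
wall_clock_seconds = 1.10                             # mean duration reported for this row
print(round(audio_seconds / wall_clock_seconds, 2))   # 1.16, matching the 1/RTF column
print(1024 + 2176)                                    # 3200, the Total delay at 48000 Hz / 1024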
christhetree commented 1 year ago

Left some comments, but this looks like a helpful tool to add to the SDK. We should include it at the end of every example model so that users are encouraged to benchmark their models after wrapping.

bogdanteleaga commented 1 year ago

I think we can easily do the delay reporting during model wrapping, but the speed benchmark could take too long if we run it with too many combinations (or even at all for some heavy models). Maybe I'll add it with only one SR/BS combination and a flag to turn it off.

christhetree commented 1 year ago

I think we can easily do the delay reporting during model wrapping, but the speed benchmark could take too long if we run it with too many combinations (or even at all for some heavy models). Maybe I'll add it with only one SR/BS combination and a flag to turn it off.

That seems reasonable

bogdanteleaga commented 1 year ago

Addressed the comments; the outliers look like this now. I also added a short explanation at the top.

INFO:__main__:Running benchmark for buffer sizes (128, 256, 512, 1024, 2048) and sample rates (48000, 44100). Outliers will be removed from the calculation of mean and std and displayed separately if existing.
INFO:__main__:Sample rate:  48000 | Buffer size:    128 | duration:  0.012±0.002 | 1/RTF:  6.877 | Outliers: [0.006]
INFO:__main__:Sample rate:  48000 | Buffer size:    256 | duration:  0.021±0.003 | 1/RTF:  7.595 | Outliers: []
INFO:__main__:Sample rate:  48000 | Buffer size:    512 | duration:  0.042±0.003 | 1/RTF:  7.665 | Outliers: []
INFO:__main__:Sample rate:  48000 | Buffer size:   1024 | duration:  0.084±0.002 | 1/RTF:  7.599 | Outliers: [0.077]
INFO:__main__:Sample rate:  48000 | Buffer size:   2048 | duration:  0.170±0.003 | 1/RTF:  7.522 | Outliers: []
INFO:__main__:Sample rate:  44100 | Buffer size:    128 | duration:  0.012±0.001 | 1/RTF:  7.434 | Outliers: [0.014]
INFO:__main__:Sample rate:  44100 | Buffer size:    256 | duration:  0.023±0.001 | 1/RTF:  7.583 | Outliers: [0.027]
INFO:__main__:Sample rate:  44100 | Buffer size:    512 | duration:  0.046±0.002 | 1/RTF:  7.648 | Outliers: [0.051]
INFO:__main__:Sample rate:  44100 | Buffer size:   1024 | duration:  0.088±0.003 | 1/RTF:  7.875 | Outliers: []
INFO:__main__:Sample rate:  44100 | Buffer size:   2048 | duration:  0.183±0.004 | 1/RTF:  7.624 | Outliers: [0.194]
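
For context on the outlier handling mentioned in the log above: the exact criterion used in the PR isn't visible in this thread, but a simple rule of thumb like the sketch below (k standard deviations from the mean, an assumption on my part) is one way such a split could be done.

import numpy as np

def split_outliers(durations: np.ndarray, k: float = 2.0):
    # Assumed rule: measurements more than k standard deviations from the mean are
    # treated as outliers; mean and std are then recomputed on the remaining values.
    mean, std = durations.mean(), durations.std()
    keep = np.abs(durations - mean) <= k * std
    inliers, outliers = durations[keep], durations[~keep]
    return inliers.mean(), inliers.std(), sorted(outliers.tolist())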

I also replaced the existing latency computations on export with the new one and added a line suggesting that users benchmark speed as well. I'm still not 100% sure whether to enable it by default or do something like what I described above. What do you think?

Also, should I look into bringing your profiling tool under the same CLI structure somehow?

christhetree commented 1 year ago

Thanks! I think we should expose both benchmarks as parameters of save_neutone_model that are true by default; they can then be disabled manually if benchmarking takes too long. For example, on the RAVE models we can set speed benchmarking to false in the example script. We could also disable benchmarking when submission=False.
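
As a rough sketch of what that could look like on the export side (the benchmarking parameter names here are illustrative assumptions, not the final save_neutone_model API):

from pathlib import Path

from neutone_sdk.utils import save_neutone_model  # import path as used in the SDK examples

def export_model(wrapped_model) -> None:
    # wrapped_model is an already wrapped WaveformToWaveformBase instance.
    save_neutone_model(
        model=wrapped_model,
        root_dir=Path("exports/choir"),
        submission=True,
        speed_benchmark=False,   # hypothetical flag: disable for heavy models such as RAVE
        latency_benchmark=True,  # hypothetical flag: keep the cheap delay report on
    )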

And yeah, it would be super helpful if you could unify the profiling script with the benchmarking CLI!

bogdanteleaga commented 1 year ago

Ok, I enabled benchmarking by default for both speed and latency. I think it's more of an issue for non-realtime models, so we don't have too much to worry about for now.

Also, regarding the profiling: I tried moving it to the benchmark file and creating a CLI wrapper with the same structure around it, but I'm not sure it makes that much sense. Would you run separate profiler runs for different SR/BS combinations, or is it better to run multiple combinations within the same profiling run?

christhetree commented 1 year ago

Yeah, profiling is typically done for one fixed SR and BS; it could actually be selected based on the most performant combination found during benchmarking. But I think this looks good for now, and next time I mess with the profiling I'll make any changes that are needed.
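
For anyone following along, a profiling run at a single fixed SR/BS could look roughly like the generic torch.profiler sketch below; the SDK's actual profiling script isn't shown in this thread, and the input shape is a placeholder.

import torch
from torch.profiler import ProfilerActivity, profile

def profile_model(model: torch.nn.Module, sample_rate: int = 48000, buffer_size: int = 2048, n_iters: int = 30) -> None:
    # Profile one fixed sample rate / buffer size combination, e.g. the most
    # performant one found during benchmarking.
    print(f"Profiling at {sample_rate} Hz with buffer size {buffer_size}")
    x = torch.rand(1, buffer_size)  # placeholder mono input buffer
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        with torch.no_grad():
            for _ in range(n_iters):
                model(x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))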