Merge samples on stop change for better memory use and lower latency

GrandOrgue / grandorgue

GrandOrgue software


Merge samples on stop change for better memory use and lower latency #1796

Closed rhpvorderman closed 4 months ago

rhpvorderman commented 4 months ago

Hi, first of all thanks for GrandOrgue! I built my VPO and I am so happy that I can finally learn to play with my feet. Thanks for making this possible.

I'm a bioinformatician, and as such I deal with lots and lots of data (one sample can be 300 GB) that needs to be processed efficiently. As a result, I have gained experience writing software that makes very efficient use of resources. I would love to help out GrandOrgue as a way to say thanks.

One thing I noticed about VPO software is that resource requirements rise quite steeply with the complexity of the organ. Hardcore users typically have PCs with more memory than the normal consumer range. This makes sense, since every pipe is sampled individually.

This morning I realised that a lot of compute and memory can be saved by pre-calculating on a stop change. Currently, when I press a key, all the required samples are played simultaneously and merged by the audio backend into a single waveform (per channel). However, it would be much more efficient to pre-calculate this for each key on a stop change. That way, the number of samples that need to be stored in memory equals the number of keys in the organ. This will introduce some latency when pulling out stops, but notes will play with much less latency, as only one (multi-channel) sample is played per note. It will also massively lower memory requirements, as memory is determined by the number of organ keys, not by the number of organ pipes.
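A minimal sketch of the proposed pre-merge, under hypothetical simplifications: each pipe sample is a mono Python list of floats, and loops, release tails, tuning and windchest effects are ignored (none of this reflects how GrandOrgue actually stores samples). The names `premix_key` and `premix_all_keys` are illustrative only. On a stop change, each key's active pipe samples are summed once into a single buffer, so afterwards playback only needs one buffer per key:

```python
from typing import Dict, List

# Hypothetical representation: one mono pipe sample as a list of floats.
Sample = List[float]


def premix_key(stop_samples: Dict[str, Sample], active_stops: List[str]) -> Sample:
    """Sum the samples of every active stop for one key into a single buffer.

    Shorter samples are treated as silent after their last frame.
    """
    length = max((len(stop_samples[s]) for s in active_stops), default=0)
    mixed = [0.0] * length
    for stop in active_stops:
        for i, value in enumerate(stop_samples[stop]):
            mixed[i] += value
    return mixed


def premix_all_keys(organ: Dict[int, Dict[str, Sample]],
                    active_stops: List[str]) -> Dict[int, Sample]:
    """Rebuild one merged sample per key: the work a stop change would trigger."""
    return {key: premix_key(stops, active_stops) for key, stops in organ.items()}


# Illustrative data: MIDI key 60 with two hypothetical stops.
organ = {60: {"Principal 8": [1.0, 2.0, 3.0], "Octave 4": [0.5, 0.5]}}
merged = premix_all_keys(organ, ["Principal 8", "Octave 4"])
```

The loop over every key and every active stop is the work that would have to finish between pulling a stop and playing the next note, which is where the stop-change latency comes from.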

rousseldenis commented 4 months ago

First I want to thank you for the interest you have in GrandOrgue.

> This will introduce some latency when pulling out stops,

I would say that point can be tedious to implement as in real life, stop changes can occur quickly during playing (e.g. when recalling a combination memory). You can't require the player to wait for the stops to have actually changed before continuing to play. And you can't really predict how much time will be needed to apply the stop changes (not without heavy computation, and maybe only with a maximum retrieval time). For those reasons I feel it could be highly demanding (CPU usage) and highly unpredictable (you rely on disk I/O, even with M.2).

@larspalo @oleg68

rhpvorderman commented 4 months ago

> I would say that point can be tedious to implement as in real life, stop changes can occur quickly during playing

You are right. I hadn't properly considered this. I think the computation part can be solved, but disk latency is going to be problematic (if the entire windchest is a few GB, it will simply take a few seconds to load, even on an SSD). So you need to keep everything in memory anyway, and then there is no benefit at all.
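The disk argument is simple arithmetic. A rough sketch, assuming sustained read throughputs of about 0.5 GB/s for a SATA SSD and 3 GB/s for an NVMe drive (illustrative figures, not measurements) and a hypothetical 3 GB windchest, compared with a typical audio block of 256 frames at 48 kHz:

```python
def load_seconds(size_gb: float, throughput_gb_per_s: float) -> float:
    """Time to read size_gb from disk at a sustained throughput (ignores seek and decode time)."""
    return size_gb / throughput_gb_per_s


def audio_blocks_during(seconds: float, block_frames: int = 256,
                        sample_rate_hz: int = 48000) -> float:
    """How many audio blocks the engine renders while we wait for the load."""
    return seconds * sample_rate_hz / block_frames


# Assumed throughputs for illustration only.
for label, throughput in [("SATA SSD (assumed 0.5 GB/s)", 0.5), ("NVMe (assumed 3 GB/s)", 3.0)]:
    t = load_seconds(3.0, throughput)
    print(f"{label}: {t:.1f} s to load, about {audio_blocks_during(t):.0f} audio blocks")
```

Even in the faster case, hundreds of audio blocks would pass before the new stops were ready, which is why the samples end up in RAM regardless.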

This change would be great for people who change stops only between pieces, but changing during the piece would be unacceptable, and as such it wouldn't be a good organ simulation anymore. So it is a complete non-starter.

larspalo commented 4 months ago

> a lot of compute and memory can be saved by pre-calculating on a stop change

Actually, no. I highly doubt that it would really be more efficient at the same level of subtle rendering detail. If every possible combination of stops (perhaps each with multiple attacks and releases available) were pre-calculated and merged into "new" samples for every possible key/timing/press, the memory requirement would indeed be huge! Pre-calculating only the currently active stops is not really an option if we want to realistically model how a pipe organ works.
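To put a number on "huge": with n stops, each either drawn or not, there are 2^n stop combinations. A back-of-the-envelope sketch, assuming a hypothetical 20-stop organ with 61 keys and 1 MB per merged key sample, and ignoring multiple attacks/releases (which would only multiply the total further):

```python
def stop_combinations(n_stops: int) -> int:
    """Each stop is either drawn or not, so the count doubles with every stop."""
    return 2 ** n_stops


def prerender_bytes(n_stops: int, n_keys: int, bytes_per_key_sample: int) -> int:
    """Memory needed to pre-render one merged sample per key for every combination."""
    return stop_combinations(n_stops) * n_keys * bytes_per_key_sample


# Hypothetical instrument size and per-key sample size.
total = prerender_bytes(n_stops=20, n_keys=61, bytes_per_key_sample=1_000_000)
print(f"{stop_combinations(20):,} combinations, {total / 1e12:.0f} TB")
```

That comes to roughly 64 TB for a modest instrument. Pre-rendering only the active combination avoids this explosion but brings back the stop-change wait.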

What GO does instead is store all chosen/available samples in memory (RAM) and, depending on which stops the user activates and which keys the user plays, mix the appropriate samples into the output audio stream(s) as needed (the number of channels, or even devices, can also vary with the user's choices, setup and whims).
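The mix-on-demand approach can be sketched as a simplified voice mixer. This is a hypothetical illustration of the general technique, not GrandOrgue's code: each sounding pipe is a `Voice` pointing into a sample already in RAM, and each audio block sums whatever voices are active, so a newly drawn stop simply adds voices with no pre-computation:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voice:
    samples: List[float]  # a pipe sample already resident in RAM (hypothetical mono floats)
    position: int = 0     # next frame of the sample to play


def mix_block(voices: List[Voice], block_size: int) -> List[float]:
    """Mix one audio block from all sounding voices, then drop voices that have finished."""
    out = [0.0] * block_size
    for voice in voices:
        end = min(voice.position + block_size, len(voice.samples))
        for i in range(voice.position, end):
            out[i - voice.position] += voice.samples[i]
        voice.position = end
    # Remove finished voices in place so the caller's list stays current.
    voices[:] = [v for v in voices if v.position < len(v.samples)]
    return out


voices = [Voice([1.0, 1.0, 1.0]), Voice([0.5, 0.5])]
first = mix_block(voices, 2)
```

Because the cost is per sounding voice per block, CPU load scales with how many pipes are speaking, while memory scales with how many samples are loaded.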

GO is a specialized sample player intended to model real pipe organs; it's not a synthesizer that has one "ideal" sound created for each "patch". It's not acceptable to have any wait while a new combination of stops is being stored. When a stop is pulled and a key is held, it should sound...

But if you have other ideas that can benefit the project, don't hesitate to express them!

rhpvorderman commented 4 months ago

@rousseldenis @larspalo Thanks for your thoughts.

> But if you have other ideas that can benefit the project, don't hesitate to express them!

I am a bit hesitant, since I am completely new to virtual pipe organ software, and this field is not entirely new. As a result, many ideas I may have have probably been had before but not implemented for practical reasons. Q.E.D. this issue.

However, there is one other idea: using lossy compression to reduce memory requirements. I have researched this a bit, and most codecs are not able to provide low-latency decompression, which is probably why this has not been done before. However, the relatively recent Opus codec supports low-latency (5 ms) decoding in its high-bitrate mode. I would think that 256 kbit/s Opus compression would be indistinguishable from native WAV to the human ear, at least for a single pipe. I wonder whether that still holds for multiple pipes, though: with all the signals merging, something might be missing. This would be interesting to try out.
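A quick estimate of the potential saving, comparing the bitrate of uncompressed PCM with a 256 kbit/s Opus stream. The sample formats below (48 kHz/24-bit and 44.1 kHz/16-bit stereo) are assumptions, and the ratio is relative to raw PCM, not to any lossless compression GrandOrgue may already apply:

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def ratio_vs_opus(sample_rate_hz: int, bits_per_sample: int, channels: int,
                  opus_kbps: float) -> float:
    """How many times smaller an Opus stream at opus_kbps would be than the raw PCM."""
    return pcm_kbps(sample_rate_hz, bits_per_sample, channels) / opus_kbps


# Assumed sample formats for illustration.
for rate, bits in [(48000, 24), (44100, 16)]:
    print(f"{rate} Hz / {bits}-bit stereo: {pcm_kbps(rate, bits, 2):.1f} kbit/s, "
          f"{ratio_vs_opus(rate, bits, 2, 256):.1f}x smaller as 256 kbit/s Opus")
```

Opus decodes natively at 48 kHz, so 44.1 kHz samples would also need resampling, which is another CPU cost to include in any test.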

I can program in C, but I have very little hands-on experience with C++. So trying out something like the above on a large code base that is unfamiliar to me, such as GrandOrgue, would be a bit foolish. I think it is better to familiarize myself with the code base first (with a small PR here and there if I spot something) before attempting something like that.

hnb2907 commented 4 months ago

Hi all,

Just to say, I agree with @larspalo's reply on this one. I am not an expert, but I understand his reasoning about the technicalities and stop-change delays, which would make it unrealistic to play a larger organ.

To be honest, on Friesach and my Barton hybrid, both of which are probably fairly large/complex, I've not seen any problems with the current implementation. Even on an old Dell 5070 with a Celeron J4105 and 32 GB RAM, using the Ubuntu Studio low-latency kernel and JACK, CPU usage is always very low, and the overall latency of key press -> MIDI -> GO -> audio is perfectly acceptable and comparable to a real instrument, even for fast pieces of music.

So, for me, it was an interesting idea, but maybe not worth the development effort.

Then I had another thought :) I am similar to @rhpvorderman, but my skills lie elsewhere. I have a bit of programming experience in various languages, but not C++. To be honest, the developers here have been excellent in responding to my comments and suggestions so far :)

I noticed that @rhpvorderman has skills in signal processing. A few weeks ago we discussed https://github.com/GrandOrgue/grandorgue/discussions/922 again; maybe this is something we could both help with in the future? It seems to involve applying variable filtering on the sample audio path.
Personally I think it would be a fantastic and maybe easy upgrade, and it would make the sound even more realistic. Unfortunately my programming skills are nowhere near good enough, but I'd be happy to be a beta tester or help in any other way I can :)

Cheers, Chris.

rhpvorderman commented 4 months ago

> I noticed that @rhpvorderman has skills in signal processing.

Not really. I am entirely new to this field :). I have experience with processing lots and lots of data with optimized C code. But as you said, the current implementation uses very little CPU, so that is not really of that much use.

oleg68 commented 4 months ago

Current native implementation eats little cpu, but requires a lot of memory.

If we enable compression, we reduce the memory usage but increase the cpu load.

We have some feature requests (e.g. bass/treble tuning) that would require much more CPU.

rhpvorderman commented 4 months ago

> Current native implementation eats little cpu, but requires a lot of memory. If we enable compression, we reduce the memory usage but increase the cpu load.

Yes, Opus compression could be worth it in cases where the current WavPack compression is not enough. On the other hand, systems with few memory expansion options usually also have quite weak CPUs. I wonder how Opus decoding scales on those CPUs when having to decode, for instance, 100 pipes simultaneously.

I think it would be interesting to do some mock testing first, to see how many Opus samples can be played simultaneously before the CPU hits 100%. That is quite a good low-effort test before any code is changed. I will put that on my fun-project to-do list.
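One way to frame that mock test: an audio block of N frames at sample rate f leaves N/f seconds to decode every sounding stream. A sketch, assuming 240-frame blocks at 48 kHz (5 ms) and a hypothetical 50% headroom reserved for mixing and everything else; `fake_decode` is a placeholder workload, and a real test would substitute an actual Opus decode call:

```python
import time
from typing import Callable


def block_budget_us(block_frames: int, sample_rate_hz: int) -> float:
    """Microseconds available to render one audio block in real time."""
    return block_frames * 1_000_000 / sample_rate_hz


def max_streams(per_stream_us: float, block_frames: int, sample_rate_hz: int,
                headroom: float = 0.5) -> int:
    """Streams that can be decoded per block while leaving `headroom` of the budget free."""
    if per_stream_us <= 0:
        raise ValueError("per-stream cost must be positive")
    usable_us = block_budget_us(block_frames, sample_rate_hz) * (1.0 - headroom)
    return int(usable_us // per_stream_us)


def time_per_call_us(fn: Callable[[], object], repeats: int = 1000) -> float:
    """Average wall-clock cost of fn in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1_000_000 / repeats


def fake_decode() -> float:
    # Hypothetical placeholder standing in for decoding one 240-frame Opus packet.
    return sum(i * 0.5 for i in range(240))


cost = time_per_call_us(fake_decode)
print(f"budget {block_budget_us(240, 48000):.0f} us, ~{cost:.1f} us per stream, "
      f"max ~{max_streams(cost, 240, 48000)} streams")
```

The placeholder's cost in Python means little on its own; the point is the structure: measure the real per-stream decode cost on the target low-end CPU, then compare it against the block budget.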

Thanks all for the welcoming responses. I will be lurking around. In case there is a feature that would be awesome to include, but is giving trouble cpu performance-wise, feel free to ping me. Maybe I can provide a few insights and code that will help the feature to be included.