garygru / yummyDSP

An Arduino audio DSP library for the Espressif ESP32 and probably other 32 bit machines
GNU General Public License v3.0
108 stars 21 forks source link

Improvements. #19

Open copych opened 10 months ago

copych commented 10 months ago

Hello, @garygru , it's a great library! Do you still support it? I ask because I have some ideas, but I am not a real coder, and my manipulations will spoil the beauty of your code )
Ideas are: First, most obviously, there should be more effects to choose from, say, reverb is a must-have, chorus, flanger, etc. are highly welcome, and there are a lot of them around github already implemented as classes or as sets of functions with permissive licenses, mostly sample-wise, so require quite a little changes. (I could make these contributions). Second, I have tried to implement some optimizations, like faster floating asm routines, exchanging divisions to multiplications where possible, and using of lookup tables for some common cases (sin(), cos(), tanh()). But (in my understanding) for the time being the library provides separate classes for the i2s, dsp and for the nodes, so I couldn't share lookup tables and functions library-wide. The lack of coding skills makes me think it's not possible with just including updated dspHelpers.h, but either requires passing pointers to the lookup tables to the newly created instances, or making some host factory class which would do that and something else silently. And the third, I liked very much that you can form the fx-chain dynamically, but for now there's no way to remove or change the set or the order of the chain. Probably this was a ToDo ) I understand that for one's particular needs one could just include your library files and edit them, but this library is worth developing as a library to my mind. Please, what do you think on the matter? Cheers!

PS I forgot one important thing. WROVER modules include 4/8MB of PSRAM which is a great solution that will let the effects to use almost any settings in the matter of timing.

garygru commented 10 months ago

Hey @copych, I don't find the time to actively extend the library but you're welcome to add code.

My opinion:

Whenever you have something send a pull request and I will have a look at it and merge.

Thanks, Gary

copych commented 10 months ago

Hello! Thanks for the quick and detailed answer. I completely understand the lack of time for hobbies and the shift in focus to something else. This is something I struggle with myself. With your permission, a small comment in continuation of the previous comment. I don’t want to start any kind of debate here, but if we are talking about audio processing, then it would be reasonable to assume that the person dealing with this topic will take care of having additional memory, because it is obvious that such effects as reverb, delay, shifting and other manipulations require RAM . Plus, the ESP32 S3, which has essentially replaced the ESP32 as the number one choice today, has at least a couple of megabytes of PSRAM by default. And lookup tables would take only ~400 bytes, taking into account linear approximation, which makes it possible to reduce the stored data for smooth functions to only 32 float values per function, according to what I have dealt with. And for the realtime app CPU cycles are more costly. Anyway, I am in no way insisting on anything, I just wanted to clarify my original position, thank you!

garygru commented 10 months ago

Yes you are right. One would want to choose the ESP32 with more RAM. I normally use bigger LUTs but when you have good experience, go for it! I can help with implementation. A header with the data points should do it. Something like static const float sineTable[32] = { -1,..., 0, ...};

copych commented 10 months ago

Thanks! I'll try )

copych commented 10 months ago

Yes, precalculated tables worked ok. I've made a pull request, cause basic functionality is there already. I'm planning to add some more effects, but now I'd like to ask you about the routing in this DSP. Generally, there're two questions. First is how fast is the 'list'? Forgive my ignorance, I have never used lists, vectors, etc. before in time-critical applications. You know that 'list' iterator is being called several times per sample, so the time adds up and multiplies. Yet, there seems to be no drops @48000/32 bit. Second, if the 'list' is ok, than it should be possible to slightly rearrange the DSP core, so that .add() method also had a chainId argument, and we could split processing into several (maybe fixed quantity) chains allowing almost any serial/parallel combination of the Nodes. (Afterall there could be 'send-return's and a side-chain selector for the compressor node). PS. I have modified AudioDriver slightly, but have no hardware besides external DAC and an i2s mic to test it in different modes. With my setup it works ok with the default settings (48k/32bit), but I wasn't sure about the

//  .fixed_mclk = fs * mclkFactor, 

so I commented it out, cause it seems that hardcoded 384 value won't work for different settings. Thanks.

garygru commented 10 months ago

I took a list for the sake of easiness. When this turns out to be a cpu hog we could change it to a fixed size array of pointers. Insertion and removal of nodes is probably more complicated then.

garygru commented 10 months ago

Parallel chains are cool. Maybe it would be easier to add nodeLists which are only processed when size is > 0. E.g. ChainA, ChainB, ChainC... Also keep in mind once there is latency introduced in one chain, it needs to be compensated in the other chains.

copych commented 10 months ago

Yes, 3 or 4 or even just 2 chains are quite enough for the MPU. About the latency, there should be no problem, but if there's such unintended latencies for FIRs or IIRs or other FXs with internal buffering, we could introduce a sampleDelay parameter into each node, and compensate it in the final mix (small ring buffers for each chain will do the thing)

copych commented 10 months ago

Some information gathered. I am not that fast on coding, so I had some reading prior to experimenting ) They said that std::vector is almost as fast as fixed size array, and std::list has worse performance but much more efficient and convenient insertion/deletion/swapping routines. As dynamic chain forming seems not to be on the highest demand, may we switch to the std::vector ? It shouldn't be hurting, and we still would be able to dynamically form the chains, though with some losses. Another question is can we move delayLine to a separate class (template)? I am now looking thru the daisySP effects, they have a lot of effects based on a delay line template.

garygru commented 10 months ago

Calculations for chain forming are pretty seldom and definitely not per sample. So std::vector will do it. Anyway, I assume that most of the CPU time is used in node calculations. But can be a future improvement. Good idea, let delayLine be a separate class cause several effects can be based on this or use it.