Chowdhury-DSP / chowdsp_utils

JUCE module with utilities for ChowDSP

WDF optimisation related question, not issue #37

Open ghost opened 3 years ago

ghost commented 3 years ago

Thank you very much for the work on this great framework, it's wonderful! The benchmark from the jatinchowdhury18 repo (wdf-bakeoff) shows excellent results. Great job. I'm the guy who coded the JUCE WDF++ header a few years ago.

I just have a quick question about wdf_t.h. Why use the macro-ish alias technique SampleTypeHelpers to access the generic parameters (duplicated into subclasses), instead of using protected base-class variables? Is this motivated by compiler optimisation, and related to the fact that polymorphism hurts performance (cf. the benchmark)?

Regarding optimisation, I'm interested to see whether the constexpr technique (instead of inlining, which is old-school and not really relevant with modern compilers) can improve the results. For instance, C++20 allows constexpr on virtual functions (e.g. the impedance calculation).

In any case, thank you very much for your work!

jatinchowdhury18 commented 3 years ago

Thanks for the questions, and for your original work on WDF++!

The reason for SampleTypeHelpers is to enable correct initialization of member variables, particularly for SIMD types. For example, when we initialize the impedance of a WDF element, if the element type T = juce::dsp::SIMDRegister<float>, then we'll get an error there, since the default value 1.0e-9 is a double, and a SIMD float register can't be initialized from a double. So we need to cast it to a float first, which is why we need the SampleTypeHelpers. I hope that makes sense... I'm not very good at explaining C++ template stuff.

The constexpr technique sounds promising, though I'm not sure which functions could potentially be marked constexpr?

Thanks, Jatin

ghost commented 3 years ago

As a first quick pass at the problem, with a partial implementation (only the elements needed for the DiodeClipper test) of the polymorphic cpp version using constexpr, and an x64 release build in Visual Studio 2019 using the LLVM-Clang compiler, I get promising results (100 seconds of signal at 192 kHz, with the same benchmark code as wdf-bakeoff):

Processed 100 seconds of signal in 0.641661 seconds / 155.846x real-time [template cpp]
Processed 100 seconds of signal in 0.650887 seconds / 153.636x real-time [polymorphic cpp with constexpr]
Processed 100 seconds of signal in 1.15679  seconds /  86.446x real-time [polymorphic cpp]

The same test compiled with the Microsoft Visual Studio 2019 compiler (version 16.10.0):

Processed 100 seconds of signal in 0.748031 seconds / 133.684x real-time [template cpp]
Processed 100 seconds of signal in 0.748378 seconds / 133.622x real-time [polymorphic cpp with constexpr]
Processed 100 seconds of signal in 1.27241 seconds  /  78.590x real-time [polymorphic cpp]

So, the first impression is that the compiler generates better code with these compile-time optimizations than with (forced) inlining. And because of the simplicity of the code, the difference between the built-in VS2019 compiler and the LLVM-Clang extension is less pronounced than in a more complex project.

I just read your answer. I have not used SIMD optimization, neither in your implementation nor in mine. So you're right, casting can be a bottleneck, but I'm not sure of that, particularly with constexpr, which is a compile-time optimization.

Regarding the usage of constexpr, it's simple: I replaced all the inline functions with constexpr and let the compiler optimize the code where it can. The only methods that are not clear candidates for constexpr are the constructors, the setters, and the parenting methods (e.g. connectToParent).

Yes, the constexpr technique is promising, particularly in this kind of architecture; you can clearly see that the difference is notable.

I will now attempt to modify your implementation (wdf_t.h) to check whether we can gain some benefit, and will then proceed to more thorough testing.

Best, Max

ghost commented 3 years ago

OK, so the constexpr modification of your implementation (without SIMD) does not provide a great improvement. This is because your wdf_t.h is already optimized by the duplication of members inside subclasses. The current code is not true OO, not pure metaprogramming, not purely functional, but a mix in between (cf. the external helpers: the current / voltage methods).

Compiled the same way with LLVM-Clang:

Processed 100 seconds of signal in 0.642441 seconds / 155.656x real-time [template cpp with constexpr]
Processed 100 seconds of signal in 0.642641 seconds / 155.608x real-time [template cpp]
Processed 100 seconds of signal in 0.644031 seconds / 155.272x real-time [polymorphic cpp with constexpr]
Processed 100 seconds of signal in 1.12406  seconds /  88.963x real-time [polymorphic cpp]

So, I will investigate a new architecture to optimize some redundancies and code duplication. Even if it's not heavyweight in memory terms, using a macro to embed the members this way is clearly an open door to side effects. Today, with the latest C++ versions, we can avoid macros and hidden code paths (SIMD) by using compile-time inclusion tricks (with metaprogramming, there is no need for private 'internal' sub-methods such as calcImpedanceInternal).

But because most of the changes will need a recent compiler, I think it's better to work on a third version of the header.

Oh, and I just want to say: what good work! I love your plugins and the overall quality of your work. It's rare that a repo impresses me so much!

jatinchowdhury18 commented 3 years ago

Oh wow, this is very interesting! First off, I didn't know that constexpr could be used in this way. Also, it's great that the polymorphic implementation is getting approximately the same performance as the templated API!

Yeah, I definitely wasn't super happy with using a macro for the internal variables, but couldn't think of a better way to do it at the time. If you have a better solution I'd be happy to merge that.

Thanks so much for the kind words, and for appreciating this project! Feel free to make a pull request with whatever changes you have.

ghost commented 3 years ago

You're welcome. Yes, constexpr is a deeper concept than inlining; it produces not only faster code but potentially a smaller executable. Sometimes constexpr usage gets tangled up with obscure metaprogramming concepts, but it's a great feature for modern code. Combined with type traits, you can do some great optimizations for high-level users.

For example, a little trick just for the idea:

template <typename T>
class Base
{
    [...]

public:
    using BaseT = T;
};
template <typename T>
class Resistor : public Base<T>
{
    [...]
};

template <typename T>
class Capacitor : public Base<T>
{
    [...]
};
template <typename P1, typename P2, 
          typename T = typename std::common_type<typename P1::BaseT, 
                                                 typename P2::BaseT>::type>
class Parallel
{
    [...]
};

The idea is to use a type-promotion technique to avoid type declarations at the user level.

void wiring()
{
    using namespace wdf;

    auto R1 = Resistor  { 300.0 };
    auto C1 = Capacitor { 1.0e-6 };

    auto P1 = Parallel  { R1, C1 };
}

Of course, Resistor { 300 }; would instantiate the resistor with an integer value, but we can emit a compile-time error by checking the template parameter for floating_point. With the same idea, we can force P1/P2 to be the same type, and all of that with no performance cost ... it's all compile-time.

I need to investigate the r-type classes too, because they are the key to more complex circuits. I see that you provide two SIMD backends (JUCE and xtensor). I'm not an expert, but as I understand it, SIMD vectorization will become part of the standard library. For now it has experimental status: https://en.cppreference.com/w/cpp/experimental/simd/simd, but even Microsoft has included vectorization (https://devblogs.microsoft.com/cppblog/simd-extension-to-c-openmp-in-visual-studio/) for two years now. It would be cool to benchmark the different methods, and maybe drop the third-party dependencies if the standard library offers better performance.

Last year I did a C++ port of ACME.jl (Analog Circuit Modeling and Emulation for Julia), which is based on the 2015 paper "Automatic Decomposition of Non-Linear Equation Systems in Audio Effect Circuit Simulation". The interesting thing about ACME is that you can build and run a circuit with multiple non-linearities, without the constraint of basic WDF (a single non-linearity at the root). Even though the project is well done and offers some great implementations of circuit elements (e.g. a Gummel-Poon BJT model), it is not really suited to becoming part of a real-time audio framework.

I got good performance results using the Fastor library, with automatic generation of the C++ solver, as you can read here: https://github.com/HSU-ANT/ACME.jl/issues/28 ... But I think it's possible to get better performance with the WDF concept, using the method introduced by Bernardini & Werner ("Multi-Port Nonlinearities in Wave Digital Structures"). Some years ago I talked with Maximilian Rest about the RT-WDF project, which seems stillborn (no activity). The code is heavyweight, and the most important part is not handled by the framework: decomposing a simple circuit into classic WDF branches plus an R-type adaptor for the non-linearities. The concept is not described in much detail in the papers, but it seems to follow the procedure: circuit > graph > SPQR tree > adapted WDF tree. Hmm, maybe something like the ACME incidence/topology matrices? I will look into that.

Your WDF toolkit is great and powerful, but it could be even more useful with a third-party generator that produces ready-to-use optimized source code, with the nl-solver code auto-generated. The idea is not to provide a full SPICE-like tool, just a helper to generate at least some cool little circuits: a tone stack, a triode amplifier, an envelope follower ... some little effect circuits like a treble booster, a wah, a fuzz. That requires adding some other classic elements: BJT, JFET, op-amp ...

For now, I'll stay focused on my primary idea: providing a third (experimental) header with some improvements to your already good code. Let me look at that this week.

jatinchowdhury18 commented 3 years ago

Ah yeah, I'll definitely have to experiment with transitioning some more of my inline functions to constexpr. I definitely like your ideas to "avoid type declaration at the user level", but I've had issues with that approach when the WDF elements need to be member variables in a class, since auto is not allowed there.

Very cool that SIMD is being added to the standard library! I'll have to give that a try soon. That said, the r-type class is still undergoing some changes while I'm trying it out with a few test circuits. I've actually been chatting with a couple more knowledgeable folks on Discord about making an automated process for generating the R-Type scattering matrix from a circuit netlist. From there, I think it should be mostly straightforward to generate WDF code from a circuit schematic! Anyway, I can add you to this Discord server if you're interested?

Thanks, Jatin

ghost commented 3 years ago

Yes, you're right (issues with auto); I need to investigate the code thoroughly to find the right balance between low-level and high-level users. I just raised the idea without checking.

Of course I'm interested in the WDF subject, but I'm not an expert, even in coding. However, I think I'm passionate enough about the subject to give good advice, or so I hope.

Yes, generating code is trivial, and once we find the method to build the decomposed matrices, I would again suggest using Fastor as the backend for the nl-solver. The benchmarks found here (https://github.com/romeric/Fastor/wiki/Benchmarks) show that it is the perfect candidate to outperform all the other linear algebra frameworks in the static case (compile-time fixed matrices).

Sincerely, Max

ghost commented 3 years ago

I have read the 2016 dissertation of Kurt James Werner, the one cited as a reference in your code.

So the key point is the SPQR tree. The problem: there's no standalone fast C++ implementation of this graph decomposition method. The only implementations are inside the SageMath framework (a heavy installer, 900 MB, two hours to install the thousands and thousands of files on my computer), and another one in the Open Graph Drawing Framework (OGDF).

SageMath: Python for the SPQR tree and graph implementation, with a graph backend using the C++ Boost library (graph framework). OGDF: a C++ implementation, but the code is not really easy to use, and the full framework needs to be pre-built.

It's unfortunate to have to depend on a third-party library, but it seems that the SPQR conversion is not easy to code.

Netlist -> Graph: trivial.
Graph -> SPQR: ??? Maybe a graph-theory expert can do the job.
SPQR -> MNA: no reference, but it seems to be trivial once the R-types and the other elements are localised.

Maybe the solution is to code the converter in Python, which would let us use both netlist input (a txt-like file) and PySpice directly, to generate circuits on the fly (no schematic editor). I will look for an MNA solution in Python, but this is not the hardest part and can be coded for this specific case.

The C++ auto-generation can be done à la Guitarix, using template-based cpp/h models.

What is the current state on your Discord? I don't want to interfere with any development happening behind the scenes.

I just have one doubt: you only use the S matrix (b = S.a) in the r-type implementation, where Werner refers to solving for Q by MNA. So the current implementation does 'only' (but very well written) the scattering for the non-adaptable elements (i.e. 2.3 Resolving Multiple Linear Nonadaptable Elements), am I right? So the currently missing part is the non-linearity solver, just after the MNA-generated matrices and just before the produced S matrix. Sorry if it's a dumb question, but I want to understand all the fundamentals before doing anything.

Sincerely, Max

jatinchowdhury18 commented 3 years ago

I just have one doubt: you only use the S matrix (b = S.a) in the r-type implementation, where Werner refers to solving for Q by MNA. So the current implementation does 'only' (but very well written) the scattering for the non-adaptable elements (i.e. 2.3 Resolving Multiple Linear Nonadaptable Elements), am I right? So the currently missing part is the non-linearity solver, just after the MNA-generated matrices and just before the produced S matrix. Sorry if it's a dumb question, but I want to understand all the fundamentals before doing anything.

Yes! So the current implementation in this library is just performing the scattering matrix operation for linear non-adaptable elements. I have a couple rough examples of the RootRTypeAdaptor working for different circuits, hopefully I can publish that code on GitHub soon. Eventually, I would like to implement an AdaptableRTypeAdaptor class (still thinking of the right name for this) which could be used for the nonlinear case, but haven't gotten around to it yet.

I think the actual generation of the SPQR tree is a little bit beyond the scope of this library. I've had a couple conversations with folks recently who have been working on implementing this type of thing using languages like Julia, MATLAB, and Mathematica (apparently the best for symbolically solving matrix equations). There's been some experimenting with Python as well, but symbolically solving the matrix equations in Python seems to be unbearably slow. One idea that we had was to try making a little web interface where someone could upload their circuit net-list, and everything else would get generated from there. Anyway, still a lot of work to do on that front. I've contacted the admin of the Discord channel, so hopefully we can add you there soon!

Thanks, Jatin

ghost commented 3 years ago

I've contacted the admin of the Discord channel, so hopefully we can add you there soon!

Cool.

but symbolically solving the matrix equations in Python seems to be unbearably slow

No, that's for sure. But I meant using a Python script to do the following steps:

  1. Convert a netlist (text file) to a graph structure. Trivial, because a netlist is a node-based enumeration.
  2. Convert the graph to an SPQR tree. We can even use wrappers to other languages; for example, in Guitarix, the symbolic trick uses Mathematica calls from Python. There's no problem there, and I think the graph-to-SPQR step is not really time-consuming; it's just that the linear-time implementation of the algorithm needs real graph-theory expertise to be rewritten. In our case, a C++ implementation is beside the point.
  3. Use the SPQR result to compute the MNA matrices by the stamp method (the one I know, which is very simple).
  4. Once the matrices are populated, use them dynamically in a C++ solver similar to the one in the RT-WDF project to produce the scattering matrix of the R-type adaptor.

Maybe I've misunderstood something, but I think this is the idea. So the only auto-generated C++ files are those containing the static matrices from the Modified Nodal Analysis, which call the (yet-to-be-written) multi-dimensional Newton-Raphson solvers in the real-time process. Or maybe use pre-generated LUTs for the different NL elements for high-speed processing.

PS: I think the internal instantiation of adaptors (Parallel / Series) from the earlier trick can be handled with this method: use a variadic template with perfect forwarding through reference wrappers, and delegate the instantiation to a private specialized constructor:

template <typename T>
class Parallel final : public Base<T>
{
public:
    template <typename ...Args>
    Parallel(Args&&... args)
        : Parallel(std::ref(std::forward<Args>(args))...)
    {}

private:
    Parallel(std::reference_wrapper<Base<T>> port1, 
             std::reference_wrapper<Base<T>> port2)
      : p1(port1)
      , p2(port2)
    {
        // do what you do
    }

public:
    std::reference_wrapper<Base<T>> p1;
    std::reference_wrapper<Base<T>> p2;
};

With this method, the only template parameter needed is the sample type. Example:

template <typename T>
class MyWDFCircuit
{
public:

    [...]

private:
    // (a using-directive is not allowed at class scope, so the
    // wdf::etc element types are spelled out here)

    wdf::etc::Capacitor<T> C1 { T(250e-12) };
    wdf::etc::Capacitor<T> C2 { T(20e-9) };

    wdf::etc::Resistor<T>  R1 { T(250e3) };
    wdf::etc::Resistor<T>  R2 { T(1e6) };

    wdf::etc::Series<T>    S1 { C1, R1 };
    wdf::etc::Parallel<T>  P1 { S1, R2 };
};

I've tested the idea: it works with virtual methods on the base class, and given the constexpr nature of the methods, the reference_wrapper should not affect performance. But that needs to be tested thoroughly to be sure.

Maybe the variadic template method is more powerful than the tuple expansion in r-type.h (I'm not sure of this assertion), but fold expressions are, for me, the most beautiful part of modern C++'s evolution.

ghost commented 3 years ago

Just a little thought on the matter of class naming for the WDF project. Maybe it's not your style, but I like using namespaces to classify the different classes, hide those that are internal, and allow classes with the same name to work in different contexts. Let me explain the idea.

A little theoretical example of specific models in a future context of non-linearity solving:

auto model = wdf::nl::model::triode::Dempwolf<float>();

or:

auto model = wdf::nl::model::bjt::GummelPoon<float, npn>();

or, in the same non-linear namespace, the (potentially multiple) solvers:

auto solver = wdf::nl::solver::NewtonRaphson<float>();

Because model & solver potentially have a base class, the common practice is to hide the base class from the (high-level) end user by using a details (or something like implementation) namespace:

wdf
    ::nl
        ::solver
             ::details
                 Solver
             NewtonRaphson
        ::model
             ::details
                 Model
             ::bjt
                 EbersMoll
                 GummelPoon
             ::triode
                 CohenHelie
                 Dempwolf

Another method is to put all the base classes in a single details namespace at the root namespace, to keep the 'folders' clean.

OK, in the context of Wave Digital Filter theory, we have elements that can be adapted or unadapted.

wdf 
    ::adapted
        Resistor
        Capacitor
        Inductor
        [...]

    ::unadapted
        Resistor
        Capacitor
        Inductor
        [...]

Navigating the framework is cleaner with this kind of organization, I think. You can extend the concept by splitting the headers in the same way, with a single wdf.hpp header that includes the others.

wdf.hpp
wdf_adapted.hpp
wdf_unadapted.hpp
wdf_models.hpp
wdf_solvers.hpp
[...]

Maybe that answers your quest for the right name.

Global rules:

  1. Keep class names as short as possible.
  2. Use lower-case for namespaces, and an upper-case first letter for class names.
  3. Hide the classes that are not intended to be used by the final user.
  4. Split the headers following the same rule, at the first or second sublevel of the namespace hierarchy.

This is just a suggestion, but I think it's good practice for keeping the framework clean to use and develop.

droosenb commented 3 years ago

Hi MaxC2, in case Jatin hasn't already sent it to you, here's the link to join the Discord; I'm the current admin there: https://discord.gg/SUWvx4Nd Excited to continue this discussion there! Best, Dirk.

jatinchowdhury18 commented 3 years ago

@MaxC2, very interesting, I'm not really familiar with std::reference_wrapper. Just to make sure I'm understanding, would this be used for the "templated" WDF implementation (WDF_T.h), or for the polymorphic version (WDF.h)?

I like your class naming and organization ideas too, definitely a lot more readable and organized than what I currently have. I'll take some time this weekend to re-work the code organization. I'll have to think about whether I want to keep the templated implementations in the same header files as the polymorphic ones....

Thanks, Jatin

ghost commented 3 years ago

@MaxC2, very interesting, I'm not really familiar with std::reference_wrapper. Just to make sure I'm understanding, would this be used for the "templated" WDF implementation (WDF_T.h), or for the polymorphic version (WDF.h)?

Hmm. Your current implementation is already fast, and changes can affect the overall performance positively or negatively. That's why I think it's better to work on a third implementation in a development branch, keeping the current version as it is, and run a benchmark once the third implementation is written. This is not a big task; most of the code stays in place, so it's more of a structural evolution than an optimization. It's a point of view: sometimes it's better to restart from a clean sheet, particularly when the project is relatively uncomplex. But it's your code, your repo, and you decide ;)

Today I wrote netlist.hpp, a single header that parses SPICE-like netlists, working from multiple references (documentation). I have built the OGDF C++ framework, and if I don't hit problems, tomorrow I will test the SPQR tree conversion. I really want to explore the overall procedure quickly and test all the steps before writing anything potentially useless.

The C++ netlist parser can easily be rewritten in whatever scripting language the final tool needs. I use a generic, extensible technique to do the job; for example, a map dispatching to the element parsers:

std::map<std::string, std::function<void(std::string, Format&)>> components =
{
    { 
"R", [](std::string line, Format& format)
        {
            auto&& [label, n1, n2, value] = tokenize_linear_component(line);
            format.components.push_back(std::make_unique<Resistor>(label, n1, n2, value));
        }
    },
    { 
"C", [](std::string line, Format& format)
        {
            auto&& [label, n1, n2, value] = tokenize_linear_component(line);
            format.components.push_back(std::make_unique<Capacitor>(label, n1, n2, value));
        }
    },
    [...]

The netlist is cleaned by a preprocessor that removes blank lines and comments (*), concatenates multiline definitions (+), and extracts the subcircuits (.subckt). That allows me to build the netlist Format with a very simple loop:

static Format parse(stringarray& lines)
{
    Format format; // netlist file format structure

    // The title is a special type of comment and 
    // it is always the first line in the file.
    format.title = lines.front();

    auto [newlines, subcircuits] = preprocessor(lines);

    if (lower(command_name(newlines.back())) != "end")
        throw std::runtime_error("netlist parse: last line must be an 'end' circuit statement");

    for (size_t i = 0; i < newlines.size(); ++i)
    {
        auto line = newlines[i];
        auto first = upper(line.substr(0, 1));

        // is this a component definition?
        if (components.contains(first))
            components.at(first)(line, format);

        // is this a command definition?
        if (first == ".") 
        {
        [...]

So, my idea is to follow the Werner 2016 dissertation and attempt to get the same results on the same examples (e.g. the tone stack). Once the procedure is correct and validated, we can easily think about the type of auto-generator we want and the language to use.

Ah yes, one of the great features to implement in the WDF framework is the magnitude response of the circuit. Werner explains the procedure for doing that. This kind of feature is very useful for checking the validity of a circuit and producing comparisons with a SPICE simulation (or another circuit simulator).

jatinchowdhury18 commented 3 years ago

Hmm. Your current implementation is already fast, and changes can affect the overall performance positively or negatively. That's why I think it's better to work on a third implementation in a development branch, keeping the current version as it is, and run a benchmark once the third implementation is written. This is not a big task; most of the code stays in place, so it's more of a structural evolution than an optimization. It's a point of view: sometimes it's better to restart from a clean sheet, particularly when the project is relatively uncomplex. But it's your code, your repo, and you decide ;)

I guess I was more curious about how std::reference_wrapper works. Does it need to infer the type as if from a template, or is it more like a pointer, in that it relies on some sort of "virtual" element table? I definitely agree that it's good to measure the performance, but I also usually like to look at the generated assembly code. For example, this is what's currently generated for the LPF in the wdf-bakeoff. If replacing the adaptors with the std::reference_wrapper approach generates the same assembly, that would be a great sign!

The netlist parsing definitely looks promising! I have some code for the Bassman tone stack lying around somewhere (with the SPQR tree derived by hand), so I'd be curious to compare that with whatever is generated from the netlist parser...

Ah yes, one of the great features to implement in the WDF framework is the magnitude response of the circuit. Werner explains the procedure for doing that. This kind of feature is very useful for checking the validity of a circuit and producing comparisons with a SPICE simulation (or another circuit simulator).

Could you point me to where in Kurt's paper he discusses this? I agree, that would be incredibly useful!

Thanks, Jatin

ghost commented 3 years ago

Oh, I've only just seen your answer (GitHub doesn't show me notifications anymore; I need to look into that).

I guess I was more curious about how std::reference_wrapper works.

A synopsis of the implementation is available here, and the reference wrapper class seems to hold a pointer. I hope the constexpr-compatible definition allows the compiler to produce code without overhead. I think it relates to memory management: the stack (reference) versus the heap (pointer), where the stack is somewhat faster (but also more limited ... remember, I'm sure you've already experienced a stack overflow with some kind of buggy recursive code :) like me).

A benefit of std::reference_wrapper is that it allows holding references in a std::vector, which can be very useful. Sometimes it's delicate to manage OO virtual classes through references (slicing to the base class, and so on); here it seems to do the job perfectly, because it is not a pure reference but a kind of reference. That's my understanding of the subject.

Checking the assembly code would be ideal, but my assembly knowledge is limited to the 8-bit Z80 processor :)

Could you point me to where in Kurt's paper he discusses this? I agree, that would be incredibly useful!

Hmm, I need to check that. But if I remember the idea, it is not really an internal WDF method but a technique for injecting a signal into the structure, so it could be a free function in a utility namespace, for example.

In : Wave Digital Filter Adaptors for Arbitrary Topologies and Multiport Linear Elements (DAFX-15) PDF

Citation:

To verify this model, we compare a family of magnitude response curves to “ground truth” SPICE simulations (Fig. 5). We find the WDF magnitude responses by taking the FFT of an impulse response that has sufficiently decayed to zero. The WDF magnitude response shows a close correspondence to SPICE, except for the expected frequency warping at high frequencies, a known property of the bilinear transform (BLT)

The full procedure is detailed in a Werner paper, but I need to find the correct reference. Let me look for it.

Sincerely, Max

jatinchowdhury18 commented 3 years ago

Hi Max,

Thanks for sharing the reference on std::reference_wrapper, you're right that it seems very useful! Unfortunately, in this case I'd rather not use reference_wrapper in wdf_t.h, simply because it would only make sense to do so if I make incident() and reflected() virtual functions, which in turn does not allow the compiler to inline those functions. Since the wave functions are called recursively every sample, being able to inline them is a pretty big deal for the overall performance. As it stands now, the wave functions are standard member functions, which can be found by the compiler through the template arguments.

Now for the polymorphic API (in wdf.h), I suppose it would make sense to use the reference_wrapper, however, I had chosen to instead just forward as much of the code as possible to the corresponding templated classes, just to avoid code repetition. Anyway, I'm sure I'll find a use for it at some point, so thanks for sharing!

Thanks too for tracking down Kurt's method for finding the frequency response. This approach definitely makes sense, although I was kind of hoping to find something that could find the magnitude/phase response at a given frequency just by inspection, rather than by analyzing a signal... my guess is that such a thing does not exist (yet), otherwise, Kurt would be the one to know :). Even still, the method as described is definitely a useful utility to have along with this library.

Thanks, Jatin