Clarifications - Githubissues

koriavinash1 commented 4 years ago

Hi @JonathanCrabbe,

Thanks for this wonderful work. I read your paper and the supplementary material and have a few questions. Your help in understanding these concepts would be much appreciated (please correct me if I'm missing anything):

1. The Meijer G-function is defined through a contour of integration, which appears to be a task-dependent factor. How exactly are the hyperparameters estimated (how do you select the values of p, q, m, n)?
   - For example, if I want to interpret CNNs, how do I define this contour?
2. All the tasks selected in the paper seem to be regression problems with a single output. Can you provide some pointers on extending this work to multi-output G-functions (or suggest some literature on the topic)?
3. I tried to reproduce some of the experiments. I could run the synthetic data experiment easily (and could clearly observe the improvement of this method over LIME), but the 'wine_quality' and 'Boston housing' experiments took quite a long time to complete. Am I missing some installation steps, or does the method usually take that long?
4. The proposed method can definitely be used for interpreting models, but couldn't it just as easily help in estimating the data-generating process? Why is it restricted to model interpretability in the paper?
5. Is there any intuition behind the real and imaginary values of 'Z' (for example, what do the real curve and the imaginary curve correspond to)?
6. What is the reason for optimizing over residuals when you have the true predictions (from the trained network)?

Thank you, Avinash

JonathanCrabbe commented 4 years ago

Hi @koriavinash1, Many thanks for those remarks and the interesting questions!

1. I believe that the selection of the contour of integration is more of a theoretical issue than a practical one. In my implementation, Meijer G-functions are evaluated with the mpmath implementation. More precisely, we set the `series` argument to 1, since we require our G-function to be well defined on (0,1). As you can see, the numerical evaluation of Meijer G-functions relies on an approximation by hypergeometric series rather than on a numerical integration in the complex plane. For our purpose, a Meijer G-function is therefore entirely characterized by its four hyperparameters (m, n, p, q; the strategy to simplify their optimization is described in Section 3 of the paper) and its real parameters (which are trained by gradient descent).
2. Extending the formalism to multivariate regression problems is an interesting research question. I am not aware of an extension of Meijer G-functions that includes most familiar multivariate functions. Therefore, the most straightforward way to extend our work to multivariate regression seems to be to learn a Meijer G-function for each output component at each iteration of the Symbolic Pursuit. I don't doubt that there is a more clever way to do it though :)
3. It is true that symbolic models are slow to train. The explanation is that Meijer G-functions are particularly slow to evaluate numerically. My opinion is that our method (as well as Symbolic Metamodels) might benefit significantly from a more efficient numerical implementation of Meijer G-functions. This is a very interesting problem in itself, although quite far from my domain of expertise.
4. This is a very good remark! I don't see any obstacle to using the Projection Pursuit algorithm to produce an estimator directly. In our paper, we have restricted the discussion to interpretability because the advantage of using Meijer G-functions is obvious in this context (more transparency).
5. For the purpose of our paper, all input features and the output are real. In the definition of Meijer G-functions, the integration contour in the complex plane defines the G-function itself. If you are referring to the integration variable, I am not aware of a natural intuition for it. I simply see it as an auxiliary variable that allows us to define a function by an integral (just like the integration variable that appears in the definition of the error function, for instance).
6. The sequential optimization strategy is motivated by our aim to produce parsimonious mathematical expressions. By using a Projection Pursuit strategy, we gradually increase the size of the mathematical expression of our model until the desired precision is achieved (rather than optimizing over a fixed size).
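
To make the mpmath evaluation mentioned above concrete, here is a minimal sketch (not the repository's code). It only assumes the public `mpmath.meijerg(a_s, b_s, z, r=1, series=None)` call; the helper names `g_exp` and `g_sin` and the two closed forms are illustrative choices, picked because the result can be checked against `exp` and `sin` on (0,1).

```python
from mpmath import mp, meijerg, exp, sin, sqrt, pi, mpf

mp.dps = 15  # working precision in decimal digits

# In mpmath the hyperparameters are implied by the list lengths:
#   a_s = [[a_1..a_n], [a_{n+1}..a_p]],  b_s = [[b_1..b_m], [b_{m+1}..b_q]]

def g_exp(z):
    # Illustrative choice: G^{1,0}_{0,1}(z | -; 0) = exp(-z)
    # series=1 requests the hypergeometric series in z, as described above.
    return meijerg([[], []], [[0], []], z, series=1)

def g_sin(z):
    # Illustrative choice: sqrt(pi) * G^{1,0}_{0,2}(z^2/4 | -; 1/2, 0) = sin(z)
    return sqrt(pi) * meijerg([[], []], [[mpf(1) / 2], [0]], (z / 2) ** 2, series=1)

for z in [mpf("0.1"), mpf("0.5"), mpf("0.9")]:
    # Each row prints the G-function value next to its closed form.
    print(z, g_exp(z), exp(-z), g_sin(z), sin(z))
```

No contour appears anywhere in the call: the lists of real parameters and the argument are all that is passed, which is consistent with the remark that the contour is a theoretical rather than a practical concern here.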

Hope this helps :)

All the best, Jonathan

koriavinash1 commented 4 years ago

Hi @JonathanCrabbe,

Thanks for your comments, they clarified most of my doubts.

- I'm currently trying a multi-output classification problem with a modified loss formulation. I will post an update if I get some good results.
- I guess one way to improve training speed would be to incorporate batch support, which I'm planning to explore at a later stage.
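
One possible reading of "batch support" is sketched below, under loud assumptions: the loss is a mean squared error, and `slow_term` is a hypothetical cheap stand-in for a single costly Meijer G-function evaluation. None of these names come from the repository. The idea is simply that a loss estimated on a random mini-batch needs far fewer function evaluations per step than the loss on the full dataset.

```python
import math
import random

def slow_term(x):
    # Hypothetical stand-in for one costly Meijer G-function evaluation.
    return math.exp(-x)

def mse(xs, ys, indices):
    # Mean squared error restricted to the given sample indices.
    return sum((slow_term(xs[i]) - ys[i]) ** 2 for i in indices) / len(indices)

rng = random.Random(0)
xs = [rng.uniform(0.01, 0.99) for _ in range(1000)]      # inputs in (0,1)
ys = [math.exp(-x) + rng.gauss(0.0, 0.05) for x in xs]   # noisy targets

full_loss = mse(xs, ys, range(len(xs)))   # 1000 evaluations of slow_term
batch = rng.sample(range(len(xs)), 64)    # random mini-batch of indices
batch_loss = mse(xs, ys, batch)           # 64 evaluations of slow_term
print(full_loss, batch_loss)
```

The mini-batch estimate is noisier than the full loss, so whether this trade-off pays off for the actual optimizer would need to be checked experimentally.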

Anyway thanks for your comments, I'll raise any other questions as and when I get them :)

Thank you, Avinash

koriavinash1 commented 3 years ago

Hi @JonathanCrabbe,

I have a few more questions; it would be great if you could elaborate on them:

1. What is the basis for associating the obtained symbolic expression with the black-box model? Since the interpreter is only trained on low-confidence labels obtained from the black-box model, I'm unable to understand how the obtained symbolic expression can model the black box rather than just a transformed data-generating process. Many interpretability methods analyze model sensitivity through ablation or gradient-based approaches in order to tie the explanations to the model rather than to the data. How can such a study be done in this case?
2. Can you please elaborate on the selected working range (0,1) for a Meijer G-function? I don't see any harm in using $(-1, 0) \cup (0, 1)$, so I was wondering whether there is a reason to impose the positivity condition.

Thank you, Avinash

JonathanCrabbe / Symbolic-Pursuit

Clarifications #1