Multidimensional Output Support

platawiec commented 4 years ago

Hi, enjoying the package very much.

It's clear that the initial algorithms focus on single-output surrogate models, i.e. f: R^n -> R. A broad class of problems would benefit from multi-output surrogate models, i.e. f: R^n -> R^k.

It seems like you've given this some thought, but I was wondering what your plan is for these, and whether you need support in any areas. On the low-hanging-fruit side, it could be as simple as allowing the user to pass a function that returns a vector, and then fitting a separate surrogate for each output component (this seems to be how multi-output RBF is done in the literature). On the more advanced side, Kriging/Bayesian methods or neural nets can exploit any structure between the outputs.
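The low-hanging-fruit approach can be sketched in a few lines. This is a language-agnostic illustration written in Python, not Surrogates.jl code: `fit_per_output` and the toy `fit_nearest_neighbor` surrogate are hypothetical names made up for this example, and any scalar fitting routine could take the toy one's place.

```python
from typing import Callable, Sequence, Tuple

Point = Sequence[float]
ScalarModel = Callable[[Point], float]


def fit_nearest_neighbor(xs: Sequence[Point], ys: Sequence[float]) -> ScalarModel:
    """Toy scalar surrogate (hypothetical stand-in): return the y of the closest sample."""
    def predict(x: Point) -> float:
        def sq_dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(xs[i], x))
        return ys[min(range(len(xs)), key=sq_dist)]
    return predict


def fit_per_output(
    fit_scalar: Callable[[Sequence[Point], Sequence[float]], ScalarModel],
    xs: Sequence[Point],
    ys: Sequence[Sequence[float]],
) -> Callable[[Point], Tuple[float, ...]]:
    """Fit one independent scalar surrogate per output component."""
    k = len(ys[0])
    if any(len(y) != k for y in ys):
        raise ValueError("all outputs must have the same length")
    # Column j of ys becomes the training target of the j-th scalar surrogate.
    models = [fit_scalar(xs, [y[j] for y in ys]) for j in range(k)]

    def predict(x: Point) -> Tuple[float, ...]:
        return tuple(model(x) for model in models)

    return predict


# f: R^2 -> R^2 sampled at three points.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
surrogate = fit_per_output(fit_nearest_neighbor, xs, ys)
result = surrogate((0.9, 0.1))  # nearest sample is (1.0, 0.0)
```

Because each output gets its own model, this wrapper cannot exploit correlations between outputs; that is the trade-off against the joint Kriging/Bayesian approaches.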

My reading of the literature suggests:

Surrogate fitting is rerun for each output:
- [x] RadialBasis
- [x] LobacheskySurrogate
User provides the surrogate; only the Surrogates.jl interface needs to change (should "just work"):
- [x] Neural
Surrogate fitting occurs across all outputs:
- [x] LinearSurrogate (I think the result is the same as rerunning for each)
- [x] Kriging
- [x] SVMSurrogate https://www.sciencedirect.com/science/article/abs/pii/S0167865513000196
- [x] RandomForestSurrogate https://www.semanticscholar.org/paper/Multivariate-random-forests-Segal-Xiao/5e730be365acd5879b271bc4a9bd6db815e1baf4
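On the LinearSurrogate item: assuming the fit is ordinary least squares (an assumption, since the thread does not say which loss is used), fitting all outputs jointly does coincide with refitting per output. Write X for the design matrix, Y = [y_1, ..., y_k] for the outputs, and B = [b_1, ..., b_k] for the coefficients (symbols introduced here for illustration only):

$$
\min_{B} \lVert XB - Y \rVert_F^2
= \min_{b_1, \dots, b_k} \sum_{j=1}^{k} \lVert X b_j - y_j \rVert_2^2
= \sum_{j=1}^{k} \min_{b_j} \lVert X b_j - y_j \rVert_2^2
$$

Each term depends only on its own column b_j, so the joint minimizer is the per-output minimizers stacked side by side.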

In the future, the question of how to do the surrogate optimization will need to be tackled, but I don't see it as a major blocker at the moment. Thoughts?

ludoro commented 4 years ago

Hi @platawiec, I am very glad you like it! Actually, you are not the first person who has asked for multi-output surrogates. I can't say that it is the focus at the moment, but it definitely is a direction worth exploring. Take into account that:

- SVM and RandomForest are taken from other Julia libraries that call C code, so there is not much we can change there.
- Multi-output Kriging is, as far as I understand, a multi-output Gaussian process, and I am not sure that even a library such as Stheno.jl has it at the moment.

This leaves us with just RBFs, neural networks and the linear surrogate: it should not be too hard, given that we would just need to rerun the surrogates individually, as you pointed out.

As for the optimization methods, I have no idea how they could work on multi-output surrogates. They are a major part of the library, which is why I haven't looked into multi-output support much so far.

TL;DR: If you have time, tackling the low-hanging fruit now would be a great addition!

platawiec commented 4 years ago

Yes, I agree. I think a good strategy would be to dispatch on the type of y where possible, and throw errors if not implemented.
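A rough picture of that strategy, written as a Python analogue rather than Surrogates.jl code: Julia would use ordinary method dispatch on the type of `y`, while here `functools.singledispatch` plays that role. The function name `describe_output` is hypothetical and exists only for this sketch.

```python
from functools import singledispatch


@singledispatch
def describe_output(y_sample) -> str:
    """Fallback: output types without an implementation fail loudly."""
    raise NotImplementedError(
        f"outputs of type {type(y_sample).__name__} are not supported"
    )


@describe_output.register(float)
@describe_output.register(int)
def _(y_sample) -> str:
    # Scalar output: the existing single-output code path applies.
    return "scalar"


@describe_output.register(tuple)
@describe_output.register(list)
def _(y_sample) -> str:
    # Vector output: a multi-output code path would be selected here.
    return f"vector of length {len(y_sample)}"


scalar_kind = describe_output(1.5)
vector_kind = describe_output((1.0, 2.0, 3.0))
```

Passing an unsupported type, such as a string, falls through to the generic function and raises `NotImplementedError`, which mirrors the "throw errors if not implemented" part of the proposal.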

There is literature on getting optimization methods to work with multi-output surrogates; I'll have a look. I only recall seeing it for Gaussian processes, however. (A higher priority for me is parallel query optimization, anyway.)

Would you consider using Stheno or something as the backend for Gaussian Processes? I may try to implement something there instead.

ludoro commented 4 years ago

Yes, we could use Stheno!


SciML / Surrogates.jl

Multidimensional Output Support #98