Question about adapting Laplace for specific application

I just came across your neurips paper this morning and am very intrigued. My research group is working on a scientific machine learning project in a regression context. I think that Laplace might be very useful in our application, but I have a few questions about how adaptable / suitable it would be for our particular problem.

Our overall architecture looks something like this:

$$ y = g1(f1(x)) + g2(f2(x)) $$

Here f1 and f2 are standard feed-forward networks. g1 and g2 are deterministic functions that transform the neural net outputs to enforce physical invariants. This structure allows us to ensure that our network always produces physically realistic predictions, even when facing sparse training data.

We further have functional priors on the outputs of f1 and f2. I.e. we know the range of values output and the length scale of the wigglynes of these functions. We currently model this by augmenting x with random points and using a gaussian process prior over the predicted outputs.

Finally, we also know the level of noise in the observed training data, as these typically arise from repeated experimental measurements, so we have a standard error.

Q1. Is it possible to use a different likelihood?

We would like to use a standard Gaussian likelihood, but where the standard deviation is per-datapoint, rather than a single parameter for all observations. Is this possible?

Q2. Is it possible to incorporate other priors?

As I understand it, Laplace currently supports a variety of approaches to placing Gaussian priors on the weights. Is it possible to also incorporate a functional prior? This is very straightforward when training a MAP model, but it's not clear to me how to incorporate additional regularization terms when using Laplace.

Q3. What would be the most reasonable way to handle a network structure like this in Laplace?

My initial thoughts would be to use sub-network sampling over the last layer of both f1 and f2. Does that seem reasonable? The problems we are intersted in are typically low-dimensional, so treating the last layers in full should not pose a huge computation or memory burden.

Thanks for your help.

aleximmer / Laplace