Open marvinschmitt opened 9 months ago
Great proposals! I read the paper. But I am not sure if the method can generate posterior samples, which may be helpful to detect multi-modal distributions.
Yes, the method will not deal with multimodality out of the box. I think its main use case will be as a fast and frugal heuristic to obtain estimates very quickly.
Idea: can we train the summary nets based on point inference nets and then reuse the trained summary nets in combination with full posterior inference nets?
Stefan Radev @.***> schrieb am Di., 13. Feb. 2024, 12:43:
Yes, the method will not deal with multimodality out of the box. I think its main use case will be as a fast and frugal heuristic to obtain estimates very quickly.
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1940324907, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AH5AXV6TW2ZYGIOC4DYTLHMNAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBQGMZDIOJQG4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Multi-modality on the marginals should be detectable through the posterior quantiles. The main "compromise" is that there is no easy way to get information on posterior joint dependence generally. But as Stefan said, this is more of an "exploratory" step or an alternative for when the full NF framework is proving difficult to fit.
I like Paul's idea. Possible issue: Summaries that are good for estimating individual parameters might not be relevant/suitable for extracting information on posterior correlations (additional summaries might be needed when doing full posterior inference).
On Tue, Feb 13, 2024 at 3:00 PM Paul-Christian Bürkner < @.***> wrote:
Idea: can we train the summary nets based on point inference nets and then reuse the trained summary nets in combination with full posterior inference nets?
Stefan Radev @.***> schrieb am Di., 13. Feb. 2024, 12:43:
Yes, the method will not deal with multimodality out of the box. I think its main use case will be as a fast and frugal heuristic to obtain estimates very quickly.
— Reply to this email directly, view it on GitHub < https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1940324907>,
or unsubscribe < https://github.com/notifications/unsubscribe-auth/ADCW2AH5AXV6TW2ZYGIOC4DYTLHMNAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBQGMZDIOJQG4>
. You are receiving this because you are subscribed to this thread.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1940387670, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABEU3LL2TL35DSD6ZYOV63YTLQMHAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBQGM4DONRXGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
I like the idea too! We could use the point summary network or the point estimates themselves as summaries, with an auxiliary, smaller summary stream for capturing additional information relevant for the joint.
I think using the point estimates as summary statistics will be similar to when one uses expert summary statistics, which we know can work well (e.g., the Jiang et al. Statistica Sinica paper) but probably what you have at the moment is even better. To start off with, from what I know of BayesFlow, I think we should keep the statistics network as is, and add an option for the user to choose "point estimation" instead of "full inference" -- technically all is required is to swap out the invertible NN with a much simpler dense neural network with input dimension equal to the summary statistic dimension and output dimension equal to the parameter dimension. There could even be a sensible default for the architecture of this NN. The user also needs to be able to specify a loss function when choosing "point estimation", but we could just start off with squared error loss at first just to get it running.
On Wed, Feb 14, 2024 at 12:27 AM Stefan Radev @.***> wrote:
I like the idea! We could use the point summary network or the point estimates themselves as summaries, with an auxiliary, smaller summary stream for capturing additional information relevant for the joint.
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1941513743, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABEU3OVKIAMWU7RDJTACD3YTNS3JAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBRGUYTGNZUGM . You are receiving this because you are subscribed to this thread.Message ID: @.***>
yes I fully agree. my idea was going in a different direction though but it would be more like research which we can do once the implementation is done
Andrew Zammit Mangion @.***> schrieb am Mi., 14. Feb. 2024, 16:10:
I think using the point estimates as summary statistics will be similar to when one uses expert summary statistics, which we know can work well (e.g., the Jiang et al. Statistica Sinica paper) but probably what you have at the moment is even better. To start off with, from what I know of BayesFlow, I think we should keep the statistics network as is, and add an option for the user to choose "point estimation" instead of "full inference" -- technically all is required is to swap out the invertible NN with a much simpler dense neural network with input dimension equal to the summary statistic dimension and output dimension equal to the parameter dimension. There could even be a sensible default for the architecture of this NN. The user also needs to be able to specify a loss function when choosing "point estimation", but we could just start off with squared error loss at first just to get it running.
On Wed, Feb 14, 2024 at 12:27 AM Stefan Radev @.***> wrote:
I like the idea! We could use the point summary network or the point estimates themselves as summaries, with an auxiliary, smaller summary stream for capturing additional information relevant for the joint.
— Reply to this email directly, view it on GitHub < https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1941513743>,
or unsubscribe < https://github.com/notifications/unsubscribe-auth/AABEU3OVKIAMWU7RDJTACD3YTNS3JAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBRGUYTGNZUGM>
. You are receiving this because you are subscribed to this thread.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1943142265, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AAU4KQIMG6AP4GYYPTYTRIOFAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBTGE2DEMRWGU . You are receiving this because you commented.Message ID: @.***>
and I did not mean using the point estimates as summary statistics. I agree that would not be very useful in general.
Paul Buerkner @.***> schrieb am Mi., 14. Feb. 2024, 16:51:
yes I fully agree. my idea was going in a different direction though but it would be more like research which we can do once the implementation is done
Andrew Zammit Mangion @.***> schrieb am Mi., 14. Feb. 2024, 16:10:
I think using the point estimates as summary statistics will be similar to when one uses expert summary statistics, which we know can work well (e.g., the Jiang et al. Statistica Sinica paper) but probably what you have at the moment is even better. To start off with, from what I know of BayesFlow, I think we should keep the statistics network as is, and add an option for the user to choose "point estimation" instead of "full inference" -- technically all is required is to swap out the invertible NN with a much simpler dense neural network with input dimension equal to the summary statistic dimension and output dimension equal to the parameter dimension. There could even be a sensible default for the architecture of this NN. The user also needs to be able to specify a loss function when choosing "point estimation", but we could just start off with squared error loss at first just to get it running.
On Wed, Feb 14, 2024 at 12:27 AM Stefan Radev @.***> wrote:
I like the idea! We could use the point summary network or the point estimates themselves as summaries, with an auxiliary, smaller summary stream for capturing additional information relevant for the joint.
— Reply to this email directly, view it on GitHub < https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1941513743>,
or unsubscribe < https://github.com/notifications/unsubscribe-auth/AABEU3OVKIAMWU7RDJTACD3YTNS3JAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBRGUYTGNZUGM>
. You are receiving this because you are subscribed to this thread.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1943142265, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2AAU4KQIMG6AP4GYYPTYTRIOFAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBTGE2DEMRWGU . You are receiving this because you commented.Message ID: @.***>
I also think we need to do it in two steps:
Thus, only inference will potentially be faster, not training. I suspect that attemtps to train summary statistics directly via point estimates might result in poor summaries, because it will be harder for the summary network to capture the uncertainty.
A luxury method to train point estimates would enhance the point estimator network with additional outputs for the (co)variance of the point estimate and the fraction of the full posterior explained by the MAP mode.
These outputs could be trained by destillation from the SBI posterior: (1) Find the MAP mode in the NF posterior output => target value of the MAP point estimate. (2) Fit a Gaussian to the MAP mode (=Laplace approximation) to get the (co)variance and fraction of probability mass explained by the MAP mode.
The destillation approach has the additional benefit that the point estimator would return the true MAP estimates (subject to the accuracy of the normalizing flow) instead of the ground truth parameterizations of the training runs.
Summarizing the ideas so far:
Train point estimators instead of generative networks - simply attaches a configurable MLP to the summary network and uses some norm (L1, L2, L-Infinity, https://www.tensorflow.org/api_docs/python/tf/norm) as a loss.
Train point estimators and use pre-trained summary network in tandem with a full generative network - uses a frozen summary network, possibly in combination with another, smaller and trainable summary network, to train a full workflow.
Train the default approach and distil it using a point estimator - trains the full workflow (i.e., summary and generative net) and attaches a configurable MLP to the pretrained summary network, possibly learning a summary of the approximate posterior (e.g., MAP).
Crucially, all three are enabled, once the functionality is there. Does anyone want to implement the blueprints Marvin suggested in the first post? Otherwise, I would self-assign in a week or two.
Looks good, I think 1. is where we should start as that will help establish the core benefits, and then move from there. Matt and I will be happy to provide comments and check changes/provide suggestions as they come through; can also contribute to code but probably someone who knows the internals should drive it.
On Thu, Feb 15, 2024 at 12:24 AM Stefan Radev @.***> wrote:
Summarizing the ideas so far:
Train point estimators instead of generative networks - simply attaches a configurable MLP to the summary network and uses some norm (L1, L2, L-Infinity, https://www.tensorflow.org/api_docs/python/tf/norm) as a loss.
- Advantage: fast and pleasant to iterate over for larger problems.
- Disadvantage: no full Bayesian inference (by design).
Train point estimators and use pre-trained summary network in tandem with a full generative network - uses a frozen summary network, possibly in combination with another, smaller and trainable summary network, to train a full workflow.
- Advantage: the first summary network is trained fast and already captures some important information. May be equivalent to good hand-crafted summary statistics.
- Disadvantage: summary statistics may not be good, speed gains diminished due to two-step approach.
Train the default approach and distil it using a point estimator
trains the full workflow (i.e., summary and generative net) and attaches a configurable MLP to the pretrained summary network.
Advantage: the summary statistics will already be good for learning the full posterior (assuming everything has converged just fine), so they will also be good for the point estimator. Point estimator can then be trained hyperefficiently.
Disadvantage: speed gains diminished due to two-step approach, the utility of the point estimates is unclear if full workflow is already good.
Crucially, all three are enabled, once the functionality is there. Does anyone want to implement the blueprints Marvin suggested in the first post? Otherwise, I would self-assign in a week or two.
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-1943760608, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABEU3MBQXPGPA7ZOIRJAVTYTS3HJAVCNFSM6AAAAABC5CBZ3CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBTG43DANRQHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi all, I have implemented and tested a first working version of an amortized point estimator, which can be found in the Development
branch.
An example use case is:
import bayesflow as bf
from bayesflow.benchmarks import Benchmark
# Trivial Gaussian benchmark or a real model
benchmark = Benchmark('gaussian_linear', mode='posterior')
# An easy-to-use MLP with residual connections
inference_net = bf.helper_networks.ConfigurableMLP(input_dim=10, dropout_rate=0.05)
# Can be any norm in [1, 2, np.inf] and can also use a summary network
amortizer = bf.amortizers.AmortizedPointEstimator(inference_net, norm_ord=2)
# Training can happen as usual
trainer = bf.trainers.Trainer(amortizer=amortizer, configurator=benchmark.configurator)
data = benchmark.generative_model(5000)
h = trainer.train_offline(data, epochs=50, batch_size=32)
# Quick point estimates can be obtained by simply calling the .estimate() method
test_data = benchmark.configurator(benchmark.generative_model(100))
estimates = amortizer.estimate(test_data)
Let me know what you think.
Great, thanks! Just adding two pointers to Stefan's post:
AmortizedPointEstimator
in bayesflow.amortizers
: https://github.com/stefanradev93/BayesFlow/blob/65082730bd938f4068aaac62839a3ef160c55460/bayesflow/amortizers.py#L1218The workflow looks good to me. How would you generalize the interface to different loss function for different point estimates? The norm_ord is perhaps a bit restrictive an interface for that(?)
The workflow looks good to me. How would you generalize the interface to different loss function for different point estimates? The norm_ord is perhaps a bit restrictive an interface for that(?)
The current AmortizedPointEstimator
supports a loss_fun
argument which takes precedence over norm_ord
when provided and allows for generic loss functions. We may want to allow some string arguments which default to useful estimators that are not captured by the simple norm_ord
or provide a more semantic interface (e.g., loss_fun="mean"
).
Great job! Does the current workflow support the uncertainty quantification for estimated parameters? Do we need apply bootstrap for UQ just like what the reference paper did?
You would currently need bootstrap or Monte Carlo dropout for UQ, unless a custom loss_fun
already entails UQ.
The interface looks good to me as well. @han-ol and I implemented the illustrative example from the paper (see #145) and the results look sensible. Feel free to take a look and to improve the notebook, maybe it can serve as the foundation of a more detailed tutorial notebook on the method.
The interface looks good to me as well. @han-ol and I implemented the illustrative example from the paper (see #145) and the results look sensible. Feel free to take a look and to improve the notebook, maybe it can serve as the foundation of a more detailed tutorial notebook on the method.
Can I ask the link of this notebook? I can look into and learn it. Thanks!
The interface looks good to me as well. @han-ol and I implemented the illustrative example from the paper (see #145) and the results look sensible. Feel free to take a look and to improve the notebook, maybe it can serve as the foundation of a more detailed tutorial notebook on the method.
Can I ask the link of this notebook? I can look into and learn it.
Thanks!
Hi all, the current interface supports loss functions of the form $\mathcal L=f(\hat \theta - \theta)$. This covers many of the interesting loss functions. However, my understanding is
cannot abide this form.
To support losses that are not simply functions of the difference $\hat \theta - \theta$ I suggest we change the current signature of the custom loss_fun
from loss_fun(net_output - paramters)
to loss_fun(net_output, paramters)
.
The implementation replaces the tf.norm
with a wrapper that takes two arguments, so the convenient functionality of just choosing norm_ord
stays untouched and the illustrative notebook runs just as before. Not sure if we want this wrapper and if it is well placed in losses.py
.
Let me know what you think!
Sorry for the delay -- this is really nice, thanks @stefanradev93 for the implementation and @vpratz @han-ol for the easy-to-follow notebook, very cool!
I think this is already great as is, from a user point of view I agree with @han-ol that it would be helpful to have a generic loss interface and possibly also some "template" losses one could use. One possibility is to provide some template losses (e.g., normed_difference
, quantile_loss
, etc.) that take in net_output
and parameters
and then have the additional arguments specific to that loss (e.g., power of the normed difference, quantile to target, etc.). Users can of course specify their own losses but the templates would cover 95% of use cases.
The last thing which as a user will be really handy is a function like amortizer.bootstrap()
that can do the bootstrap for you. The easiest would be parametric bootstrap where the amortizer.estimate()
is run with the obs data to give you the estimate, the simulator is run at that estimate to give you an N-sized bootstrap sample (assuming the simulator is available and that the training data are not fixed beforehand), and then amortizer.estimate()
is run again on that sample. This would be useful for all loss functions... e.g., one might want to do get a bootstrap sample of 95% credible intervals.
Anyway, really nice stuff, very happy to see it implemented in BayesFlow!
Agreed, great job everyone!
@andrewzm
and possibly also some "template" losses one could use. [...] the templates would cover 95% of use cases.
Based on your expertise and experience with amortized point estimators, would you mind compiling a list of template losses that might cover the vast majority of users' needs?
I would say some "experience" not "expert", but I'm fairly confident that the vast majority of cases will be covered by one of the following losses:
Other useful losses include:
There are other losses one might consider, e.g., Stein's loss for covariance matrices which might be useful (one would need to go through the Cholesky factor to ensure positive definiteness of $\hat\Sigma$ in the NN architecture) and Huber losses, although I don't know without further reading what Bayes estimators these would lead to. For starters I would put in the normed difference, quantile loss and the weighted squared-error loss.
Fan, J., & Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85(3), 645-660. Robert, C. P. (2007). The Bayesian Choice. New York: Springer. Sainsbury-Dale, M., Richards, J., Zammit-Mangion, A., & Huser, R. (2023). Neural Bayes estimators for irregular spatial data using graph neural networks. arXiv preprint arXiv:2310.02600.
Since this issue contains past discussions and involved people are watching it:
How do we go about implementing amortized point estimators in the new BayesFlow 2.0 release?
Naming wise, I would suggest, we will implement a PointApproximator
class in the approximator
module. I don't know how easily portable the existing code is, but perhaps not too difficult? Deferring to @stefanradev93 and @LarsKue to judge this.
While the most common usage of this would be to estimate points, it could also be worthwhile to (A) choose a more general name or (B) split it out in different classes. From a technical point of view, this issue relaxes the definition of Approximator
for all situations where we don't represent a whole distribution with a generative network, but not all of these cases are strictly point estimators.
I couldn't think of a nice general name that is not already taken (Approximator).
So what do you think about having a
Approximator
(could also be called DistributionApproximator
for specificity, or we keep it short as is),PointApproximator
(feed forward net with 1 output node),IntervalApproximator
(feed forward net with 2 output nodes, ), andQuantileApproximator
(feed forward net with any number of output nodes, quantile levels are selected by loss fun or with convenient constructor)?
I would expect that when we eventually get to the diagnostics API, it will help if the different approximator outputs correspond to well specified classes. (I have a implementation sketch of a calibration diagnostics for the QuantileApproximator
for example.)
Also parametric bootstrap* could be a method applicable to PointApproximator
but not to IntervalApproximator
, etc.
(*Point estimate each observed dataset and resimulate a few samples corresponding to its point estimate, then apply the point estimator to get some UQ for the points. This captures stochasticity in the likelihood that can lead to point estimation variance.)
I like the breakdown into the different types of approximators. I think the word "approximator" is also good, as one can often treat point estimates as parameters of a highly constrained approximate posterior distribution. Note that there is nothing stopping one obtaining bootstrap intervals for intervals as well, although these are less used in practice.
On Wed, Oct 2, 2024 at 3:31 AM Hans Olischläger @.***> wrote:
While the most common usage of this would be to estimate points, it could also be worthwhile to (A) choose a more general name or (B) split it out in different classes. From a technical point of view, this issue relaxes the definition of Approximator for all situations where we don't represent a whole distribution with a generative network, but not all of these cases are strictly point estimators.
I couldn't think of a nice general name that is not already taken (Approximator).
So what do you think about having a
- Approximator(could also be called DistributionApproximator for specificity, or we keep it short as is),
- PointApproximator (feed forward net with 1 output node),
- IntervalApproximator (feed forward net with 2 output nodes, ), and
- QuantileApproximator (feed forward net with any number of output nodes, quantile levels are selected by loss fun or with convenient constructor)
?
I would expect that when we eventually get to the diagnostics API, it will help if the different approximator outputs correspond to well specified classes. (I have a implementation sketch of a calibration diagnostics for the QuantileApproximator for example.) Also parametric bootstrap* could be a method applicable to PointApproximator but not to IntervalApproximator, etc.
(*Point estimate each observed dataset and resimulate a few samples corresponding to its point estimate, then apply the point estimator to get some UQ for the points. This captures stochasticity in the likelihood that can lead to point estimation variance.)
— Reply to this email directly, view it on GitHub https://github.com/stefanradev93/BayesFlow/issues/121#issuecomment-2386577190, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABEU3JF6LNWR7UPQAUFH6LZZLL7VAVCNFSM6AAAAABPEATCM2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOBWGU3TOMJZGA . You are receiving this because you were mentioned.Message ID: @.***>
In the dev version, we call the main approximator ContinuousApproximator
because we approximate a full continuous distribution. For a user interface, it would be better, I think, to have one PointEstimator class to handle all the point estimates, i.e. measures of central tendency such as means or medians, measures of variation such as SD or MAD, and quantiles. Often enough I want to estimate several of them at the same time, say a mean and a bunch of quantiles. If I had different approximators for these different kinds of point estimators I wouldn't be able to learn them efficiently together.
We could, in principle, also have a ApproximatorCollection
class that bundles multiple of the feed forward approximators together and owns shared weights for them. You could pass a list of the point estimates you want, train them together, and the code can internally distinguish the different kinds of point estimates for bootstrap and plotting.
@andrewzm right, interesting point regarding bootstrap on intervals. I guess it still needs custom code to deal with the differing shapes and select which parameters to plug in the simulator, so for that to be handled without the user writing some configuration code for a hook in a parametric_bootstrap function, we would need well specified IntervalApproximator
, etc... classes.
For interval approximation specifically, would you plug both of the interval boundaries in the simulator or sample from the interval? Is there an established way of doing this?
Stepping back: I don't know whether this ApproximatorCollection
thing would be feature creep and we thus should focus on a little less tidy, but highly flexible PointApproximator
class that the user customizes themselves. I just want to raise that, I think, if we have a most general PointApproximator
class the implementation of things like automatic diagnostic plotting and bootstrap gets a little messy and will probably require "expert" knowledge by the user.
@han-ol Assuming we're talking about a parametric bootstrap setting, I think what would be useful to the user is getting a feel for the (frequentist) validity of these quantile intervals around a notional parameter value related to the available data, which could be the posterior mean or median. The bootstrapped intervals could be used to quantify the expected frequentist coverage probability, by seeing the proportion of times the "notional" value falls into the bootstrapped intervals. This is similar to the procedure Hermans et al. describes in "A Trust Crisis In Simulation-Based Inference? Your Posterior Approximations Can Be Unfaithful" (https://arxiv.org/pdf/2110.06581) in Section 2.2 but for a notional value of theta (e.g., the posterior mean) instead of averaging over the whole parameter space. From a software point you're right in that things are bit different, but not much -- with point estimators the notional value will be the estimate itself, while in the interval case it would need to be specified (but it could also be parameter value in the middle of the interval as a default.. crude but probably OK as a default).
The paper "Likelihood-free parameter estimation with neural Bayes estimators" (Sainsbury-Dale, Zammit-Mangion, & Huser, 2023) enables neural amortized point estimation, which is generally faster than fully Bayesian neural inference with conditional generative models (current standard in BayesFlow).
Implementing amortized point estimators in BayesFlow would help with fast prototyping and sanity checks (and surely other tasks, too!).
Some BayesFlow pointers/proposals:
AmortizedPointEstimator
inbayesflow.amortizers
bayesflow.losses
(depending on the target quantity)bayesflow.inference_networks
which takes the summary network output (e.g., DeepSet outputs) and regresses to the point estimatebayesflow.summary_networks
and probably work out-of-the-boxSome links:
NeuralEstimators
package (R): https://github.com/msainsburydale/NeuralEstimatorsNeuralEstimators.jl
package (Julia): https://github.com/msainsburydale/NeuralEstimators.jl