Closed: mathematicalmichael closed this issue 4 years ago
more thoughts:
input_sample_set = bsam.random_sample_set('r', input_dim, num_samples)
sampler = bsam.sampler(Q_map, num_samples)
my_disc = sampler.compute_QoI_and_create_discretization(input_sample_set)
at this point, all that's missing is the specification of an output_probability_set
simpleFunP.name_of_dist(my_disc) # should have the same effect as
my_disc.set_observed() # with default settings
If simpleFunP is used, make sure to correctly set the observed distribution.
Now, we are ready to solve:
calcP.prob(my_disc)
I want to be able to replace the first line with:
input_sample_set = samp.sample_set(dim)
input_sample_set.set_initial() # can specify any initial/prior distribution you want.
input_sample_set.generate_values() # where if values are blank, sampler calls it and fills them in.
this can allow for the skipping of setting the sample set with basicSampling
and let the sampler object do that work intelligently.
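A minimal sketch of how that lazy-generation behavior could look; the class and method bodies here are hypothetical illustrations, not the existing bet API:

```python
import numpy as np
from scipy import stats

# Hypothetical sketch: the sample set stores an initial distribution and can
# generate its own values, so a sampler can fill them in when they are blank.
class SampleSetSketch:
    def __init__(self, dim):
        self._dim = dim
        self._values = None
        self._initial_dist = None

    def set_initial(self, dist=None):
        # default to a standard normal if no distribution is specified
        self._initial_dist = dist if dist is not None else stats.norm(loc=0, scale=1)

    def generate_values(self, num_samples=1000):
        # draw values from the initial distribution; a sampler could call this
        # automatically whenever _values is still None
        self._values = self._initial_dist.rvs(size=(num_samples, self._dim))

s = SampleSetSketch(dim=2)
s.set_initial()
s.generate_values(num_samples=100)
print(s._values.shape)  # (100, 2)
```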
here is one possible workflow:
# generate normal initial and infer the dimension and domain from the keyword args.
input_sample_set = bsam.random_sample_set('n', loc=0, scale=1)
sampler = bsam.sampler(Q_map, num_samples)
my_disc = sampler.compute_QoI_and_create_discretization(input_sample_set)
my_disc.set_observed() # optional?
my_disc.set_predicted() # compute pushforward, should be optional, can pass analytical expression. if blank, default to gaussian kde
and now you should be able to call my_disc.updated_pdf
to evaluate the updated density at a new sample.
To evaluate the density at the existing sample points, you can call the method my_disc.set_probabilities(), which will multiply the density evaluations by the volumes to set input_sample_set._probabilities.
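As a sketch of what such a set_probabilities step could do under the hood, assuming the updated density has already been evaluated at the samples (this is illustrative, not the actual implementation):

```python
import numpy as np

# Sketch: multiply updated-density evaluations at the existing samples by the
# cell volumes, then normalize to obtain probabilities.
def set_probabilities(updated_pdf_vals, volumes):
    probs = updated_pdf_vals * volumes
    return probs / probs.sum()  # normalize so the probabilities sum to 1

pdf_vals = np.array([0.2, 0.5, 0.3, 1.0])   # density evaluated at 4 samples
volumes = np.full(4, 0.25)                  # e.g. equal Monte Carlo volumes
probs = set_probabilities(pdf_vals, volumes)
print(probs.sum())  # sums to 1 (up to floating point)
```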
The thinking is that you should see consistency in the solutions between the two methods. If you set the problem up with the sample-based approach, calling calculateP.den should call my_disc.set_probabilities(), giving you something to compare against the result of calculateP.prob.
I'm thinking about something similar for the data-driven approach, where the observed can be taken as the default N(0,1)... which would allow for something like the workflow below. We would need to build in something like my_disc._noise_model, which is also a distribution.
To compute_QoI, you can pass an integer to imply input_dim and a uniform random hypercube, the same way the other methods in this sub-module behave when an integer is passed instead of a sample set.
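A hedged sketch of that integer-dispatch convention (the function name is hypothetical):

```python
import numpy as np

# Hypothetical dispatch: if an integer is passed where a sample set is
# expected, treat it as input_dim and build a uniform random hypercube.
def as_input_sample_values(arg, num_samples=100):
    if isinstance(arg, int):
        # integer implies input_dim; sample uniformly on [0, 1]^dim
        return np.random.uniform(size=(num_samples, arg))
    return arg._values  # otherwise assume a sample-set-like object

vals = as_input_sample_values(3, num_samples=50)
print(vals.shape)  # (50, 3)
```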
sampler = bsam.sampler(Q_map, num_samples)
my_disc = sampler.compute_QoI_and_create_discretization(input_sample_set)
my_disc.set_noise_model(dist.norm, scale=0.1) # should be smart enough to infer output dim. if it gets a number, use it to set `data_std` for the data-driven map approach.
# my_disc.noisy_data(std=1) # does your model already perturb output_values or do we need to? if called, pass the `data_std` value. the idea being that you can avoid setting the noise model.
# if calling data driven, by default we will be perturbing the data values. These noise realizations will be stored inside of the output_probability_set, along with the noise distribution. If we already have noisy data values, set the distribution but leave the values blank.
(If no noise model is given, do not perturb the data, and assume unit variance if using the data-driven approach. If a noise model is provided, assume Q_ref is "pure" and perturb it with noise.) set_noise_model will set an attribute data_std
(i.e., grab .std() from the provided distribution; if that is missing, try the square root of .var(); otherwise print a warning about not being able to infer it and set it to 1).
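That inference rule could look something like this sketch, assuming a scipy.stats-style frozen distribution is passed in:

```python
import numpy as np
from scipy import stats

# Sketch of the proposed data_std inference: use .std() if available, fall
# back to the square root of .var(), else warn and default to 1.
def infer_data_std(noise_dist):
    try:
        return float(noise_dist.std())
    except AttributeError:
        pass
    try:
        return float(np.sqrt(noise_dist.var()))
    except AttributeError:
        print("Warning: could not infer std from noise model; defaulting to 1.")
        return 1.0

print(infer_data_std(stats.norm(scale=0.1)))  # 0.1
```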
set_noise_model should actually just set the output probability set, since the interpretation makes sense. You can use it to set the observed if you aren't in a data-driven approach. The values for this sample set will be the RESIDUALS (so the shape still matches the output dim). When "evaluating" the data-driven map, perform the sum and division on the fly. Yes, this is a little less efficient, but it's a quick operation, no worse than what is required in the voronoi-cell approach in calculateP.prob. This should allow for toggling of modes, so you can compare SSE/MSE/SWE QoI maps and their solutions.
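One way the mode toggle could work, sketched under the assumption that residuals are stored with shape (num_samples, output_dim); the exact SSE/MSE/SWE definitions below are my assumptions, not the package's:

```python
import numpy as np

# Sketch of evaluating a data-driven map from stored residuals on the fly.
# residuals: shape (num_samples, output_dim); std: scalar or per-obs vector.
def data_driven_qoi(residuals, std, mode='SWE'):
    n = residuals.shape[1]
    scaled = residuals / std
    if mode == 'SSE':    # sum of squared (scaled) errors
        return np.sum(scaled ** 2, axis=1)
    if mode == 'MSE':    # mean squared error: SSE divided by n
        return np.sum(scaled ** 2, axis=1) / n
    if mode == 'SWE':    # sum of weighted errors, normalized by sqrt(n)
        return np.sum(scaled, axis=1) / np.sqrt(n)
    raise ValueError(f"unknown mode: {mode}")

r = np.array([[0.1, -0.1], [0.2, 0.2]])
print(data_driven_qoi(r, std=0.1, mode='SSE'))  # [2. 8.]
```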
Now that the std is known, we can generate our dataset of residuals: output_vals - data, where data is Q_ref + noise.rvs(output_dim).
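In sketch form (Q_ref, the noise scale, and the array shapes here are illustrative):

```python
import numpy as np
from scipy import stats

# Sketch: build the residual dataset as output_vals - data, where the
# synthetic data is Q_ref perturbed by draws from the noise model.
noise = stats.norm(loc=0, scale=0.1)
Q_ref = np.array([1.0, 2.0])                   # "pure" reference output
output_vals = np.random.uniform(size=(5, 2))   # model outputs at 5 samples
data = Q_ref + noise.rvs(size=2)               # one noisy observation per QoI
residuals = output_vals - data
print(residuals.shape)  # (5, 2)
```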
Here's what I'm coming around to for sensitivity to the data:
my_disc_new.set_initial()
my_disc_new.set_observed(scale=0.1) # first time through, it will generate synthetic data.
my_disc_new.set_data_from_reference() # draws from existing observed to create new synthetic data
my_disc_new.set_observed(scale=0.2) # imposes this as the new observed distribution center
plt.scatter(I._values[:,0], I._values[:,1], c=my_disc_new.updated_pdf(I._values))
plt.show()
All the updates are performed by re-setting the observed; the pushforward is only computed once, on first start-up.
This is for non-data-driven mode.
Handling repeated observations on the same QoI as data come in is now extremely memory efficient: it never stores vectors it doesn't need.
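A self-contained sketch of that update mechanism, assuming the updated density is the initial density times the ratio of observed to predicted (pushforward) densities, with the pushforward KDE fit exactly once:

```python
import numpy as np
from scipy import stats

# Sketch: updated(x) = initial(x) * observed(Q(x)) / predicted(Q(x)),
# where predicted is a KDE of the pushforward, computed once on start-up.
np.random.seed(0)

def Q(x):
    return x.sum(axis=1, keepdims=True)      # stand-in QoI map

initial = stats.norm(0, 1)
x = initial.rvs(size=(2000, 2))
predicted = stats.gaussian_kde(Q(x).T)       # pushforward density, fit once

def updated_pdf(pts, observed):
    # re-setting the observed only swaps this density; predicted is reused
    q = Q(pts)
    return initial.pdf(pts).prod(axis=1) * observed.pdf(q).ravel() / predicted(q.T)

# changing the observed (e.g. scale 0.1 vs 0.2) re-weights without re-fitting
vals = updated_pdf(x[:10], observed=stats.norm(0, 0.1))
print(vals.shape)  # (10,)
```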
my_disc_new.data_driven(inds=[0,0,0,0], std=0.4, data=np.array([0.8,1.2,1.3, 1.4]))
my_disc_new.get_setup()
{0: {'col': True,
'inds': [0, 0, 0, 0],
'qoi': 'SWE',
'std': 0.4,
'obs': None,
'pre': <scipy.stats.kde.gaussian_kde at 0x7fe6e4925048>,
'model': <function __main__.one_map(data, operator=array([1, 1]))>}}
my_disc_new.get_data()
array([0.8, 1.2, 1.3, 1.4])
Omit data to simulate bootstrapping! Change error assumptions as you wish; the updated density will immediately reflect the changes!
You can pass a vector of std's if the data becoming available is of differing quality:
my_disc_new.data_driven(inds=[0,0,0], std=[0.1, 0.2, 0.3])
my_disc_new.get_setup()
{0: {'col': True,
'inds': [0, 0, 0],
'qoi': 'SWE',
'std': array([0.1, 0.2, 0.3]),
'obs': None,
'pre': <scipy.stats.kde.gaussian_kde at 0x7fe6e40cc240>,
'model': <function __main__.one_map(data, operator=array([1, 1]))>}}
my_disc_new.get_data()
array([0.8, 0.8, 0.8])
Had we previously cleared the existing reference data, it would attempt to use a reference value and hit it with noise to simulate data. So, you can pass data=None (the default) to do this, or simply:
my_disc_new._output_probability_set._reference_value = None
my_disc_new.data_driven(inds=[0,0,0])
my_disc_new.get_data()
array([1.87604622, 2.00842052, 1.84795764])
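A sketch of that fallback behavior (function and argument names are hypothetical):

```python
import numpy as np
from scipy import stats

# Sketch: when no data is supplied (data=None) and a reference value exists,
# perturb the reference with draws from the noise model to simulate data.
def simulate_data(reference_value, noise_dist, num_obs):
    if reference_value is None:
        raise ValueError("no data and no reference value to perturb")
    return reference_value + noise_dist.rvs(size=num_obs)

obs = simulate_data(2.0, stats.norm(scale=0.4), num_obs=3)
print(obs.shape)  # (3,)
```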
NOTE: this is all just here for posterity. Actively re-factoring code now. I decided to chop out a lot of "clever" functionality in favor of clearer functions. I really want something slick for being able to switch between the modes, but it may not be worth chasing.
If someone sets an observed, I'm trying to decide whether or not to use its median/mean as the "data" when switching modes, inferring the std from the already-defined observed.
Welp, none of this is relevant anymore; the workflow has been refactored.
The following should create a setup identical to using
bsam.random_sample_set('r', input_dim, num_samples)
to make the input set. It checks for the presence of a scipy.stats rv_frozen/rv_continuous object as part of the sample set (initial_distribution), and if present, uses it to generate samples according to that distribution, keeping volumes undefined.
my_disc = sampler.compute_QoI_and_create_discretization(input_sample_set)
The sampler class currently does no inference on Q_ref based on input_sample_set._reference_parameter, which feels like a wasted opportunity to set default behaviors for simpleFunP, which could read Q_ref from the passed data instead of taking it as an argument. For compatibility with the sampling approach (and particularly the data-driven one), I want simpleFunP.normal_partition_normal_distribution and the uniform_partition variety to be usable for both approaches. The goal is an easy toggle between the sampling and voronoi approaches.
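A sketch of how that default inference could work (the class and function names here are hypothetical stand-ins):

```python
import numpy as np

# Hypothetical sketch: infer Q_ref from the sample set's reference parameter
# when it isn't passed explicitly, so simpleFunP-style functions can default.
class DiscSketch:
    def __init__(self, Q_map, reference_parameter):
        self._Q_map = Q_map
        self._reference_parameter = reference_parameter

def resolve_Q_ref(disc, Q_ref=None):
    if Q_ref is not None:
        return Q_ref  # an explicit argument always wins
    if disc._reference_parameter is None:
        raise ValueError("no Q_ref given and no reference parameter set")
    # push the reference parameter through the map to get Q_ref
    return disc._Q_map(disc._reference_parameter)

d = DiscSketch(Q_map=lambda lam: np.sum(lam),
               reference_parameter=np.array([0.5, 0.5]))
print(resolve_Q_ref(d))  # 1.0
```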