UCL-CCS / EasyVVUQ

Python 3 framework to facilitate verification, validation and uncertainty quantification (VVUQ) for a wide variety of simulations.
https://easyvvuq.readthedocs.io/
GNU Lesser General Public License v3.0
87 stars 27 forks source link

an issue using a Quasi-Monte Carlo sampler #368

Open kbronik2017 opened 2 years ago

kbronik2017 commented 2 years ago

Hi

An issue using a Quasi-Monte Carlo sampler with discrete uniform (values from ChaosPy distributions)

For example, consider the following scenario:

params = { "X": {"type": "integer", "min": 0, "max": 20, "default": 10}, "Y": {"type": "integer", "min": 0, "max": 30, "default": 20}, "out_file": {"type": "string", "default": "output.csv"} }

vary = { "X": cp.DiscreteUniform(1, 15), "Y": cp.DiscreteUniform(5, 25) }

will give negative samples (needless to say it works very well with only Uniform distribution), and the reason could be found in the file easyvvuq/sampling/qmc.py where

dist_U = [] for i in range(self.n_params): dist_U.append(cp.Uniform()) dist_U = cp.J(*dist_U)

only a Uniform distribution case is implemented, so either you need to add a DiscreteUniform case as well or add some assertion there!

thanks.

wedeling commented 2 years ago

Hi @kbronik2017

Where do you get negative values? in sampler._samples?

kbronik2017 commented 2 years ago

Yes it is, if you add for example "print" statement after the following line:

self._samples = self.distribution.inv(dist_U.fwd(nodes.transpose()))

print("self._samples")

it will print the negative samples

kbronik2017 commented 2 years ago

print("samples:", self._samples)

wedeling commented 2 years ago

If I execute this:

import chaospy as cp
import easyvvuq as uq

params = {
"X": {"type": "integer", "min": 0, "max": 20, "default": 10},
"Y": {"type": "integer", "min": 0, "max": 30, "default": 20},
"out_file": {"type": "string", "default": "output.csv"}
}

vary = {
"X": cp.DiscreteUniform(1, 15),
"Y": cp.DiscreteUniform(5, 25)
}

sampler = uq.sampling.QMCSampler(vary, 10)

print(sampler._samples)

I obtain:

[[ 3.79589844  8.27832031  3.79589844  8.27832031 11.29589844  0.77832031
  11.29589844  0.77832031 15.04589844 12.02832031 15.04589844 12.02832031
   7.54589844  4.52832031  7.54589844  4.52832031  5.67089844  2.65332031
   5.67089844  2.65332031 13.17089844 10.15332031 13.17089844 10.15332031
   9.42089844  6.40332031  9.42089844  6.40332031  1.92089844 13.90332031
   1.92089844 13.90332031  1.45214844  1.24707031  1.45214844  1.24707031
   8.95214844  8.74707031  8.95214844  8.74707031]
 [ 6.53027344  6.53027344 18.71191406 18.71191406 17.03027344 17.03027344
   8.21191406  8.21191406 11.78027344 11.78027344 23.96191406 23.96191406
  22.28027344 22.28027344 13.46191406 13.46191406 14.40527344 14.40527344
  10.83691406 10.83691406 24.90527344 24.90527344 21.33691406 21.33691406
   9.15527344  9.15527344  5.58691406  5.58691406 19.65527344 19.65527344
  16.08691406 16.08691406  8.49902344  8.49902344 14.11816406 14.11816406
  18.99902344 18.99902344 24.61816406 24.61816406]]

So no negative values. What is your version of ChaosPy?

However, the other obvious problem is that the resulting samples are not discrete. If I check the chaospy documentation, it seems that the DiscreteUniform variables are still treated as continuous under the hood:

distribution = chaospy.DiscreteUniform(2, 4)
qloc = numpy.linspace(0, 1, 9)
distribution.inv(qloc).round(2)
array([1.5 , 1.88, 2.25, 2.62, 3.  , 3.38, 3.75, 4.12, 4.5 ])

So the inverse transform (which is also used in the QMCSampler) returns non-integer values. The other problem with the QMCSampler is that it uses an external subroutine from SALib to generate the (Saltelli) sampling plan. This external subroutine only asks for the bounds, not the type of input, e.g. discrete or continuous.

We also have a Monte Carlo sampler (MCSampler), which has a built-in Saltelli sampler. If I run:

import chaospy as cp
import easyvvuq as uq

params = {
"X": {"type": "integer", "min": 0, "max": 20, "default": 10},
"Y": {"type": "integer", "min": 0, "max": 30, "default": 20},
"out_file": {"type": "string", "default": "output.csv"}
}

vary = {
"X": cp.DiscreteUniform(1, 15),
"Y": cp.DiscreteUniform(5, 25)
}

sampler = uq.sampling.MCSampler(vary, 10)

print(sampler.xi_mc)

I get

[[ 7. 17.]
 [11. 17.]
 [ 7. 13.]
 [11. 13.]
 [15.  7.]
 [ 7.  7.]
 [15. 21.]
 [ 7. 21.]
 [ 3.  9.]
 [14.  9.]
 [ 3. 24.]
 [14. 24.]
 [ 6. 21.]
 [ 4. 21.]
 [ 6. 11.]
 [ 4. 11.]
 [ 2. 20.]
 [13. 20.]
 [ 2. 25.]
 [13. 25.]
 [13. 22.]
 [ 1. 22.]
 [13.  9.]
 [ 1.  9.]
 [ 5. 21.]
 [12. 21.]
 [ 5. 13.]
 [12. 13.]
 [ 4. 13.]
 [ 9. 13.]
 [ 4.  8.]
 [ 9.  8.]
 [11. 14.]
 [ 5. 14.]
 [11.  8.]
 [ 5.  8.]
 [ 5. 18.]
 [ 3. 18.]
 [ 5. 19.]
 [ 3. 19.]]

So I would stick with the MCsampler for now. I even think we could delete the QMCSampler alltogether. The only difference between MCSampler and QMCSampler is that the latter uses some space-filling design from SALib (maybe a Sobol sequence or a Latin Hypercube). However, this option is also available in ChaosPy, it's just a flag that we need to toggle. I'll raise this in a separate issue.

kbronik2017 commented 2 years ago

Hi,

Thank you (and non-integer values, yes, I forgot to mention it as well, thank you for pointing that out), actually, the settings were exactly as the follows (which did results in negative samples, version of ChaosPy==4.3.2)

params = { "Rayleigh": {"type": "integer", "min": 0, "max": 2000, "default": 1}, "Prandtl": {"type": "integer", "min": 0, "max": 8, "default": 7}, "DiffusionCoefficient": {"type": "integer", "min": 0, "max": 2, "default": 1}, "Temperature": {"type": "integer", "min": 0, "max": 81, "default": 1}, "out_file": {"type": "string", "default": "output.csv"} }

vary = { "Rayleigh": cp.DiscreteUniform(0, 1500), "Prandtl": cp.DiscreteUniform(1, 8), "DiffusionCoefficient": cp.DiscreteUniform(0, 2), "Temperature": cp.DiscreteUniform(1, 80) }

But it does not matter now, as you suggested, using the Monte Carlo sampler instead of QMC would be the best choice.

Best Kevin