Question: How do I determine the number of samples for Derivative-based Global Sensitivity Measure?

SALib / SALib

Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods.

http://SALib.github.io/SALib/

MIT License


Question: How do I determine the number of samples for Derivative-based Global Sensitivity Measure? #169

Closed ivan-marroquin closed 6 years ago

ivan-marroquin commented 6 years ago

Hi,

I have the following problem definition:

problem = {
    'num_vars': 6,
    'names': ['Amplitude', 'Bandwidth', 'Envelope',
              'Instantaneous Frequency', 'Sweetness', 'Thin Bed'],
    'bounds': [[min_amplitude, max_amplitude],
               [min_bandwidth, max_bandwidth],
               [min_envelope, max_envelope],
               [min_instantaneous_frequency, max_instantaneous_frequency],
               [min_sweetness, max_sweetness],
               [min_thin_bed, max_thin_bed]],
    'distributions': ['norm', 'norm', 'norm', 'norm', 'norm', 'norm']
}

The input and model data are in a 2D array of size 7344 x 7. The first six columns hold the inputs and the last column contains the model output.

When I run this line: Si= dgsm.analyze(problem, attributes_data[:, 0:6], attributes_data[:, 6], num_resamples= 7, print_to_console= True)

I get this error: Incorrect number of samples in model output file

Is there a way to determine the number of samples to use for this method?

Many thanks, Ivan

jdherman commented 6 years ago

Hi @ivan-marroquin , can you please post the full code example, including the parameter sampling? You can leave out the part where you run the model itself.

This error results from a mismatch between the sample size and the size of the model output given to the analyze function. So we should look at how the samples are being generated. The actual number of samples should not matter, at least not for this error.
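To make the mismatch concrete: a minimal sketch, assuming the check behind this error is that the model output length must be a whole multiple of `num_vars + 1` (one base point plus one perturbed point per parameter). The helper name `dgsm_base_samples` is hypothetical and is not part of SALib.

```python
def dgsm_base_samples(n_rows: int, num_vars: int) -> int:
    """Return the base sample count N if n_rows == N * (num_vars + 1).

    Assumption: DGSM samples are laid out in blocks of (num_vars + 1) rows,
    so any other row count cannot come from the finite-difference sampler.
    """
    block = num_vars + 1
    if n_rows % block != 0:
        raise ValueError(
            f"{n_rows} rows is not a multiple of {block}; "
            "the data were probably not produced by the DGSM sampler"
        )
    return n_rows // block


# The dataset from the question: 7344 rows, 6 input parameters.
n_rows, num_vars = 7344, 6
remainder = n_rows % (num_vars + 1)
print(remainder)  # non-zero, so the row count does not fit the expected layout

try:
    dgsm_base_samples(n_rows, num_vars)
    fits_layout = True
except ValueError as err:
    fits_layout = False
    print(err)
```

Trimming the array to a multiple of 7 would silence the error but would not make the results meaningful, because the rows would still not be in the order the analysis expects.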

ivan-marroquin commented 6 years ago

Hi @jdherman

I have attached a copy of the script and the input data. The DGSM call is on line 133 of the script.

Many thanks, Ivan

Morris_computation_issue.zip

jdherman commented 6 years ago

It looks like you are trying to do this with a predefined dataset, because you are not using SALib to sample the model parameters.

The only method in SALib that will work for this approach is the Delta method. Here is the example: https://github.com/SALib/SALib/blob/master/examples/delta/delta.py

The DGSM method requires sampling the parameters using the finite difference method, as shown in the example here: https://github.com/SALib/SALib/blob/master/examples/dgsm/dgsm.py

This is why the error occurs: your samples were not generated with this method, so they do not have the right number of rows.

ivan-marroquin commented 6 years ago

Thanks for the clarification. I am using a predefined dataset to test Fourier Amplitude, Morris, Sobol, and Delta methods (their command lines are part of the script that I sent you).

If I understand your suggestion correctly, Delta is the only method that works in this situation. Does this mean that the results I get for the Fourier Amplitude, Morris, and Sobol methods are not valid?

jdherman commented 6 years ago

Yes, I saw that in the code. Those methods are designed to work with parameter samples that are generated by SALib functions and arranged in a specific order. I'm surprised the methods ran without error. The results would not be valid.

If you are not able to do more samples and model runs, then you should only use the Delta method on the data that you have.

Again I would refer you to the examples of every method to see how they generally work: https://github.com/SALib/SALib/tree/master/examples

ivan-marroquin commented 6 years ago

Hi John Herman,

Thanks a lot for your comments and suggestions. I would like to give you a little background on my interest in sensitivity analysis.

I am conducting unsupervised classification analysis. The quality of the output depends on the input features used during the analysis, so I am interested in how these inputs rank in terms of sensitivity with respect to the output.

I looked at your examples to make sure that I extracted enough data points to satisfy the FAST, Sobol, and Morris methods, which is why I was able to get results.

According to your comments, the Sobol and Morris methods should not be used because they expect a specific sample order, which is only generated by SALib functions.

If the Delta method is my only option, are there any other sampling methods in SALib that I can use? By any chance, do you know of another SA method that applies well to my data sets?

Many thanks,

Ivan

jdherman commented 6 years ago

@ivan-marroquin I haven't worked in this application area, but it sounds like you might want to use machine learning methods instead of sensitivity analysis. scikit-learn has lots of methods for feature importance & selection that are probably better than SALib for what you want to do.