FriesischScott / UncertaintyQuantification.jl

Uncertainty Quantification in Julia
MIT License

Adds adaptive subset infinity, and subset simulation logging #140

Closed AnderGray closed 10 months ago

AnderGray commented 10 months ago

An implementation of adaptive conditional subset simulation (subset infinity) from [1,2].

The idea behind the algorithm is to adaptively tune the standard deviation (s in our implementation) of the proposal distribution towards an optimal acceptance rate, which the authors report as a_star = 0.44.

For each level, the algorithm iteratively computes the acceptance rate. This is done by partitioning the seeds (say into 4 groups), running the usual subset infinity sampler once per group, and computing a scaling factor λ that updates the standard deviation as s_new = min(1, λ*s). In this case, s is updated 4 times per level.
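Roughly, the per-level adaptation looks like the sketch below. This is only an illustration, not the code in this PR: `accept_rate` is a hypothetical stand-in for running the usual subset-infinity conditional sampler on one partition of seeds and returning its empirical acceptance rate, and the exponential λ update is the form used in [1].

```julia
# Illustrative sketch of the per-level adaptive scaling (not the PR implementation).
# `accept_rate` is a hypothetical stand-in for one conditional sampling pass over a
# partition of seeds, returning its empirical acceptance rate.
function adapt_proposal(seeds, s, λ; Na=4, a_star=0.44, accept_rate=part -> 0.3)
    chunk = length(seeds) ÷ Na
    for (iter, part) in enumerate(Iterators.partition(seeds, chunk))
        a_hat = accept_rate(part)                # acceptance rate of this partition
        λ *= exp((a_hat - a_star) / sqrt(iter))  # push λ towards the target a_star, as in [1]
        s = min(1, λ * s)                        # rescale the proposal standard deviation
    end
    return s, λ
end

# e.g. 200 seeds, starting from s = 1 and λ = 1
s, λ = adapt_proposal(randn(200), 1.0, 1.0)
```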

Here's my data structure:

mutable struct SubSetInfinityAdaptive <: AbstractSubSetSimulation
    n::Integer       # number of samples per level
    target::Float64  # target conditional probability of failure per level
    levels::Integer  # maximum number of levels
    λ::Real          # adaptive scaling factor for the proposal standard deviation
    Na::Integer      # number of seed partitions per level
    s::Real          # standard deviation of the proposal distribution
end

The new properties are λ (the adaptive scaling factor), Na (the number of seed partitions per level), and s (the standard deviation of the proposal distribution). If Na = 1, the update only happens once per level.

Note that I've implemented the struct as mutable because λ needs to be updated and saved at each level. Perhaps there's a better way, but this way the same probability_of_failure function can be used.

Here's the constructor:

    function SubSetInfinityAdaptive(
        n::Integer, target::Float64, levels::Integer, λ::Real, Na::Integer
    )
        number_of_seeds = Int64(max(1, ceil(n * target)))

        (Na <= number_of_seeds) ||
            error("Number of partitions Na must be less than or equal to the number of seeds `n` * `target`")
        (mod(number_of_seeds, Na) == 0) ||
            error("Number of seeds `n` * `target` must be a multiple of the number of partitions Na")
        (0 <= λ <= 1) ||
            error("Scaling parameter λ must be between 0.0 and 1.0. A good initial choice is 1.0")
        # s starts at 1 and is adapted at each level
        return new(n, target, levels, λ, Na, 1)
    end

The second check is important to ensure the seeds can be partitioned into Na equal parts.
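For example, with illustrative parameter values the checks would pass like this:

```julia
# 2000 samples per level, target conditional probability 0.1, at most 10 levels,
# initial λ = 1.0, and the seeds split into Na = 4 partitions per level.
# ceil(2000 * 0.1) = 200 seeds, and 200 is divisible by 4, so both checks pass.
sim = SubSetInfinityAdaptive(2000, 0.1, 10, 1.0, 4)
```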

Further improvements

LOGGING

It would be good to use Logging.jl to report the progress of the subset simulation algorithms. I've been running quite expensive simulations, and it's useful to print progress. I've added some logging, but I think it can be improved: currently too much information is being printed.
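As a rough illustration of the kind of per-level message I have in mind (the message text and logged values below are placeholders, not the exact statements added in this PR):

```julia
using Logging

level, pf_level, s = 1, 0.1, 0.8  # placeholder values for illustration

# one progress message per subset level; with @debug this stays silent unless
# debug logging is enabled for the package
@debug "Subset simulation progress" level pf_level s
```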

Refs

[1] Papaioannou, I., Betz, W., Zwirglmaier, K., & Straub, D. (2015). MCMC algorithms for subset simulation. Probabilistic Engineering Mechanics, 41, 89-103.

[2] Chan, J., Papaioannou, I., & Straub, D. (2022). An adaptive subset simulation algorithm for system reliability analysis with discontinuous limit states. Reliability Engineering & System Safety, 225, 108607.

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (ebb1b7e) 98.68% compared to head (ffef5dc) 98.76%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
+ Coverage   98.68%   98.76%   +0.07%
==========================================
  Files          25       25
  Lines         913      970      +57
==========================================
+ Hits          901      958      +57
  Misses         12       12
```

:umbrella: View full report in Codecov by Sentry.

FriesischScott commented 10 months ago

Thanks for the PR, looking very nice overall!

The only thing I don't care for is the logging. It's very noisy as it is now.

I would prefer it if we remove it for now and make a larger PR for nice logging in more places than just subset.

AnderGray commented 10 months ago

Thanks for the review!

Yes, I agree about the logging. However, I'm finding it useful when running expensive models. For now, would it work if we replaced the @info calls with @debug?
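That way the messages stay hidden by default, and anyone running an expensive model can switch them on, e.g.:

```julia
# enable debug-level messages for this package before running the simulation
ENV["JULIA_DEBUG"] = "UncertaintyQuantification"
```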

The other thing we could consider adding is an example of an ablation test for all the subset methods: something that plots the result as a function of input dimension and the number of input samples N, similar to what's done in #138. I have a script ready which does this, and I can add it later once we fix that issue.

FriesischScott commented 10 months ago

I'd be fine with using @debug for now.

AnderGray commented 10 months ago

Looks good from my side.

What do you think about having an option without using a Markov chain, like we had before?

FriesischScott commented 10 months ago

Thanks, I'm merging for now.

> What do you think about having an option without using a Markov chain, like we had before?

Does that make sense though? Does it still sample from the correct conditional pdf? Maybe we should run some benchmarks.

AnderGray commented 10 months ago

I think it's worth a test. I was getting sensible answers with the previous method.

Probably biased in some way ... I'm not sure how it can be tested. The additional parallelizability makes it interesting.