bat / BAT.jl

A Bayesian Analysis Toolkit in Julia

bat_sample error #401

Open atasattari opened 1 year ago

atasattari commented 1 year ago

Hi, I followed the tutorial to define a custom log-likelihood and pass it to bat_sample(). The likelihood itself seems to work fine; bat_sample(), however, does not. After some time I get the error below:

AssertionError: length(chains) == nchains
Stacktrace:
 [1] mcmc_init!(rng::Random123.Philox4x{UInt64, 10}, algorithm::MetropolisHastings{BAT.MvTDistProposal, RepetitionWeighting{Int64}, AdaptiveMHTuning}, density::PosteriorDensity{BAT.TransformedDensity{BAT.GenericDensity{var"#36#37"{Vector{Float64}, typeof(pdf)}}, BAT.DistributionTransform{ValueShapes.StructVariate{NamedTuple{(:offset, :resolution, :k)}}, BAT.InfiniteSpace, BAT.MixedSpace, BAT.StandardMvNormal{Float64}, NamedTupleDist{(:offset, :resolution, :k), Tuple{Product{Continuous, Uniform{Float64}, Vector{Uniform{Float64}}}, Uniform{Float64}, Uniform{Float64}}, Tuple{ValueAccessor{ArrayShape{Real, 1}}, ValueAccessor{ScalarShape{Real}}, ValueAccessor{ScalarShape{Real}}}}}, BAT.TDNoCorr}, BAT.DistributionDensity{BAT.StandardMvNormal{Float64}, BAT.HyperRectBounds{Float32}}, BAT.HyperRectBounds{Float32}, ArrayShape{Real, 1}}, nchains::Int64, init_alg::MCMCChainPoolInit, tuning_alg::AdaptiveMHTuning, nonzero_weights::Bool, callback::Function)
   @ BAT ~/.julia/packages/BAT/XvOy6/src/samplers/mcmc/chain_pool_init.jl:160

Any idea what could be the issue?

Reproducer:

using Random, LinearAlgebra, Statistics, Distributions, StatsBase, Plots
using BAT, IntervalSets, ValueShapes, TypedTables 
import SpecialFunctions
using Profile, ProfileView, QuadGK

##  Error function.
function my_erf(x,coeff,base)
    return coeff/2*(1 .+ SpecialFunctions.erf.(x)) .+ base
end
## Step function.
function my_step(x,coeff,base)
    return coeff/2*(1 .+ sign.(x)) .+ base
end
## Model (sum of two errors or steps).
function count(p::NamedTuple{(:offset, :resolution, :k)}, x)
    step1_coeff = 6+p.k
    step2_coeff = 2-p.k
    if p.resolution > 0.0000001
        step1_x = (x .- p.offset[1])/(sqrt(2)*p.resolution)
        step1 = my_erf(step1_x,step1_coeff,4)

        step2_x = (x .- p.offset[2])/(sqrt(2)*p.resolution)
        step2 = my_erf(step2_x,step2_coeff,0.)
    else
        step1_x = (x .- p.offset[1])
        step1 = my_step(step1_x,step1_coeff,4)

        step2_x = (x .- p.offset[2])
        step2 = my_step(step2_x,step2_coeff,0)
    end
    return step1+step2
end 
## Integral of model
function get_integral(p::NamedTuple{(:offset, :resolution, :k)},low, high)
    total,error = quadgk(x -> count(p,x),low,high)
    return total
end
## PDF 
function pdf(p::NamedTuple{(:offset, :resolution, :k)},x,low, high)
    total = get_integral(p,low,high)
    value = count(p,x)
    return value/total
end
## Sampler
function sampler(p::NamedTuple{(:offset, :resolution, :k)},n,low,high)
    max = count(p,high)
    arr = Array{Float64}(undef, n)
    i = 1
    while i<=n
        y = rand()*max
        x = rand()*(high-low)+low
        if y <= count(p, x)
            arr[i] = x
            i+=1
        end
    end
    return arr
end  

true_par_values = (offset = [99, 150], resolution = 5, k = 0)
arr = sampler(true_par_values,10000,0,500)
## Unbinned log-likelihood 
likelihood = let data = arr, f = pdf
    params -> begin
        function event_log_likelihood(event)
            log(f(params,event,0,500))
        end
        return LogDVal(mapreduce(event_log_likelihood, +, data))
    end
end
## Flat priors
prior = NamedTupleDist(
    offset = [Uniform(50, 150), Uniform(80, 220)],
    resolution = Uniform(0,20),
    k = Uniform(-6,2)
)
## Posterior 
posterior = PosteriorDensity(likelihood, prior)
## running bat_sample
samples = bat_sample(posterior, MCMCSampling(mcalg = MetropolisHastings(), nsteps = 10^3)).result
atasattari commented 1 year ago

My notebook with some plots is also attached here: [mcmc_BAT_Julia.pdf](https://github.com/bat/BAT.jl/files/11219143/mcmc_BAT_Julia.pdf)

oschulz commented 1 year ago

Hi @atasattari, BAT.jl currently has trouble running with only one chain; could you run with at least two chains? Fixing that is on the to-do list, we just haven't gotten around to it yet (since people typically run with 4 or 8 chains).

Also, I recommend using BAT.jl#main for now instead of the last 2.x release, which is quite old. We're preparing a BAT.jl v3.0, but there are some changes to be made first.
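For reference, a minimal sketch of both suggestions, assuming the keyword names from the current BAT.jl tutorial (nchains and nsteps in MCMCSampling) and the standard Pkg API; adjust to your setup as needed:

# install the development version of BAT.jl
import Pkg
Pkg.add(name = "BAT", rev = "main")

# sample with four chains instead of one
samples = bat_sample(
    posterior,
    MCMCSampling(mcalg = MetropolisHastings(), nchains = 4, nsteps = 10^3)
).result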

atasattari commented 1 year ago

I updated the package and increased the number of chains to 4. It took almost 45 minutes to reach "Begin tuning of 4 MCMC chain(s)". Does that make any sense? Do you see an obvious issue in my cost function or my Julia code? Is there a way to get a progress bar while running the sampler?

Just to compare, I implemented a similar cost function in Python, optimized it with Numba, and used emcee for sampling. With 30 walkers (in emcee terminology), the computation with the same number of steps takes less than a minute, which is confusing.

oschulz commented 1 year ago

One important difference is that emcee can work in parallel (we should probably add an emcee-like sampler to BAT) and the many walkers share information about the posterior, so they "help each other" converge. Metropolis or HMC chains with a single walker, in contrast, can converge more slowly, since each walker has to find the "way" to the typical set by itself. We have some code for improved burn-in based on the "Pathfinder" algorithm and the Robust Adaptive Metropolis algorithm, but it's not integrated into BAT.jl yet. The plan is to add this and some other shiny new features by summer.

We'll also add progress bars, so one can see how sampling progresses. BAT.jl can nest algorithms, so we need nested progress display, which makes it a bit trickier than just using ProgressMeter.

In case it's a performance issue with your model, could you run BenchmarkTools.@benchmark logdensityof($your_posterior, rand($prior)) or so, and maybe check your likelihood implementation with @code_warntype?
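Concretely, something along these lines should work. This is only a sketch assuming the posterior, prior, and likelihood objects from the reproducer above, and that logdensityof (from DensityInterface) is available with BAT.jl#main:

using BenchmarkTools, DensityInterface, InteractiveUtils

v = rand(prior)                            # draw one parameter set from the prior
@benchmark logdensityof($posterior, $v)    # time a single posterior evaluation
@code_warntype likelihood(v)               # look for type instabilities in the likelihood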

atasattari commented 1 year ago

Thanks so much for the hints. I performed a basic comparison of the log-likelihood call times in Julia and Python. The Python one is ~20-30% faster. I'm not a Julia expert, but there seem to be some tweaks that could boost the performance. At least running in parallel seems to be much simpler in Julia; emcee has issues pickling some of the Numba functions, which prevents it from running in parallel.

I'm not sure how many events were thrown away before convergence, but it took BAT almost 8 hours to produce a sample of 10^3 events for my model. Apart from the performance, the other features look really cool. I'll update you if I figure out something about the performance. Also, please let me know if you have a tutorial or a Jupyter notebook that shows some performance optimization or fancier tricks.

oschulz commented 1 year ago

If you can send me your complete notebook (with necessary data files) I can take a look at potential performance issues.

oschulz commented 11 months ago

@atasattari is this still an active issue?

atasattari commented 11 months ago

@oschulz The issue is resolved. It was mainly slowness and a lack of log reports for debugging. It turned out to be caused by repeatedly recomputing the normalization integral of the PDF for the custom model inside the likelihood. This situation should be common in unbinned-likelihood studies. Would it be reasonable to add a similar example to the tutorial? I can provide my code for reference if needed.
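For illustration, one way to avoid the repeated integral evaluations in the reproducer above is to compute the normalization once per likelihood call instead of once per event; this is only a sketch based on that code, not necessarily the exact fix used:

## Unbinned log-likelihood with the normalization integral hoisted out of the per-event loop
likelihood = let data = arr
    params -> begin
        norm = get_integral(params, 0, 500)   # a single quadgk call per parameter set
        logl = sum(event -> log(count(params, event) / norm), data)
        return LogDVal(logl)
    end
end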

oschulz commented 11 months ago

> Would it be reasonable to add a similar example to the tutorial? I can provide my code for reference if needed.

Thank you, yes. We were thinking about adding more examples/tutorials anyway.