Support for multidimensional arrays and complex numbers

carstenbauer commented 5 years ago

AFAICS, currently only Float64 numbers can be handled.

It would be good to have support for other number types (like ComplexF64) and also higher dimensional arrays (numbers being the 0 dimensional case).

ffreyer commented 5 years ago

Yes, currently only Float64 values are accepted. It should be straight forward to extend this to integers, complex numbers and arrays for the binning part. I'm not sure how to handle the variance etc for complex numbers and arrays though.

carstenbauer commented 5 years ago

The variance for an array is defined element wise.

For complex numbers you can go with Julia and define var(z) = var(real(z)) + var(imag(z)). (That's also what I do in MonteCarloObservable.jl)

julia> using Statistics

julia> z = rand(ComplexF64, 100);

julia> var(z)
0.18059172464677126

julia> var(real(z)) + var(imag(z))
0.18059172464677137

ffreyer commented 5 years ago

Both should be working now. Though for Arrays, you currently need to apply a "zero". For example, if you want to do a Binning Analysis with 3 component vectors, you need to supply a [0.0, 0.0, 0.0] to the Binning Analysis. Restricting the binning analysis to static arrays would remove the need for this, but I'm not sure if this is a good idea for usability.

carstenbauer commented 5 years ago

Great! There are a couple of things probably left to do

automatically convert compatible types (currently I can't add Int64s to a Float64 binner.
std_error(B) for a BinnerA with multidimensional data doesn't seem to be defined (probably just a missing method)
your std_error(B) estimation, i.e. picking the standard error of the highest level, is problematic as statistical fluctuations will dominate in the large bin size limit (due to less and less bins). There are a couple of heuristics that one could use, for example, to only consider bin sizes for which we have at least, say, 50 bins.

Are you at the university today? Maybe we can chat for 10 minutes. I'll arrive in about 20 minutes.

ffreyer commented 5 years ago

automatically convert compatible types (currently I can't add Int64s to a Float64 binner.

If data pushed to the binning analysis varies in type, does that not indicate some issue with surrounding code? (type instability or logical error)

std_error(B) for a BinnerA with multidimensional data doesn't seem to be defined (probably just a missing method)

It was already defined, but I forgot to add a default value for the binning level. Should be working now.

your std_error(B) estimation, i.e. picking the standard error of the highest level, is problematic as statistical fluctuations will dominate in the large bin size limit (due to less and less bins). There are a couple of heuristics that one could use, for example, to only consider bin sizes for which we have at least, say, 50 bins.

That's true, and I initially limited the final level of the binning tree to have at least M values. But I think it's better to have the binning analysis generate every level. That way the user can check/decide how many levels should be ignored. Reducing the data accordingly is easy - you just ignore the last few binning levels. However, if you impose a limit beforehand you have to do extra work if you ever wanted to check the fluctuations on the final levels.

A more gentle approach would be to limit the output of all_x methods to include at least M bins for their final level. I.e. change

function all_vars(B::BinnerA{N}) where {N}
    [var(B, lvl) for lvl in 0:N-1 if B.count[lvl+1] > 0]
end

to

function all_vars(B::BinnerA{N}, min_bins=50) where {N}
    [var(B, lvl) for lvl in 0:N-1 if B.count[lvl+1] >= min_bins]
end

carstenbauer commented 5 years ago

If data pushed to the binning analysis varies in type, does that not indicate some issue with surrounding code? (type instability or logical error)

It might. But why not make it possible? It doesn't hurt, does it? It's at least convenient for interactive usage.

But I think it's better to have the binning analysis generate every level.

I think it's good that all levels are processed, and I'm also fine with all_* methods to show all the information, but std_error should give a reasonable error and the last one isn't reliable.

ffreyer / BinningAnalysis.jl

Support for multidimensional arrays and complex numbers #1