Tractables / ProbabilisticCircuits.jl

Probabilistic Circuits from the Juice library
https://tractables.github.io/ProbabilisticCircuits.jl/dev
Apache License 2.0

make_observations? #66

Closed robertfeldt closed 2 years ago

robertfeldt commented 3 years ago

Thanks for Juice and this package; it looks great. I was checking out your ECML tutorial, and around 2:44:34 make_observations is used to create observations for ADV inference to check fairness. However, neither the latest master nor the stable version of ProbabilisticCircuits seems to have this method. I get:

julia> using ProbabilisticCircuits

julia> groups = make_observations([["male"], ["female"]])
ERROR: UndefVarError: make_observations not defined
Stacktrace:
 [1] top-level scope at REPL[2]:1

julia> VERSION
v"1.5.3"

Is this available in some other, related package, or is it coming to this package?

khosravipasha commented 3 years ago

Hi, thanks for reaching out. Yes, make_observations is not defined in our packages, since it is specific to the dataset used in that example and how it was processed. You can find the definition of make_observations in this notebook: Juice-Example.ipynb.

The dataset we used for this example can be found in this repository: https://github.com/Juice-jl/JuiceExamples. The easiest way to reproduce the examples is to use the notebook provided in that repository.

guyvdbroeck commented 3 years ago

Does that notebook still work out of the box with the current version of the package @khosravipasha? Probably good to make a version in Literate.jl to make a doc page + notebook for it.

khosravipasha commented 3 years ago

@guyvdbroeck Yes, I ran the notebook with the recent version 0.2.3 and it worked as expected. Yeah, good idea.

robertfeldt commented 3 years ago

Ok, great. Thanks for the quick answer. It is easy to understand based on this example. You might also want to point to the notebook in the docs where you point to the slides and video tutorial, since others might check the code from the tutorial while watching it, i.e. at the bottom of this page: https://juice-jl.github.io/ProbabilisticCircuits.jl/stable/

guyvdbroeck commented 3 years ago

Thanks, I will reopen this until we have the right documentation in place.

robertfeldt commented 2 years ago

Not sure if you already support something like this, and maybe it should be implemented in a different way, but I've found these additions useful for easier testing and use of PCs:

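using DataFrames            # DataFrame; `missings` comes from Missings.jl, which DataFrames re-exports
using ProbabilisticCircuits

# Wrap complete binary data in a DataFrame with auto-generated column names.
make_pc_data(data::BitMatrix) = DataFrame(data, :auto)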
make_pc_data(data::Matrix{<:Integer}) = make_pc_data(BitArray(data))
make_pc_data(data::Vector{<:Integer}) = make_pc_data(reshape(data, 1, length(data)))
function make_pc_data(data::Matrix{Union{Missing, I}}) where {I<:Integer}
    d = missings(Bool, size(data)...)
    d .= data
    DataFrame(d, :auto)
end
function make_pc_data(data::Vector{Union{Missing, I}}) where {I<:Integer}
    d = missings(Bool, 1, length(data))
    d[1, :] .= data
    DataFrame(d, :auto)
end

const FlexQueryType = Union{
    BitMatrix,
    Matrix{<:Integer},
    Vector{<:Integer},
    Matrix{Union{Missing, I}} where {I<:Integer},
    Vector{Union{Missing, I}} where {I<:Integer}
}

ProbabilisticCircuits.marginal(pc, data::FlexQueryType) = 
    ProbabilisticCircuits.marginal(pc, make_pc_data(data))

ProbabilisticCircuits.log_likelihood_per_instance(pc::ProbCircuit, 
    data::Union{BitMatrix, Matrix{<:Integer}, Vector{<:Integer}}) =
    ProbabilisticCircuits.log_likelihood_per_instance(pc, make_pc_data(data))

Would you be interested in a PR for something like this (probably also covering queries other than MAR and EVI), or is it up to everyone to do this themselves? It could even be generalized to allow a form similar to the one you used in the Jupyter notebook, if given a map/dict from the names to the positions in the BitArrays.
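A rough sketch of that name-based generalization, for illustration only. The helper `make_query_row`, the `positions` map, and the variable names are all hypothetical and not part of ProbabilisticCircuits.jl. Variables that are not mentioned are left as missing, so the result has the Matrix{Union{Missing, Bool}} shape that the missing-aware make_pc_data method above accepts.

```julia
# Hypothetical helper (not part of ProbabilisticCircuits.jl).
# Builds a 1-row query matrix; variables not named in `observed` stay missing.
function make_query_row(positions::Dict{String,Int}, nvars::Int,
                        observed::Dict{String,Bool})
    row = Matrix{Union{Missing,Bool}}(missing, 1, nvars)
    for (name, value) in observed
        haskey(positions, name) || throw(ArgumentError("unknown variable: $name"))
        row[1, positions[name]] = value
    end
    return row
end

# Made-up mapping from variable names to column positions.
positions = Dict("male" => 1, "female" => 2, "employed" => 3)

# Observe only "male"; the other two variables are marginalized out.
q = make_query_row(positions, 3, Dict("male" => true))
```

Keeping the name-to-position map outside the helper means the same function could serve any dataset, which is the reason make_observations was left out of the package in the first place.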

guyvdbroeck commented 2 years ago

Thanks Robert. We certainly welcome pull requests. On the other hand, we are playing around with redesigning the data API to no longer use DataFrames, given that CUDA.jl now supports isbits unions (https://github.com/Juice-jl/ProbabilisticCircuits.jl/tree/cuda-refactor). So perhaps in January we will do another release that might break this stuff.

robertfeldt commented 2 years ago

Ok, no worries, let's see what January brings.

khosravipasha commented 2 years ago

The next API for queries in v0.4 will look something like this. For example, log-likelihoods can be computed as follows; the same call handles both data with missing values (marginals) and data without missing values (EVI).

CPU version: we use missing to denote missing values, so the data would be of type Array{Union{Missing, ...}}.

loglikelihoods(pc::ProbCircuit, data::Matrix)

GPU version: similarly, the data is of type CuArray{Union{Missing, ...}}.

loglikelihoods(bpc::CuBitsProbCircuit, data::CuArray; batch_size)

More documentation and examples for other queries will be out when we release v0.4 (soon hopefully).
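As an illustration of the CPU data layout described above, here is a small sketch. The values are made up, and the loglikelihoods call appears only as a comment because it needs a circuit `pc`, which is assumed to exist and is not constructed here.

```julia
# Made-up binary dataset: 2 instances, 3 variables.
# `missing` marks a variable that is unobserved in that instance.
data = Union{Missing,Bool}[true false true; missing true false]

# Rows without missing entries are complete-evidence (EVI) queries;
# rows with at least one missing entry are marginal (MAR) queries.
has_missing = [any(ismissing, row) for row in eachrow(data)]

# With a circuit `pc` (assumed, not built here), the call from the
# comment above would be:
#   lls = loglikelihoods(pc, data)
```

Here the first row is a complete-evidence query and the second is a marginal query, and both would go through the same call.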