Support for SEM/SCM in encodings.jl

JorgeLuizFranco commented 1 month ago

A structural equation model (SEM) is a causal graph together with equations that describe how each variable depends on its parents.

equations::Dict{Symbol, Function}

This could be a good representation, since every variable in the causal graph would map to its own structural function:

equations = Dict(:X1 => f_x1, :X2 => f_x2, :X3 => f_x3, :X4 => f_x4)

Exogenous variables could then be passed to these functions as additional arguments:

function f_x4(x3, u4)
    return f4(x3) + u4
end
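
A minimal sketch of how such a `Dict{Symbol, Function}` could be evaluated, assuming each function takes its parents' values first and its own exogenous term last. The `parents` dictionary, the evaluation `order`, the three toy functions and the helper `evaluate_sem` are all hypothetical and only serve as an illustration:

```julia
# Hypothetical structural functions: parent values first, exogenous term last.
f_x1(u1) = u1
f_x2(x1, u2) = 2.0 * x1 + u2
f_x3(x1, x2, u3) = x1 - x2 + u3

equations = Dict{Symbol,Function}(:X1 => f_x1, :X2 => f_x2, :X3 => f_x3)

# Assumed extra bookkeeping: the parents of each variable and a topological order.
parents = Dict(:X1 => Symbol[], :X2 => [:X1], :X3 => [:X1, :X2])
order = [:X1, :X2, :X3]

# Evaluate every equation in topological order, given the exogenous values `u`.
function evaluate_sem(equations, parents, order, u)
    vals = Dict{Symbol,Float64}()
    for v in order
        args = [vals[p] for p in parents[v]]
        vals[v] = equations[v](args..., u[v])
    end
    return vals
end

u = Dict(:X1 => 1.0, :X2 => 0.5, :X3 => 0.0)
x = evaluate_sem(equations, parents, order, u)
```

Note that the functions alone do not say which variables they depend on, so some form of graph information would probably have to be stored next to them.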

The goal is to represent this within the existing encodings.jl.

JorgeLuizFranco commented 1 month ago

@pat-alt what do you think of the following representation:

Dict{Symbol, Dict{String, Any}} with 7 entries:
  :stdt_tchr_ratio   => Dict("causal_variables"=>["spending_per_stdt"], "coefficients"=>[0.00109117])
  :spending_per_stdt => Dict("causal_variables"=>["tst_scores", "stdt_tchr_ratio"], "coefficients"=>[310.416, -590.187])
  :grad_rate         => Dict("causal_variables"=>["tst_scores"], "coefficients"=>[0.870062])
  :stdt_clss_stndng  => Dict("causal_variables"=>["tst_scores"], "coefficients"=>[0.604204])
  :rjct_rate         => Dict("causal_variables"=>["spending_per_stdt", "stdt_clss_stndng", "fac_salary"], "coefficients"=>[0.000927563, 0.264321, 0.000170924])
  :fac_salary        => Dict("causal_variables"=>["spending_per_stdt", "tst_scores", "stdt_accept_rate"], "coefficients"=>[0.40138, 938.447, -122.365])
  :tst_scores        => Dict("causal_variables"=>["spending_per_stdt", "grad_rate", "stdt_clss_stndng"], "coefficients"=>[0.000728516, 0.984613, -0.0544398])
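
To make the assumed reading of this structure explicit: each entry appears to describe a linear equation in which a variable is the weighted sum of its causal variables. A small sketch under that assumption follows. The helper `predict_variable` and the input value are made up, and no intercept or noise term is modelled; the two entries are copied from the output above:

```julia
# Two entries copied from the output above.
scm = Dict{Symbol,Dict{String,Any}}(
    :grad_rate => Dict{String,Any}(
        "causal_variables" => ["tst_scores"], "coefficients" => [0.870062]
    ),
    :stdt_clss_stndng => Dict{String,Any}(
        "causal_variables" => ["tst_scores"], "coefficients" => [0.604204]
    ),
)

# Hypothetical helper: weighted sum of the parents' observed values.
function predict_variable(scm, target::Symbol, observed::Dict{String,<:Real})
    entry = scm[target]
    parents = entry["causal_variables"]
    coefs = entry["coefficients"]
    return sum(c * observed[p] for (p, c) in zip(parents, coefs))
end

observed = Dict("tst_scores" => 100.0)  # made-up input value
grad_rate = predict_variable(scm, :grad_rate, observed)
```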

JorgeLuizFranco commented 1 month ago

From what I understood, this output of CausalInference.jl should become a CounterfactualData so that it can live inside encodings.jl, right?

mutable struct CounterfactualData
    X::AbstractMatrix
    y::EncodedOutputArrayType
    likelihood::Symbol
    mutability::Union{Vector{Symbol},Nothing}
    domain::Union{Any,Nothing}
    features_categorical::Union{Vector{Vector{Int}},Nothing}
    features_continuous::Union{Vector{Int},Nothing}
    input_encoder::Union{Nothing,InputTransformer}
    y_levels::AbstractVector
    output_encoder::OutputEncoder
    function CounterfactualData(
        X,
        y,
        likelihood,
        mutability,
        domain,
        features_categorical,
        features_continuous,
        input_encoder,
        y_levels,
        output_encoder,
    )

pat-alt commented 1 month ago

> @pat-alt what do you think of the following representation:

I would still prefer to work with a concrete struct, something along these lines:

struct SCM
    variables::Vector{<:AbstractString}
    coefficients::Vector{<:Vector{<:AbstractFloat}}
end

That way we can define/overload the de-/encode methods for it.

You could take a look at the MultivariateStats.PCA struct for inspiration, for example.
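
As a rough sketch of what a concrete type might look like, here is a variant with an extra `parents` field and a constructor that takes the nested `Dict` format. The name `LinearSCM`, the `parents` field and the constructor are assumptions for illustration, not existing package code:

```julia
# Hypothetical variant of the struct above with an added `parents` field.
struct LinearSCM
    variables::Vector{String}
    parents::Vector{Vector{String}}
    coefficients::Vector{Vector{Float64}}
end

# Build the struct from the nested Dict format; keys are sorted for a stable order.
function LinearSCM(d::Dict{Symbol,Dict{String,Any}})
    vars = sort!(collect(keys(d)))
    return LinearSCM(
        String.(vars),
        [String.(d[v]["causal_variables"]) for v in vars],
        [Float64.(d[v]["coefficients"]) for v in vars],
    )
end

# Two entries copied from the Dict output earlier in the thread.
d = Dict{Symbol,Dict{String,Any}}(
    :grad_rate => Dict{String,Any}(
        "causal_variables" => ["tst_scores"], "coefficients" => [0.870062]
    ),
    :stdt_tchr_ratio => Dict{String,Any}(
        "causal_variables" => ["spending_per_stdt"], "coefficients" => [0.00109117]
    ),
)
scm = LinearSCM(d)
```

With a concrete type like this, methods can dispatch on `LinearSCM` directly, which is what makes overloading the encode and decode functions possible.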

pat-alt commented 1 month ago

> From what I understood, this output of CausalInference.jl should become a CounterfactualData so that it can live inside encodings.jl, right?

Yes, exactly.

pat-alt commented 1 month ago

[x] Overload fit_transformer! (basically just call the functionality you have added to CI.jl)
[x] Add SCM or abstract parent type to InputTransformer union type
[ ] Think about how to overload encode_array and decode_array
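
One possible direction for the open item, purely as a sketch: for a linear, acyclic SCM written as `x = B * x + u`, encoding could map features to their exogenous terms and decoding could map them back. The type `ToyLinearSCM` and the functions `toy_encode` and `toy_decode` are hypothetical stand-ins. They are not the package's `encode_array` and `decode_array` methods, whose signatures are not shown in this thread, and the coefficients are made up:

```julia
using LinearAlgebra

# Hypothetical container: B[i, j] is the coefficient of variable j
# in the equation for variable i.
struct ToyLinearSCM
    B::Matrix{Float64}
end

# Stand-ins for illustration only: features -> exogenous terms, and back.
toy_encode(scm::ToyLinearSCM, x::AbstractVector) = (I - scm.B) * x
toy_decode(scm::ToyLinearSCM, u::AbstractVector) = (I - scm.B) \ u

# Chain X1 -> X2 -> X3 with made-up coefficients.
scm = ToyLinearSCM([0.0 0.0 0.0; 2.0 0.0 0.0; 0.0 -1.0 0.0])
x = [1.0, 2.5, -2.0]
u = toy_encode(scm, x)
x_back = toy_decode(scm, u)
```

Whether the exogenous terms are the right encoded space for counterfactual search is a design question this sketch does not settle.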

JuliaTrustworthyAI / CounterfactualExplanations.jl

Support for SEM/SCM in encodings.jl #456