Tractables / ProbabilisticCircuits.jl

Probabilistic Circuits from the Juice library
https://tractables.github.io/ProbabilisticCircuits.jl/dev
Apache License 2.0

Learning with missing information? #103

Closed: robertfeldt closed this issue 2 years ago

robertfeldt commented 2 years ago

Again, thanks for Juice; it's already very useful, and the potential is fantastic.

Is there some way to also learn from incomplete information, i.e., data where the values of a few features are missing for some entries? I tried a few different input types/formats in which some values are missing, but the call to learn_circuit seems to fail every time.

khosravipasha commented 2 years ago

Hi, yes, we do have options for learning from missing data. There are two main ones:

  1. learn_circuit_miss, the counterpart to learn_circuit for data with missing values; both do structure learning. Some examples here.

  2. HCLT structures. These are not fully documented yet, but here's a quick example. Learning happens in two steps. Step 1: learn a hidden Chow-Liu tree (HCLT) structure (more detail on HCLTs here). For this step we currently need to impute the data:

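```julia
using ProbabilisticCircuits

# X_train: training data, which may contain missing values
# train_imputed: the same training data with the missing values imputed

num_hidden_cats = 32  # number of categories of each hidden variable
num_clt_trees = 1     # number of Chow-Liu trees
circuit = hclt(num_features(X_train); data = train_imputed, num_hidden_cats = num_hidden_cats, num_trees = num_clt_trees)
# start from (roughly) uniform parameters with a random perturbation
uniform_parameters(circuit; perturbation = 0.4)
```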
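The snippet above assumes train_imputed already exists. One simple way to build it is mode imputation: replace each missing entry with the most frequent observed value in its column. Below is a minimal plain-Julia sketch, assuming the data is a matrix whose entries are Bool or missing (rows are examples, columns are features); impute_mode is a hypothetical helper, not a ProbabilisticCircuits.jl function.

```julia
# Hypothetical helper (not part of ProbabilisticCircuits.jl):
# mode imputation, i.e. fill each missing entry with the most frequent
# observed value of its column. Ties go to the smallest value.
function impute_mode(X::AbstractMatrix)
    T = nonmissingtype(eltype(X))
    Y = similar(X, T)
    for j in axes(X, 2)
        # count how often each observed (non-missing) value occurs in column j
        counts = Dict{T,Int}()
        for v in skipmissing(view(X, :, j))
            counts[v] = get(counts, v, 0) + 1
        end
        isempty(counts) && error("column $j has no observed values")
        # pick the most frequent value, scanning keys in sorted order
        keys_sorted = sort!(collect(keys(counts)))
        mode_val = keys_sorted[1]
        for v in keys_sorted
            if counts[v] > counts[mode_val]
                mode_val = v
            end
        end
        # copy observed entries, replace missing ones with the column mode
        for i in axes(X, 1)
            Y[i, j] = ismissing(X[i, j]) ? mode_val : X[i, j]
        end
    end
    return Y
end
```

For such a matrix X, train_imputed = impute_mode(X) gives a Bool matrix with no missing entries. If X_train is a DataFrame it would need converting first (e.g. with Matrix), and whether hclt expects a matrix or a DataFrame may depend on the package version, so treat that step as an assumption. Mode imputation is only the simplest choice; since the imputed data is used here just for the structure step, other imputation methods would work as well.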

Step 2: learn the parameters with EM, using estimate_parameters_em_multi_epochs! (more info here).
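Since imputation is only mentioned for the structure step, EM is presumably run on the data with its missing values. The exact signature of estimate_parameters_em_multi_epochs! is not given in this thread, so rather than guess at it, here is a toy sketch of why EM copes with missing values: a mixture of independent Bernoullis (a circuit with one sum node over product nodes) where a missing feature is marginalized out in the E-step and replaced by its expected value in the M-step. em_bernoulli_mixture is hypothetical and is not the library's implementation.

```julia
using Random

# Hypothetical illustration (not ProbabilisticCircuits.jl code): EM for a mixture
# of K independent-Bernoulli components, with missing features marginalized out.
function em_bernoulli_mixture(X::AbstractMatrix, K::Int;
                              iters::Int = 20, rng = MersenneTwister(0))
    N, D = size(X)
    w = fill(1.0 / K, K)                  # mixture (sum node) weights
    θ = 0.25 .+ 0.5 .* rand(rng, K, D)    # θ[k, j] = P(x_j = true | component k)
    loglik_trace = Float64[]
    for _ in 1:iters
        # E-step: log p(observed part of row i, component k)
        logp = zeros(N, K)
        for i in 1:N, k in 1:K
            s = log(w[k])
            for j in 1:D
                x = X[i, j]
                ismissing(x) && continue  # marginalized: this leaf sums to 1
                s += x ? log(θ[k, j]) : log1p(-θ[k, j])
            end
            logp[i, k] = s
        end
        # normalize each row with log-sum-exp to get responsibilities
        m = maximum(logp; dims = 2)
        lse = m .+ log.(sum(exp.(logp .- m); dims = 2))
        r = exp.(logp .- lse)             # N×K, each row sums to 1
        push!(loglik_trace, sum(lse))     # log-likelihood of the observed data
        # M-step: re-estimate weights and leaf parameters from expected counts
        Nk = vec(sum(r; dims = 1))
        w = Nk ./ N
        θnew = similar(θ)
        for k in 1:K, j in 1:D
            acc = 0.0
            for i in 1:N
                x = X[i, j]
                # expected value of x_j under component k: observed value, or θ if missing
                acc += r[i, k] * (ismissing(x) ? θ[k, j] : Float64(x))
            end
            θnew[k, j] = clamp(acc / Nk[k], 1e-6, 1 - 1e-6)
        end
        θ = θnew
    end
    return w, θ, loglik_trace
end
```

The log-likelihood stored in loglik_trace should never decrease from one iteration to the next, which is the usual sanity check for an EM implementation. The clamp keeps leaf parameters away from exactly 0 or 1 so the logarithms stay finite.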

robertfeldt commented 2 years ago

This is great, thanks. Sorry, I had missed the learn_circuit_miss part of the documentation. I'll check it out and experiment.