davidedema commented 2 months ago

I wrote a simple program that describes a Markov chain composed by 3 states and the relative transition probabilities:

state(s1).
state(s2).
state(s3).

0.2::init(state(s1)).
0.8::init(state(s2)).

0.8::next(s1,s1).
0.2::next(s1,s2).

0.3::next(s2,s2).
0.4::next(s2,s3).
0.3::next(s2,s1).

0.7::next(s3,s3).
0.3::next(s3,s1).

prob_reach(state(State), 0) :-
    init(state(State)).

prob_reach(state(State), Steps) :-
    Steps > 0,
    Steps1 is Steps - 1,
    next(S, State),
    prob_reach(state(S), Steps1).

query(prob_reach(state(s1), 1)).

When I run this, the returned probability is not the one that it is found with the classic formula (e.g. probability found with problog to be in the state s1 after 1 step is 0.3616 but the real one is 0.4). There is something that I'm missing with the probability calculation done by problog or there are errors in this simple program? Thanks

VincentDerk commented 2 months ago

You most likely intended to use annotated disjunctions.

0.2::init(state(s1)).
0.8::init(state(s2)).

means there is 0.2 probability that init(state(s1)) is true, and 0.8 probability that init(state(s2)) is true, but that includes a scenario where they are both true.

Annotated disjunctions, like the one below, imply a mutual exclusivity constraint such that this rule makes init(state(s1)) true with probability 0.2, or (mutual exclusivity or) init(state(s2)) with 0.8, or (mutex or) nothing true with 0.0 probability. Annotated disjunctions must sum to at most 1.0. They are allowed to sum to less than 1.0, in which case there is a probability that none of the probabilistic facts become true.

0.2::init(state(s1)) ; 0.8::init(state(s2)).

By using

0.2::init(state(s1)) ; 0.8::init(state(s2)).

query(prob_reach(state(s1), 1)). is indeed 0.4. Possibly you also want to make your transitions mutually exclusive, but it depends on your semantics. If you model a network of a disease spreading then you would for instance also allow it to spread to multiple.

Hope that helps.

davidedema commented 2 months ago

Thank you very much for the explanation, helped a lot!

VincentDerk commented 2 months ago

Perhaps also good to be aware of in your program, your next predicate is currently independent of Step so whether s1 connects to s2 (next(s1,s2)) is decided once and stays true/false throughout the steps. This is different semantics from

0.8::next(s1,s1, Step).
0.2::next(s1,s2, Step).

0.3::next(s2,s2, Step).
0.4::next(s2,s3, Step).
0.3::next(s2,s1, Step).

0.7::next(s3,s3, Step).
0.3::next(s3,s1, Step).

where the connection between s1 to s2 may differ per timestep. (This comment is on top of the annotated disjunction you probably wanted to use here.)

davidedema commented 2 months ago

Oh, thanks. Now it works as intended. Thank you very much!

davidedema commented 2 months ago

I adapted the program in this way

state(s1).
state(s2).
state(s3).

% Possible init states
0.2::init(state(s1)); 0.8::init(state(s2)).

% Transition model
0.8::next(s1,s1, Step); 0.2::next(s1,s2, Step).

0.3::next(s2,s1, Step); 0.3::next(s2,s2, Step); 0.4::next(s2,s3, Step).

0.3::next(s3,s1, Step); 0.7::next(s3,s3, Step). 

prob_reach(state(State), 0) :-
    init(state(State)).

prob_reach(state(State), Steps) :-
    Steps > 0,
    next(S, State, Steps),
    Steps1 is Steps - 1,
    prob_reach(state(S), Steps1).

query(prob_reach(state(s1), 5)).

It works as intended but if I pump up the number of steps (e.g. 40) it will return strange results:

prob_reach(state(s1),40):       156.13047

Is it some sort of overflow? For other numbers of steps < ~35 it works as intended

VincentDerk commented 2 months ago

That output is very concerning, thanks for reporting!

It will require some deeper investigation, but for now I have the following information and workaround:

If I swap out the dsharp knowledge compiler with c2d, the answer becomes 0.5999999999996348 (instructions below). I assume that is the correct probability.
Presumably the problem then lies with the dsharp binary, and our pipeline using PySDD is unaffected. For your specific problem the PySDD compilation seems to take a long time, so I suggest you use the c2d compiler workaround for now.

Swapping dsharp with c2d.

Download c2d from http://reasoning.cs.ucla.edu/c2d/
Rename c2d to cnf2dDNNF and copy the executable to the folder problog/bin/linux/ OR make it available through PATH
That is all. ProbLog will then automatically detect and use the c2d binary instead of dsharp.

Debugging information (for myself)

The dsharp compiler produces a wrong d-DNNF.

dSharp's unweighted model count is 3.97+61, which does not correspond to the count of its compiled NNF (according to query-ddnnf reasoner that is 1e+64).
The expected unweighted model count is 3.97e+61, according to D4, dsharp stats, and the NNF of c2d.

D4: The unweighted model count: 39708868943559345451429439397652844050730171778133720697929728
c2d: The unweighted model count of the nnf: 39708868943559345451429439397652844050730171778133720697929728
dsharp: The unweighted model count of the nnf: 1539200437371327859536739816372250725181561293303903889456627712
dsharp: The unweighted model count is 3.97089e+61

issue-113.cnf.txt

davidedema commented 2 months ago

Thanks for the answer, the workaround works

VincentDerk commented 2 months ago

Glad the workaround works!

This is potentially related to the following bug https://github.com/QuMuLab/dsharp/issues/12

ML-KULeuven / problog

Markov Chain -- dSharp compiler results in prob >1 #113

Swapping dsharp with c2d.

Debugging information (for myself)