ML-KULeuven / problog

ProbLog is a Probabilistic Logic Programming Language for logic programs with probabilities.
https://dtai.cs.kuleuven.be/problog/

Explosive growth in compilation time #56

Closed hessammehr closed 3 years ago

hessammehr commented 3 years ago

I've been witnessing strange behaviour where compile time grows exponentially as further queries are added. The general case (query(approaches(P1, P2))) keeps churning indefinitely. I've tried both the DSharp and SDD backends. Code to reproduce below:

person(adam).
person(brian).
person(cindy).
person(don).

trait(T) :- between(1, 4, T).
compatible(T1, T2) :- compatible(T2, T1).
0.3::compatible(T1, T2) :- trait(T1), trait(T2), T1 > T2.
0.6::has_trait(P, T) :- person(P), trait(T).
0.9::approaches(P1, P2) :- person(P1), person(P2), has_trait(P1, T1), has_trait(P2, T2), P1 \= P2, compatible(T1, T2).

query(approaches(adam, P2)). % takes 0.5 s to compile (DSharp)
query(approaches(brian, P2)). % takes 10 s if added to previous query
query(approaches(cindy, P2)). % takes 121 s if added to previous query
query(approaches(don, P2)). % OOM

VincentDerk commented 3 years ago

I can only reproduce these long compilation times using 'ddnnf'. With 'sdd', I can solve all queries, even after adding query(approaches(P1,P2)). to the program.

[INFO] Output level: INFO
[INFO] Propagating evidence: 0.0000s
[INFO] Grounding: 0.2621s
[INFO] Cycle breaking: 0.0475s
[INFO] Compiling SDD: 0.6366s
[INFO] Total time: 0.9825s
 approaches(adam,brian):    0.61977699
 approaches(adam,cindy):    0.61977699
   approaches(adam,don):    0.61977699
 approaches(brian,adam):    0.61977699
approaches(brian,cindy):    0.61977699
  approaches(brian,don):    0.61977699
 approaches(cindy,adam):    0.61977699
approaches(cindy,brian):    0.61977699
  approaches(cindy,don):    0.61977699
   approaches(don,adam):    0.61977699
  approaches(don,brian):    0.61977699
  approaches(don,cindy):    0.61977699

In general, compilation is an expensive step, and longer computation times are expected as the relevant theory grows. Depending on the theory, the encoding used by 'sdd' can exhibit less growth than the 'ddnnf' option.
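If the cyclic symmetric rule compatible(T1, T2) :- compatible(T2, T1) contributes to the blow-up, one acyclic alternative is to normalize the argument order at the call site instead; a sketch (the compatible_ord name is illustrative):

```prolog
% Sketch: replace the symmetric-closure rule for compatible/2 with an
% ordered core relation plus two acyclic wrapper rules, so no cycle
% breaking is needed for it.
0.3::compatible_ord(T1, T2) :- trait(T1), trait(T2), T1 > T2.
compatible(T1, T2) :- T1 > T2, compatible_ord(T1, T2).
compatible(T1, T2) :- T1 < T2, compatible_ord(T2, T1).
```

This defines the same distribution over compatible/2 pairs as the original two rules, but the grounder never has to break a cycle for it.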

hessammehr commented 3 years ago

Thank you for your comment. SDDs do seem remarkably fast in the absence of evidence. Once evidence is added, though, SDD inference time can increase by more than 1000x. Below is the same program with three observations added.

person(adam).
person(brian).
person(cindy).
person(don).

trait(T) :- between(1, 4, T).

0.2::has_trait(P, T) :- person(P), trait(T).

compatible(T1, T2) :- compatible(T2, T1).
0.2::compatible(T1, T2) :- trait(T1), trait(T2), T1 < T2.

approaches(P1, P2) :- approaches(P2, P1).
0.9::approaches(P1, P2) :- person(P1), person(P2), P1 @< P2, has_trait(P1, T1), has_trait(P2, T2), compatible(T1, T2).

evidence(approaches(cindy, don), false).
evidence(approaches(brian, don), true).
evidence(approaches(cindy, adam), true).

query(approaches(adam, P2)).
query(approaches(cindy, P2)).
[DEBUG] Output level: DEBUG
[DEBUG] Grounding evidence 'approaches(cindy,don)'
[DEBUG] Ground program size: 88
[DEBUG] Grounding evidence 'approaches(brian,don)'
[DEBUG] Ground program size: 130
[DEBUG] Grounding evidence 'approaches(cindy,adam)'
[DEBUG] Ground program size: 184
[INFO] Propagating evidence: 0.0001s
[DEBUG] Grounding query 'approaches(adam,X2)'
[DEBUG] Ground program size: 274
[DEBUG] Grounding query 'approaches(brian,X2)'
[DEBUG] Ground program size: 316
[DEBUG] Propagated evidence: [16, 162, 60, 25, 88, 54, 107, 83, 17, 15, 46, 78, 72, 39, 65, 33, 161]
[INFO] Grounding: 0.0675s
[DEBUG] Ground program size: 280
[INFO] Cycle breaking: 0.0083s
[INFO] Compiling SDD: 0.0683s
[INFO] Total time: 226.6326s
 approaches(adam,brian):        0.5040399 
 approaches(adam,cindy):        1         
   approaches(adam,don):        0.77337384
 approaches(brian,adam):        0.5040399 
approaches(brian,cindy):        0.77337384
  approaches(brian,don):        1   

VincentDerk commented 3 years ago

This is expected, as adding evidence brings it closer to what causes the growth for 'ddnnf'. I believe that without evidence, the 'sdd' encoding has the benefit that it only encodes what is necessary for each query separately, rather than the whole theory. With evidence, the theory relevant to each query can grow.
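For completeness, the compilation backend can be chosen on the command line via the -k/--knowledge flag; the model.pl filename below is illustrative:

```shell
# Run the model with the SDD backend; swap sdd for ddnnf to compare,
# and add --verbose to see the per-stage timing breakdown.
problog model.pl -k sdd --verbose
```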

hessammehr commented 3 years ago

I see your point, thank you very much for the clarification. I suppose at this point I must look for a more efficient representation that does not suffer from the same combinatorial explosion.

VincentDerk commented 3 years ago

If you don't mind diving into the code, you could also consider trying weighted model counters other than dsharp or SDDs. For example, D4 seems to find a solution to the 'ddnnf' encoding of this problem within 73 s.

hessammehr commented 3 years ago

Thanks for the tip; I was not aware of the D4 compiler. Would this involve saving the CNF to disk and calling D4 manually, or is there already an automated mechanism within ProbLog?

VincentDerk commented 3 years ago

I haven't looked into this too much, but I assume you can try adding another Compiler option in the ddnnf_formula file. Depending on the compiler's input and output format, you can re-use the code already in that file for the dSharp and c2d compilers.