ML-KULeuven / problog

ProbLog is a Probabilistic Logic Programming Language for logic programs with probabilities.
https://dtai.cs.kuleuven.be/problog/

Persisting Problog Python objects #93

Closed rickshilling closed 1 year ago

rickshilling commented 2 years ago

Suppose I have

write_static_csv(static_data, 'static_facts.csv')

while not stop:
    dynamic_data = get_dynamic_data()
    write_dynamic_csv(dynamic_data, 'dynamic_facts.csv')
    problog_string = """
        :- use_module(library(db)).
        :- csv_load('static_facts.csv', 'static_facts').   % This is huge but static.
        P::static_fact(X) :- static_facts(P,X).
        :- csv_load('dynamic_facts.csv', 'dynamic_facts'). % This is small but changes per iteration.
        P::dynamic_fact(Y) :- dynamic_facts(P,Y).
        predicate_instance(X,Y) :- static_fact(X), dynamic_fact(Y), some_relation(X,Y).
        query(predicate_instance(_,_)).
        """
    problog_program = PrologString(problog_string)
    logic_formula = LogicFormula.create_from(problog_program)             # ground the program
    directed_acyclic_graph = LogicDAG.create_from(logic_formula)          # break cycles
    sentential_decision_diagram = SDD.create_from(directed_acyclic_graph) # compile to an SDD
    Problog_Output = sentential_decision_diagram.evaluate()
    stop = determine_to_stop(Problog_Output)

Is there a way to persist the ProbLog engine, keeping the static facts between iterations, but subtracting the old dynamic data and adding the new, without recreating the LogicFormula, LogicDAG, and SDD objects?
Also, what are some possible ways to speed up the ProbLog execution steps?
Currently, sentential_decision_diagram.evaluate() takes 20+ minutes for me. Thanks.

VincentDerk commented 1 year ago

Perhaps of interest (source): you can extend a database. That way you can create one "base" database and, for each batch of dynamically added facts, create an extension of that base database, throwing the extended database away when you no longer need it.

from problog import get_evaluatable
from problog.engine import DefaultEngine
from problog.program import PrologString

m1 = """
0.3::a(1).
query(a(X)).
"""
db = DefaultEngine().prepare(PrologString(m1))
print(get_evaluatable().create_from(db).evaluate())

m2 = """
0.4::a(2).
"""
db2 = db.extend()
for statement in PrologString(m2):
    db2 += statement

print(get_evaluatable().create_from(db2).evaluate())
print(get_evaluatable().create_from(db).evaluate())

results in

{a(1): 0.3}
{a(1): 0.3, a(2): 0.4}
{a(1): 0.3}

Here we made an extension of database db called db2. This new database contains all the clauses of the original (without copying them). We can discard any modifications by simply discarding db2.

So you would end up with something more like:

# -- prepare static db part --
static_problog_string = """
    :- use_module(library(db)).
    :- csv_load('static_facts.csv', 'static_facts').   % This is huge but static.
    P::static_fact(X) :- static_facts(P,X).
    P::dynamic_fact(Y) :- dynamic_facts(P,Y).
    predicate_instance(X,Y) :- static_fact(X), dynamic_fact(Y), some_relation(X,Y).
    query(predicate_instance(_,_)).
    """
static_db = DefaultEngine().prepare(PrologString(static_problog_string))

# while loop for dynamic part
while not stop:
    # -- complete program --
    extended_db = static_db.extend()
    dynamic_data = get_dynamic_data()
    # Instead of writing dynamic_data to a CSV file and then loading
    # the CSV into the ProbLog db, consider writing straight into
    # extended_db; I am not sure what the efficiency difference will be.
    write_dynamic_csv(dynamic_data, 'dynamic_facts.csv')
    extended_db += PrologString(":- csv_load('dynamic_facts.csv', 'dynamic_facts').")

    # -- evaluate --
    # If you do not need to time the separate steps, you can do them at once
    # using Problog_Output = SDD.create_from(extended_db).evaluate()
    logic_formula = LogicFormula.create_from(extended_db)                 # ground the program
    directed_acyclic_graph = LogicDAG.create_from(logic_formula)          # break cycles
    sentential_decision_diagram = SDD.create_from(directed_acyclic_graph) # compile to an SDD
    Problog_Output = sentential_decision_diagram.evaluate()
    stop = determine_to_stop(Problog_Output)

Something that may help is passing sdd_auto_gc=True to SDD.create_from, although sometimes it makes things slower:

SDD.create_from(directed_acyclic_graph, sdd_auto_gc=True)