eriq-augustine opened this issue 2 years ago
Thank you for the detailed explanation. The programs you provided contain some very cyclical rules. We are aware of some performance issues regarding cycle breaking, and we're looking into it right now. We'll let you know when we have an update.
Thanks, Robin. Keep me up to date and let me know if you need anything else.
Hello, I looked into it a bit.
The long processing time is caused by the compilation step, which is a #P-complete problem. I tested this with both SDD (-k sdd) and dSharp (-k ddnnf).
There might be a bug that makes some ground programs unnecessarily large, something Robin and I were already looking into, but I have yet to confirm whether it occurs here too. The ProbLog problem you gave seems to just scale terribly, and so does the compilation process.
Below are some statistics.
The issue is with grounding the program (so not even cycle breaking). Apparently, as the profile report also shows, it is spending most of its time in Term's __eq__ function.
Since this is not cycle-breaking related, and Robin knows the most about the grounding, I'll reassign it to him. I will revisit the First Method to see if the ground program contains anything unnecessary, but I will have to attend to some other tasks first, so it might take a few days.
Thanks for the update. Let me know if you need anything on my side.
About method 2:
I don't think it's too surprising that it's slow. The grounding process grows exponentially with the number of knows_l1/2 facts. SWIPL, too, slows down on this problem. You can see this by running time(findall(0, knows(0,1), _)) and commenting out a few knows(X,Y) and knows(Y,X) facts (after removing the probability labels, of course).
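For concreteness, here is a minimal sketch of that SWI-Prolog experiment, assuming the method-2 rules have roughly this shape; the handful of knows_l1/2 facts below is a hypothetical stand-in for the attached data, with the probability labels removed:

% Hypothetical subset of the knows_l1/2 facts (probability labels removed).
knows_l1(0, 2). knows_l1(2, 0).
knows_l1(0, 3). knows_l1(3, 0).
knows_l1(2, 3). knows_l1(3, 2).
knows_l1(2, 1). knows_l1(1, 2).
knows_l1(3, 1). knows_l1(1, 3).

% Seen-list transitivity: terminates, but enumerates every simple path,
% so the number of proofs explodes with the number of facts.
knows(X, Y) :- knows(X, Y, [X]).
knows(X, Y, _Seen) :- knows_l1(X, Y).
knows(X, Y, Seen) :- knows_l1(X, Z), \+ member(Z, Seen), knows(Z, Y, [Z|Seen]).

% ?- time(findall(0, knows(0, 1), _)).
% Commenting out a fact pair or two and rerunning shows the inference
% count dropping sharply, which is the exponential growth at work.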
Tabling doesn't help ProbLog here, as the third argument keeps changing. It's even possible that tabling adds to the slowdown once the tables grow big.
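To illustrate the contrast (a sketch, not the attached program): with tabling, the plain two-argument formulation terminates even on cyclic data, because the set of ground knows/2 atoms is finite and each is proven only once:

% Sketch, assuming the same hypothetical knows_l1/2 facts as above.
:- table knows/2.
knows(X, Y) :- knows_l1(X, Y).
knows(X, Y) :- knows_l1(X, Z), knows(Z, Y).
% Tabling knows/3 gives no such benefit: the seen-list argument differs
% in nearly every call, so table entries are rarely reused while the
% tables themselves keep growing.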
@krishnangovindraj The exponential grounding time (and increased time due to memoizing the list) makes sense, but do you think that's all there is to it? I ran this for 12 hours at 4 GHz. Do you think I should have just run it for longer, or is there a bug causing an infinite loop?
I have let an instance of the second method run for a week now and it is still going. It does slowly increase its memory consumption, by about 0.25 GB a day.
I was able to run the first method to completion on a larger machine. It took 27 hours and 148 GB, and the output looks like:
[DEBUG] Output level: DEBUG
[INFO] Propagating evidence: 0.0000s
[DEBUG] Grounding query 'knows(0,1)'
[DEBUG] Ground program size: 303
[DEBUG] Propagated evidence: []
[INFO] Grounding: 0.0596s
[DEBUG] Ground program size: 4548
[INFO] Cycle breaking: 2.6656s
[INFO] Clark's completion: 0.0127s
[INFO] DSharp compilation: 100008.6578s
[INFO] Total time: 100011.4259s
knows(0,1): 1
I've finished tracking down a bug today. I hope to revisit this one in a few days, but it might take a while. Something I already noticed, and which is perhaps related, is the following:
The ground program of
:- use_module(library(lists)).
1/2::throw([1]).
t :- throw(L), member(4, L).
query(t).
should be empty, since member(4, L) will never be true and is hence irrelevant to proving query(t). However, as you can see below, the ground program is not empty; it likely only realised afterwards that the atom was irrelevant. These unnecessary lines (or at least the ones I identified) do disappear when going to the LogicDAG after cycle breaking, so they do not impact the compilation time. I doubt optimizing this will help the grounding time, because you only notice later that it's irrelevant. It does, however, impact memory (and maybe cycle-breaking run time?), so it might be something we can consider looking into more deeply.
=== Ground Program ===
ground: 0.0010s
1: atom(identifier=11, probability=1/2, group=None, name=throw([1]), source=None, is_extra=False)
Queries :
* t : None [query]
=== DAG ===
Queries :
* t : None [query]
=== Evaluation ===
{t: 0.0}
Hey All,
I am having some issues with transitive rules and I was wondering if my results were expected or if there may be a bug lurking here.
I have a transitive rule that I have tried to implement in two different ways, both of which have unfavorable outcomes (one consumes all memory, the other appears to be stuck in an infinite loop).
All methods were run using the CLI from the ProbLog Python package (from PyPI, version 2.2.2).
Any help you can provide in getting either of these methods (or any other method of dealing with collective transitivity) would be greatly appreciated.
General Model
This is a small synthetic model with the end goal of inferring whether two people know each other. The data and structure for this model comes from the psl-examples repository: https://github.com/linqs/psl-examples/tree/develop/simple-acquaintances The specific rules that my initial ProbLog rules are based on come from here: https://github.com/linqs/psl-examples/blob/develop/simple-acquaintances/cli/simple-acquaintances.psl
These examples will only use three predicates:
knows/2 - The final target predicate, indicating two people know each other.
knows_l1/2 - An intermediate predicate for knows/2.
knows/3 - Used in the second method to carry a list of seen nodes.
In the examples I will provide, I stripped down the rules and data to the smallest set that still causes these issues (I have not fully tested every possible subset of the data since these runs can take a while, but I have cut it down considerably).
First Method
The first method uses a very straightforward approach to transitivity (with the hope that memoization will stop cycles from happening).
The probabilistic facts enumerate all pairs in [0, 9] for each argument, excluding self references and the query pair (0, 1). The full file is attached: problog_transitive_method1.txt I also attached the output of a profile (py-spy top): problog_transitive_method1_profile_top.txt
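For readers without the attachment, a minimal sketch of this shape (the 0.5 labels and the reduced fact set below are placeholders, not the attached data):

0.5::knows_l1(0, 2). 0.5::knows_l1(2, 0).
0.5::knows_l1(0, 3). 0.5::knows_l1(3, 0).
0.5::knows_l1(2, 3). 0.5::knows_l1(3, 2).
0.5::knows_l1(2, 1). 0.5::knows_l1(1, 2).
0.5::knows_l1(3, 1). 0.5::knows_l1(1, 3).

% Straightforward transitivity, relying on the grounder's memoization
% to cope with the cycles in the facts.
knows(X, Y) :- knows_l1(X, Y).
knows(X, Y) :- knows_l1(X, Z), knows(Z, Y).

query(knows(0, 1)).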
When run, it hangs on the output:
The process will continue to use ~100% of a core and will keep accumulating memory until there is no more or it is killed. On my machine, this took about 50 GB of RAM and swap combined.
Second Method
The second method uses a commonly recommended pattern for transitivity in Prolog: keeping a list of previously seen nodes.
The probabilistic facts enumerate all pairs in [0, 4] for each argument, excluding self references and the query pair (0, 1). The full file is attached: problog_transitive_method2.txt I also attached the output of a profile (py-spy top): problog_transitive_method2_profile_top.txt
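Likewise, a hedged sketch of this pattern (the attached file is authoritative; the facts and 0.5 labels below are placeholders):

:- use_module(library(lists)).

0.5::knows_l1(0, 2). 0.5::knows_l1(2, 0).
0.5::knows_l1(0, 3). 0.5::knows_l1(3, 0).
0.5::knows_l1(2, 1). 0.5::knows_l1(1, 2).

% Carry the list of visited nodes so no node is revisited.
knows(X, Y) :- knows(X, Y, [X]).
knows(X, Y, _Seen) :- knows_l1(X, Y).
knows(X, Y, Seen) :- knows_l1(X, Z), \+ member(Z, Seen), knows(Z, Y, [Z|Seen]).

query(knows(0, 1)).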
When run, it hangs on the output:
So it looks like it is stuck in grounding. The process will continue to use ~100% of a core, but will not use more RAM. I ran this for about 12 hours until I killed it.