QuMuLab / dsharp

MIT License
16 stars 5 forks source link

Wrong MC: cnf clause too long to parse #15

Closed VincentDerk closed 2 years ago

VincentDerk commented 2 years ago

Hi,

The following CNF has a model count of 1, but dsharp reports 4: single-model-wrong-mc-cnf.txt

Why should it be 1?

This CNF is generated from a ProbLog program and by reasoning over it, it should be model count 1. This is confirmed by both model counters ganak and d4 reporting '1'. The model, according d4 using d-DNNF-reasoner, is here: correct-model.txt

Why does dsharp report 4?

I do not know yet, BUT...

The NNF generated by dsharp is here single-model-wrong-mc-nnf.txt. It contains two OR nodes, one over 20057 and one over 20058, allowing them to be either true or false. I also used d-DNNF-reasoner to generate all 4 models and they only differ in their value of 20057 and 20058. In reality, both 20057 and 20058 must be false (cf. reasoning over the problem + d4 model at bottom). So figuring out where this OR-node comes from and why, might tell us more.

The NNF was generated using ./dsharp -Fnnf ./single-model-wrong-mc.nnf -smoothNNF -disableAllLits ./single-model-wrong-mc.cnf

haz commented 2 years ago

Hey @VincentDerk . Few quick questions to get some context (may be quicker than the eventual time I'll get to look into this):

  1. Can you confirm that all the variable #'s are used and none missing? I.e., every variable appears in at least one clause.
  2. Similarly, max var # should be the # of vars reported at the top.
  3. What happens with 20057 added as a unit clause? (either as a lit or it's negation)
  4. This a bug only with that configuration? In particular, if you run without those two flags, does it report the correct count?

From there, it will hopefully point to where I should look first (e.g., if it's still wrong without smoothing, then there's no need to look there). May also be useful to post the command-line output.

Cheers!

VincentDerk commented 2 years ago
1. Can you confirm that all the variable #'s are used and none missing?

They are very likely all used. I'm not 100% certain, but 20057 and 20058 are definitely used.

2. Similarly, max var # should be the # of vars reported at the top.

This is correct.

3. What happens with 20057 added as a unit clause? (either as a lit or it's negation)

The model count becomes 2.

4. This a bug only with that configuration? In particular, if you run without those two flags, does it report the correct count?

Both./dsharp -Fnnf ./dsharp-test.nnf ./single-model-wrong-mc.cnf and ./dsharp ./single-model-wrong-mc.cnf also report model count 4.

VincentDerk commented 2 years ago

I should add that all clauses are removed so it might be related to the preprocessing.. (unless that part is not the preprocessing and I'm reading it wrong...)

output ``` #Variables: 20058 #used Variables: 20056 #Clauses: 74910 <----- #Clauses removed: 74910 <----- #added Clauses: 0 # of all assignments: 1.14724e+6038 = 2^(20058) Pr[satisfaction]: 3.48664e-6038 # of solutions: 4 #SAT (full): 4 Num. conflicts: 0 Num. implications: 0 Num. decisions: 0 max decision level: 0 avg decision level: 0 avg conflict level: 0 avg solution level: 0 CCLLen 1stUIP - max: 0 avg: 0 CCLLen lastUIP - max: 0 avg: 0 FormulaCache stats: memUse: 8 cached: 1 used Buckets: 1 cache retrievals: 0 cache tries: 1 Time: 0.043521s Runtime:0.043521 ```
haz commented 2 years ago

Aha! I think the answer to Q1 may be off. Check the first couple of lines:

#Variables:     20058
#used Variables:    20056

Also, those two variables happen to be the largest 2. So what I think is happening is that there are a pair of variables not mentioned at all. At least that's what the system is thinking.

VincentDerk commented 2 years ago

I checked it now, and can confirm that all variables occur in at least one clause. So somehow during the preprocessing it decides the values of 20057 and 20058 do not matter?

haz commented 2 years ago

Perhaps, but I think more likely is that it never actually gets to read them. Too much of a coincidence to have those as the highest two variable #'s. So likely something with the parsing not getting through the entire file.

haz commented 2 years ago

My money's on this as the culprit:

https://github.com/QuMuLab/dsharp/blob/master/src/src_sharpSAT/MainSolver/InstanceGraph/InstanceGraph.cpp#L619 https://github.com/QuMuLab/dsharp/blob/master/src/src_sharpSAT/MainSolver/InstanceGraph/InstanceGraph.cpp#L651 https://github.com/QuMuLab/dsharp/blob/master/src/src_sharpSAT/MainSolver/InstanceGraph/InstanceGraph.cpp#L673

VincentDerk commented 2 years ago

Good find! That was indeed the issue. I raised it to some higher value and it correctly reported model count 1.

Edit: 20057 and 20058 were both involved in a very large clause. With this buffer limit it likely didn't parse properly. These 2 large clauses also caused c2d to refuse the CNF, stopping and reporting it does not allow clauses of size >10k. Perhaps I should have realised the connection before :)

haz commented 2 years ago

Awesome stuff. I think a decent fix would be to crash out on the parsing and point users to where they can change the constants. Much better than silently going on to give bogus results!

Let's leave this issue open, and I'll use this thread as a roadmap of how to fix it when I have a chance.

Thanks for reporting!

VincentDerk commented 2 years ago

We can also consider using a std::string which grows more arbitrary.

    std::string buf;
    while (std::getline(inFile, buf))

or like sharpSAT/ganak extracting literals one by one using >> until '0' and then ignoring till linebreak or EOF inFile.ignore(numeric_limits<streamsize>::max(), '\n');

haz commented 2 years ago

Sorry for the delay on this -- took a while to find the time to look through. A major improvement on the parsing, thanks!