ML-KULeuven / problog

ProbLog is a Probabilistic Logic Programming Language for logic programs with probabilities.
https://dtai.cs.kuleuven.be/problog/

SDD memory error & d-DNNF example ignoring #118

Closed (Zarach closed this issue 1 month ago)

Zarach commented 5 months ago

Dear ProbLog team,

I am trying to learn a Naive Bayes classifier for documents based on the words occurring in them.

There are about 1000 examples, each of which should be assigned to one of 4 classes, similar to your online example (to be precise, the probability of each word given a class should be learned).

If I run LFI with SDD, a memory error occurs.

If I run it with d-DNNF, it ignores nearly all of the given examples. It runs into the following error because the calculated weight is very low, even at the beginning of the learning process:

    if self.semiring.is_zero(self._get_z()):
        raise InconsistentEvidenceError(context=" during evidence evaluation")

I guess this is the intended behavior, but could you explain in abstract terms why examples are ignored from the very beginning? Does it mean that these examples do not contain enough information (not enough words) to learn the parameters?

rmanhaeve commented 5 months ago

Hi Zarach

Could you perhaps give us an example of this behaviour?

Kind regards, Robin

Zarach commented 5 months ago

Hi Robin,

That is not so easy, because I run it from Python, but I'll try. Here are 2 files (with 48 examples) that reproduce the d-DNNF problem and can be used in standalone mode. I hope the behaviour is comparable to the Python run. All examples are ignored when I run with d-DNNF.

For the Python run I also uploaded a text file with the list of examples. In Python the examples are built from Term() objects, which you can't see in the text file.

examples_small.txt program_small.txt example_list.txt

And another example file, which should reproduce the memory allocation error with SDD:

examples_big.txt

Kind regards, Benjamin

rmanhaeve commented 5 months ago

Hi Benjamin

It seems that you only provided the training data; there is no program attached. We'll need the program as well to look into this in more depth.

Zarach commented 5 months ago

Hi Robin,

program_small.txt is the program which should be executable in problog standalone mode. At least on my side this works and reproduces the problem.

rmanhaeve commented 4 months ago

I have solved the inconsistent evidence error by setting the initial probabilities to t(0.1) for all words and by using log-space calculations, i.e. by running:

    lfi program_small.pl examples_small.pl --logspace
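As a rough illustration of why log-space can help here (this is a plain-Python sketch, not ProbLog's internal code, and the number of word facts is a made-up assumption): when the weight of an example is a product of many small probabilities, the product can underflow to exactly 0.0 in ordinary floating point, which a zero check would then treat like inconsistent evidence. Summing logarithms keeps the same quantity representable.

```python
import math

# Hypothetical numbers for illustration only: 400 word facts, each with
# probability 0.1 (the initial value used above).
word_probs = [0.1] * 400

# Linear space: repeatedly multiplying small probabilities underflows,
# so the weight ends up as exactly 0.0 even though it is mathematically > 0.
linear_weight = 1.0
for p in word_probs:
    linear_weight *= p

# Log space: the same quantity expressed as a sum of logarithms stays finite.
log_weight = sum(math.log(p) for p in word_probs)

print(linear_weight == 0.0)       # True
print(math.isfinite(log_weight))  # True
```

Whether this is exactly what happens for a given example depends on the model and on how many facts contribute to its weight, so treat the numbers above as an assumption rather than a measurement.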

I'll now look into the memory issue

rmanhaeve commented 4 months ago

I have noticed a calloc when using SDDs. Have you tried using -k sddx?