erdogant / bnlearn

Python package for Causal Discovery by learning the graphical structure of Bayesian networks. Structure Learning, Parameter Learning, Inferences, Sampling methods.
https://erdogant.github.io/bnlearn

Checking CPD with a tolerance #60

Closed AlexandreDubray closed 1 year ago

AlexandreDubray commented 2 years ago

Hello,

For some data sets from the bnlearn repository, building the model yields warnings that some CPDs do not sum to 1. It was said in #13 that some data sets contain inconsistencies, but that is not always the cause. For example, the hailfinder data set contains this CPD:

probability ( TempDis | Scenario ) {
  (A) 0.13, 0.15, 0.10, 0.62;
  (B) 0.15, 0.15, 0.25, 0.45;
  (C) 0.12, 0.10, 0.35, 0.43;
  (D) 0.10, 0.15, 0.40, 0.35;
  (E) 0.04, 0.04, 0.82, 0.10;
  (F) 0.05, 0.12, 0.75, 0.08;
  (G) 0.03, 0.03, 0.84, 0.10;
  (H) 0.05, 0.40, 0.50, 0.05;
  (I) 0.80, 0.19, 0.00, 0.01;
  (J) 0.10, 0.05, 0.40, 0.45;
  (K) 0.2, 0.3, 0.3, 0.2;
}

which is perfectly fine but fails to be built correctly. In particular, the fifth row is reported as not summing to one because, in my Python shell, I get

>>> 0.04 + 0.04 + 0.82 + 0.1
0.9999999999999999
>>>

Although the file is perfectly fine, warnings are emitted. I think the comparison should allow a small deviation from 1 to accommodate such floating-point representation problems.

erdogant commented 2 years ago

True. Floating-point errors are not handled. Do you have a suggestion for a fix? I could use Decimal, but I am not a fan of it.

from decimal import Decimal
import numpy as np

nums = [0.04, 0.04, 0.82, 0.1]
# Build each Decimal from the float's string form so 0.04 stays exactly 0.04
float(np.sum([Decimal(str(x)) for x in nums]))
# Output: 1.0

AlexandreDubray commented 2 years ago

I guess the easiest way would be to check something along the lines of abs(1 - sum(nums)) < 0.00001 (the threshold is only an example). If this is just to check that the CPDs are correct, then it should be enough. If precision problems show up during queries on very large networks, then maybe higher-precision floats should be used (although for most use cases basic floats should be fine).
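A minimal sketch of such a tolerance check, applied to rows (E) and (K) of the hailfinder CPD quoted earlier plus one deliberately invalid row. The helper name `rows_sum_to_one` and the default tolerance are assumptions for illustration and are not part of bnlearn.

```python
import math

def rows_sum_to_one(rows, tol=1e-5):
    """Return, for each row, whether its probabilities sum to 1 within tol."""
    # rel_tol=0.0 makes this a pure absolute-difference test: |sum - 1| <= tol
    return [math.isclose(sum(row), 1.0, rel_tol=0.0, abs_tol=tol) for row in rows]

rows = [
    [0.04, 0.04, 0.82, 0.10],  # row (E): sums to 0.9999999999999999 with plain floats
    [0.2, 0.3, 0.3, 0.2],      # row (K)
    [0.5, 0.3, 0.1, 0.05],     # deliberately invalid row, sums to about 0.95
]
result = rows_sum_to_one(rows)
print(result)  # [True, True, False]
```

With `tol=0.0` the check degenerates to an exact comparison, and row (E) is rejected again, which is the behaviour that triggers the warning.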

erdogant commented 2 years ago

I changed the type to Decimal (I think this is the cleanest fix) before checking whether it sums up to exactly one. Can you check whether this solves your issue? Update to the latest version (>= 0.7.7) with:

pip install -U bnlearn
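For reference, a Decimal-based check of the kind described above could look roughly like the following. This is only a sketch and not bnlearn's actual implementation; the helper name `sums_to_one_decimal` is hypothetical.

```python
from decimal import Decimal

def sums_to_one_decimal(probs):
    """Sum the probabilities as Decimals and compare the total to 1 exactly."""
    # str(p) gives the short decimal form of the float (e.g. "0.04"),
    # so the Decimal arithmetic matches what is written in the file.
    total = sum(Decimal(str(p)) for p in probs)
    return total == Decimal(1)

ok_row = sums_to_one_decimal([0.04, 0.04, 0.82, 0.1])   # row (E) of TempDis
bad_row = sums_to_one_decimal([0.5, 0.3, 0.1, 0.05])    # sums to 0.95
print(ok_row, bad_row)  # True False
```

This handles probabilities that are written as short decimal literals. Values that come out of earlier float arithmetic (for example 1/3 repeated three times) can still miss 1 exactly, so a tolerance may remain useful in those cases.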

erdogant commented 1 year ago

I am closing this issue. Please re-open if required.