erdogant / bnlearn

Python library for learning the graphical structure of Bayesian networks, parameter learning, inference and sampling methods.
https://erdogant.github.io/bnlearn

Fix the bug in the discretize method caused by the misalignment of column order #90

Closed ankh1999 closed 9 months ago

ankh1999 commented 9 months ago

Bug description: The columns listed in continuous_columns and passed to the discretize method are not necessarily in the same order as they appear in the data. However, the discretize_all method returns continuous_edges assuming that order, so the bin edges get paired with the wrong columns. This consistently triggers an error at line 59, pd.Categorical.from_codes, with the message: ValueError: codes need to be between -1 and len(categories)-1.
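For readers unfamiliar with this error, here is a minimal pandas-only sketch of how mismatched bin edges can lead to it. This is not the bnlearn code; the edge lists, values, and variable names are made up for illustration, and the only assumption is that codes computed from one column's edges are paired with categories built from another column's edges.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges for two columns (illustrative values only).
edges_a = [0.0, 10.0, 20.0, 30.0, 40.0]  # 4 bins
edges_b = [0.0, 50.0, 100.0]             # 2 bins

def labels(edges):
    """Build one text label per bin from a list of edges."""
    return [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]

values_a = np.array([5.0, 15.0, 25.0, 35.0])

# Bin codes computed from column A's inner edges: 0, 1, 2, 3.
codes = np.digitize(values_a, edges_a[1:-1])

# Correct pairing: codes and categories come from the same edges.
ok = pd.Categorical.from_codes(codes, categories=labels(edges_a))

# Misaligned pairing: column A's codes, column B's categories.
# Codes 2 and 3 have no matching category, so pandas raises ValueError.
raised = False
try:
    pd.Categorical.from_codes(codes, categories=labels(edges_b))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The same kind of mismatch would occur whenever the order of the edge lists and the order of the columns they are applied to disagree.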

To reproduce the issue, in the example provided in the "Advanced discretizing continuous data" section, replace continuous_columns = ["mpg", "displacement", "horsepower", "weight", "acceleration"] with continuous_columns = ["mpg", "displacement", "horsepower", "acceleration", "weight"], or change the order in any way that differs from the order in the df. This will consistently trigger the error.

To fix the issue, sort continuous_columns so that it follows the column order of the data.
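A minimal sketch of that sorting step, assuming a pandas DataFrame as input. The helper name sort_by_dataframe_order is hypothetical and is not part of bnlearn, and the toy frame below only mimics a column layout like the one in the example dataset.

```python
import pandas as pd

def sort_by_dataframe_order(df, continuous_columns):
    """Return continuous_columns reordered to match df.columns.

    Hypothetical helper for illustration; not a bnlearn function.
    """
    missing = [c for c in continuous_columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in data: {missing}")
    wanted = set(continuous_columns)
    # Walk the DataFrame's columns so the result follows their order.
    return [c for c in df.columns if c in wanted]

# Toy frame with an assumed column layout (no rows needed here).
df = pd.DataFrame(columns=[
    "mpg", "cylinders", "displacement", "horsepower",
    "weight", "acceleration", "model_year", "origin",
])

# The order from the reproduction steps, which differs from the df order.
shuffled = ["mpg", "displacement", "horsepower", "acceleration", "weight"]
ordered = sort_by_dataframe_order(df, shuffled)
print(ordered)
```

Applying such a step before the edges are computed and consumed would keep both sides in the same order regardless of how the caller lists the columns.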

erdogant commented 9 months ago

great fix!

erdogant commented 9 months ago

I published a new release!