Closed ajdapretnar closed 3 years ago
Why is this discretized to 3 bins when there are only 2 distinct values?
Count 'em.
Var Category
continuous Bad Good
class
99.93000000000002 Good
99.93000000000002 Good
99.93000000000002 Good
99.93000000000002 Good
99.93000000000004 Bad
99.93000000000004 Bad
99.93000000000004 Good
99.93000000000004 Good
99.93000000000004 Good
99.93000000000004 Good
99.95000000000002 Bad
99.95000000000002 Bad
99.95000000000006 Bad
I know about the rounding error, but how does the same float get rounded differently?
I don't understand - what do you mean by the same float being "rounded differently"? This data is read from a file in which it has four distinct values, hence three (hum, three?) bins. I think that Orange works correctly regarding the second issue.
The first one is more interesting: the differences between thresholds are below floating-point precision, so two consecutive thresholds come out the same and the assertion fails. I added np.unique to handle this; now I'm trying to write a test.
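For anyone following along, here is a rough sketch of the idea, not the actual Orange code. It uses plain Python so it runs without dependencies; `sorted(set(...))` stands in for what np.unique does on a 1-D array (sorted values, duplicates removed). The function names and the exact check are my assumptions about the kind of assertion that failed.

```python
def check_thresholds(thresholds):
    """Hypothetical stand-in for the failing check: cut points must strictly increase."""
    for a, b in zip(thresholds, thresholds[1:]):
        if not a < b:
            # Unlike a bare assert, this at least says which thresholds collided.
            raise AssertionError(f"thresholds not strictly increasing: {a!r}, {b!r}")


def unique_sorted(thresholds):
    """Plain-Python analogue of np.unique for a flat list of floats."""
    return sorted(set(thresholds))


# Two consecutive thresholds collapse to the same float (illustrative values).
raw = [99.93000000000002, 99.93000000000004, 99.93000000000004, 99.95000000000002]

cleaned = unique_sorted(raw)
check_thresholds(cleaned)  # passes once the duplicate is gone
```

Note that removing duplicates means the result can have fewer bins than requested, which seems like the right outcome when the data cannot support more.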
I identified the problem. The data was sent through Pivot Table and asked for a mean. Even though all the instances for the group have the same value (99.93), when they are grouped, the "rounding error" appears, but it is not visible in the data table (the value looks like 99.93, even when it is not). Hence the misunderstanding. I agree that for the second issue Orange indeed works as expected.
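A small sketch of why this is easy to miss, using the values from the table in this thread. The two-decimal formatting is an assumption standing in for how a table might display the numbers; it is not taken from Orange's code.

```python
from collections import defaultdict

# The Var column from the table in this thread.
values = (
    [99.93000000000002] * 4
    + [99.93000000000004] * 6
    + [99.95000000000002] * 2
    + [99.95000000000006]
)

# Assumed display: two decimals, as a table view might show them.
shown = [f"{v:.2f}" for v in values]

# Group the exact floats under the label a user would see.
by_label = defaultdict(set)
for v in values:
    by_label[f"{v:.2f}"].add(v)

distinct_exact = len(set(values))   # what Discretize works with
distinct_shown = len(set(shown))    # what the user sees
hidden = {label: len(floats) for label, floats in by_label.items()}

print(distinct_exact, distinct_shown, hidden)
```

So the discretizer sees four distinct floats while the displayed column shows only two values, which matches the three bins reported above.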
First issue: Naive Bayes fails with a strange error (AssertionError without context). After investigation, the issue is in Discretize, which tries to discretize to 4 bins where there are only 3 values. Second issue: Discretize returns strange (likely wrong) bins.
discretize-error.zip
[ ] What's your environment?
Operating system: OSX
Orange version: 3.28.dev
How you installed Orange: conda
[ ] Additional information
First error:
The problem is that the error is badly formatted in Test and Score and unclear in the Naive Bayes widget.
Second issue: Why is this discretized to 3 bins when there are only 2 distinct values?