Closed ajdapretnar closed 3 years ago
It is not entirely clear what this visualization should look like.
The variable has values 1, 2, 3, 4, 5, 6, 7, 8, 24. For variables with so few distinct values, the widget can also assign one bin to each value. But what is the bin width in this case? (Note that the x axis is not categorical.)
Or, in general, consider a variable whose distinct values are [1.5, 1.8, 2, 2.34, 10]. With one value per bin, what is the expected bin width?
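To make the question concrete, here is a minimal sketch (not Orange's actual binning code) that computes the distances between consecutive distinct values for the two examples above. It assumes that, with one value per bin, each bin would stretch from its value to the next one, which is where unequal widths come from. The function name `gaps_between_distinct` is hypothetical.

```python
def gaps_between_distinct(values):
    """Return the distances between consecutive distinct values.

    Assumption for illustration: a one-value-per-bin histogram draws each
    bin from its value up to the next distinct value, so these gaps are
    the bin widths the reader would see.
    """
    distinct = sorted(set(values))
    # Round to suppress floating-point noise such as 0.30000000000000004.
    return [round(b - a, 10) for a, b in zip(distinct, distinct[1:])]


integer_like = gaps_between_distinct([1, 2, 3, 4, 5, 6, 7, 8, 24])
mixed = gaps_between_distinct([1.5, 1.8, 2, 2.34, 10])
print(integer_like)
print(mixed)
```

In both cases the last gap is far larger than the others, so no single "natural" bin width follows from the data alone.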
I would tend to say this works as expected, but with unexpected results. We can add an information icon, explaining that each bin represents one unique value.
This doesn't happen only for single value per bin. I have a dataset with 56108 instances. The default visualization for a certain variable creates the following bins:
I would expect the following default bins: (, 49), [49, 50), [50, 51), [51, 52), [52,)
Alternatively, I would expect the bin not to stretch more than the other bins. If a bin represents a single value, then its width should be the same as other bins. For RAD, the first seven bins should be a single large bin, or the final two bins should be two narrow bins with empty space in between. No?
> This doesn't happen only for single value per bin.
If the bin boundaries are not round decimal numbers, I guess they must represent single values. Do you, by any chance, have just 9 distinct values in your data? (We're not talking about single instances but about single values, right?)
> If a bin represents a single value, then its width should be the same as other bins.
If a bin represents a single value, then all bins represent single values and thus have various widths. In this particular case, all widths except the last were 1. But here are 17 instances from heart disease with 9 distinct values. Bar widths are 1, 2, 3 or 4.
All we can do is let the widths of all bins equal the smallest distance between two values (which is currently shown as the narrowest bin).
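The idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: the function name `equal_width_bars` is hypothetical, and centering each bar on its value is a choice made here for the example.

```python
def equal_width_bars(values):
    """Return (left, right) edges of one bar per distinct value.

    Every bar gets the same width: the smallest distance between two
    distinct values. Centering the bar on its value is an assumption
    made for this sketch.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to derive a width")
    width = min(b - a for a, b in zip(distinct, distinct[1:]))
    return [(v - width / 2, v + width / 2) for v in distinct]


bars = equal_width_bars([1, 2, 3, 4, 5, 6, 7, 8, 24])
for left, right in bars:
    print(left, right)
```

For the values 1 to 8 and 24, all bars have width 1, and the bar for 24 stands apart with empty space before it instead of being stretched.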
I've done so in #5139. Please report how this looks on your data.
> We're not talking about single instances but about single values, right?
You're right, I confused the two.
I'll check the PR.
In some cases, Distributions shows unequal bin widths, which makes the histogram confusing. I would expect all bins (bars) to be of equal width.
[ ] How can we reproduce the problem?
File (housing) - Distributions. Select the RAD column with bin width at minimum. It seems to happen when there are a lot of integer-like floats and only some decimal data, e.g. [20.0, 20.0, 20.5, 21.0, 21.0, 21.0, 21.6, 24.0, 24.0, 24.0].
[ ] What's your environment?
Operating system: OSX High Sierra
Orange version: 3.28.dev
How you installed Orange: conda/pip