biolab / orange3

🍊 Orange: Interactive data analysis
https://orangedatamining.com

Feature Statistics: legend for "Color", and "Selected Feature Statistics" output with richer data #6730

Open wvdvegte opened 7 months ago

wvdvegte commented 7 months ago

What's your use case?

  1. The Distribution column of Feature Statistics displays a histogram of each feature's values, with the values of numeric features split into bins. The bars can be colored according to the values of another variable, but the widget does not show which color corresponds to which value of that variable. For instance, in these feature statistics on a dataset about 3D printers, it is unclear what the blue and red colors stand for unless a Color widget is added before or after Feature Statistics (which requires keeping another window open just to see the legend):

    image
  2. The distribution histograms are nice for a first impression, but the information is for visual inspection only. To get the numbers, the Distributions widget has to be used (admittedly with more options to control the histograms). Since the histogram bar values are calculated anyway, and since Feature Statistics allows selection of one (or more) features, I guess it wouldn't be too difficult to add a "Selected Feature Statistics" output port that gives the same output as Distributions would give for that feature, with a split by the same other feature and the same bin width (which, for numerical values, seems to be fixed so that there are 10 bins in Feature Statistics).
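
To make the requested output concrete, here is a minimal sketch in plain Python of the numbers such a port might carry: counts per bin for one numeric feature, split by a second feature, using 10 equal-width bins. The function name `binned_counts_by_group` and the sample data are hypothetical, and this is not how Feature Statistics or Distributions actually computes its bins.

```python
from collections import defaultdict


def binned_counts_by_group(values, groups, n_bins=10):
    """Count values per equal-width bin, separately for each group label.

    Returns (edges, counts): n_bins + 1 bin edges, and a dict mapping each
    group label to a list of n_bins counts. All groups share the same edges.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = defaultdict(lambda: [0] * n_bins)
    for value, label in zip(values, groups):
        # The maximum lies on the last edge; keep it in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[label][index] += 1
    return edges, dict(counts)


# Hypothetical 3D-printer data: a numeric feature and the coloring feature.
price = [100, 250, 300, 900, 1000, 150, 700, 800]
tech = ["FDM", "FDM", "SLA", "SLA", "SLA", "FDM", "SLA", "FDM"]

edges, counts = binned_counts_by_group(price, tech)
for label, per_bin in counts.items():
    print(label, per_bin)
```

Each list has one entry per bin, so the output for a single selected feature is a small table with one row per bin and one column per value of the coloring feature.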

What's your proposed solution?

  1. Provide a legend for the colors, for instance to the right of the Color drop-down, or on mouse-hover over the unsplit bars of the feature selected for coloring. This would really improve the usability of Feature Statistics.
  2. This would be more of a nice-to-have, as it can also be achieved by Distributions. It would require limiting the selection to one feature and replacing the Reduced Data output, whose functionality can also be achieved by Select Columns.

Are there any alternative solutions?

As indicated, by using Color and Distributions, respectively.

janezd commented 7 months ago

Adding the legend should be simple enough, so we'll do it.

As for the output, you're right: it's just a nice-to-have feature. It would also be problematic if multiple rows are selected, with possibly different numbers of bins. So let's leave this one to Distributions and Discretization widgets.