biolab / orange3

🍊 :bar_chart: :bulb: Orange: Interactive data analysis
https://orangedatamining.com
Other
4.85k stars 1.01k forks source link

Group By: KeyError with newly added Variables #5802

Closed mw25 closed 2 years ago

mw25 commented 2 years ago

What's wrong? If first a subset of a table is given to group by (here: only sepal length with Select Columns), and then the remaining variables are added, the aggregations column in the aggregate table is empty. When clicking on the row, a KeyError exception occurs.

image

----------------------------- KeyError Exception ------------------------------
Traceback (most recent call last):
  File "D:\dev\Python\orange3\orange3\Orange\widgets\data\owgroupby.py", line 385, in __rows_selected
    active_aggregations = [self.aggregations[attr] for attr in selected_attrs]
  File "D:\dev\Python\orange3\orange3\Orange\widgets\data\owgroupby.py", line 385, in <listcomp>
    active_aggregations = [self.aggregations[attr] for attr in selected_attrs]
KeyError: ContinuousVariable(name='sepal width', number_of_decimals=1)
-------------------------------------------------------------------------------

How can we reproduce the problem?

    def test_reproduce_bug(self):
        partial_domain = Domain([ContinuousVariable("a")])
        partial_table = Table.from_table(partial_domain, self.data)

        self.send_signal(self.widget.Inputs.data, partial_table)
        assert len(self.widget.aggregations) == 1
        self.assert_aggregations_equal(["Mean"])

        self.send_signal(self.widget.Inputs.data, self.data)
        assert len(self.widget.aggregations) == 1  # should be 5
        self.assert_aggregations_equal(['Mean', '', '', '', ''])  # should be defaults, not ''

        # select a new variable - should not raise
        self.select_table_rows(self.widget.agg_table_view, [1])

Analysis and related problem

A ContextSetting is created with the partial table. With the full table, this context is recognized and the default values of the new variables in self.aggregations (owgroupby) are completely overwritten with the ContextSetting of the partial table. In self.agg_table_view are all variables, but in self.aggregations only one. KeyError.

If a ContextSetting is a dict, it completely replaces the current setting:

https://github.com/biolab/orange-widget-base/blob/51d4f35e2fe807d617aac0547532d220d8942679/orangewidget/settings.py#L201

Wouldn't it be better if:

What's your environment?

janezd commented 2 years ago

Thanks for this excellent report. How I wish all reports were so detailed, and also included some debugging and even a test! :)

You are right, of course.

A note to whomever who'd work on this: this shouldn't be too difficult. Copy the idea from widgets that already do it properly, for instance, Discretize and Edit Domain. The latter is better, because it stores changes immediately, not through storeSpecificSettings.