biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

Heat Map ignores the order given in the domain #4529

Closed ajdapretnar closed 4 years ago

ajdapretnar commented 4 years ago

Describe the bug

Heat Map ignores the order given in the domain.

To Reproduce

  1. File (heart-disease)
  2. Pivot table (rows=gender,columns=chest pain)
  3. Edit Domain (place female to be under male for gender)
  4. Heat Map (label with gender, female is first)

Orange version: master

Expected behavior

Order of a categorical variable should be the same as given in the domain.

Screenshots

[Screenshot: Screen Shot 2020-03-13 at 13 01 43]

ajdapretnar commented 4 years ago

So after some exploration, the heat map apparently takes the order the data come in. But in case of discrete variables this is not always desired (as is the case of my months). I assumed Edit Domain fixes this, but it does not. Am I missing something? Is this a bug or the way it should work and we just have no widget in Orange to change the order of rows?

janezd commented 4 years ago

No, it sorts them alphabetically, which is less random than the order of appearance.

We could do the same trick as macOS Finder does with some file names: if all values of discrete attributes look numeric, we could sort them as numbers. At first glance, we'd modify this: https://github.com/biolab/orange3/blob/master/Orange/data/io_base.py#L281. Should we?
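A minimal sketch of that idea in plain Python, for illustration only. It is not the code in `io_base.py`: the helper names `looks_numeric` and `sort_values` are hypothetical, and "looks numeric" is assumed here to mean "parses with `float()`".

```python
def looks_numeric(value):
    """Return True if the string parses as a float.

    Assumption: this is what "looks numeric" means; a real
    implementation might be stricter (e.g. reject "nan" or "inf").
    """
    try:
        float(value)
    except ValueError:
        return False
    return True


def sort_values(values):
    """Sort labels as numbers if all of them look numeric, else alphabetically."""
    values = list(values)
    if values and all(looks_numeric(v) for v in values):
        return sorted(values, key=float)
    return sorted(values)


print(sort_values(["1", "10", "11", "12", "2", "20", "21", "3"]))
# ['1', '2', '3', '10', '11', '12', '20', '21']
print(sort_values(["male", "female"]))
# ['female', 'male']
```

A single non-numeric label makes the whole list fall back to alphabetical order, so mixed value sets behave as they do today.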

As for Edit Domain, reordering of values works for me. Even without checking "Ordered"; I don't think this property is observed anywhere in Orange.

ajdapretnar commented 4 years ago

No, it doesn't sort them alphabetically, but in order of appearance (in my case for sure). I have switched the order in the original data and then it worked. Otherwise, Edit Domain would work and it doesn't.

It does sort them alphabetically at the beginning, when loading the file, which becomes evident in Pivot Table and so on. Edit Domain does help with this one - it returns the order the user wanted.

I often find myself wanting a simple 'order as numbers' option or a default. Otherwise I have to click hundreds of times to reorganize, say, hours of the day (1, 10, 11, 12.... 2, 20, 21... in my case). One cannot drag them around in Edit Domain; each switch moves only one value at a time.

VesnaT commented 4 years ago

The Heat Map displays rows in a "dataset order". I guess the issue could be fixed by #4644.

janezd commented 4 years ago

Rather "hackishly circumvented", not fixed. The user would then have to insert a Table to set the order to be used in the Heat map. I guess it would be better if the values just followed their defined orders -- we even have Edit Domain to set such an order.

VesnaT commented 4 years ago

Here I disagree. Since when should values of a discrete variable define the instance order? Would a user really expect that when inspecting a dataset in the heat map?

ajdapretnar commented 4 years ago

We talked with @VesnaT and we agreed. Instance order is instance order. Labels are related to Domain only conceptually. Each row comes from the actual data table and is given its label, regardless of the order of the domain.
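The distinction can be sketched in plain Python. This is an illustration under stated assumptions, not Orange code: `domain_values` and `rows` are hypothetical stand-ins for the variable's value order and the incoming data table, and the numbers are made up.

```python
# Value order as defined in the domain (e.g. after Edit Domain). Hypothetical.
domain_values = ["male", "female"]

# Rows in the order they arrive from the data table: (label, made-up number).
rows = [("female", 1.0), ("male", 2.0)]

# Instance order: each row simply gets its label, whatever the domain order is.
labels_in_instance_order = [label for label, _ in rows]
print(labels_in_instance_order)  # ['female', 'male']

# Following the domain order would be a separate, explicit sorting step.
rank = {value: i for i, value in enumerate(domain_values)}
rows_in_domain_order = sorted(rows, key=lambda row: rank[row[0]])
print([label for label, _ in rows_in_domain_order])  # ['male', 'female']
```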

Anyhow, this 'bug' essentially stems from the fact that categorical values are sorted alphabetically by default. The data I was looking at came from Pivot Table, which uses value order as defined in the domain, hence the issue. I think if we implement #4778, there would be no such issue. Unless the order of the original data is such. Tough luck then.

janezd commented 4 years ago

OK. :)