biolab / orange3

🍊 :bar_chart: :bulb: Orange: Interactive data analysis
https://orangedatamining.com
Other
4.79k stars 997 forks source link

Error with coloring in Visualize/Scatter Plot when using certain set of data. #5580

Closed HannesLum closed 2 years ago

HannesLum commented 3 years ago

What's wrong? The coloring of the Scatter Plot will fail and color all classes in the same color, if a unlucky set of data is used. The problem is basically within Orange/widgets/utils/colorpalettes.py in line 409 inf from_palette(...) where, in order to get the left and right border of each coloring-bin, the bin-values are added together. When these values have the wrong data-type there can be an overload in the summation and suddenly the bin can have e.g. negative values (like in my case). Manually casting the bin-values to float will keep this from happening: mids = (np.array(bins).astype(float)[:-1] + np.array(bins).astype(float)[1:]) / 2

Another thing to note here is, that this error also leads to another problem, which can be seen in Line Chart from the Timeseries-addon. Line Chart will be unable to load the data as can be seen in debugging_owlinechart_owscatterplot.zip because there exists a faulty Context-Class in the owlinechart-class. This faulty Context has a __dict__ as in owlinechart_exception.txt and leads to the exception as writen in the file. The exception is raised because "attrs" in Context.__dict__ is [[]] which means len(value) also becomes 0. I was unable to determine where the origin of this error is coming from or if Context is faulty at all. The problem can be solved by adding the following code right in front of the statement causing the exception (line 441 in orange3-timeseries-master/orangecontrib/timeseries/widgets/owlinechart.py):

if len(value) == 0:
    return self.NO_MATCH

How can we reproduce the problem? Using the data in debugging_owlinechart_owscatterplot_reduced.xlsx and loading in the orange file debugging_owlinechart_owscatterplot.zip will initialize the widgets so that the errors as described above can be seen.

What's your environment?

janezd commented 3 years ago

Thank you for this very detailed report - and especially for doing the debugging yourself.

As I understand, the problem is that bins is stored as 32-bit integers? This doesn't happen on mac; @thocevar can you try this on Windows?

HannesLum commented 3 years ago

Of course, if I don't debug it, I wouldn't know if I made a mistake. :) And yes. The relevant datatype is numpy.int32.

thocevar commented 2 years ago

I can replicate the first problem. It is caused by time data because it is stored as the number of seconds since the Unix epoch and twice that number exceeds the 32-bit range. There is a simple fix for computing the average as a/2+b/2 instead of (a+b)/2. The data type comes from computing the bins using np.arange, which can default to np.int32 in some cases. We might have to address this problem of handling large integer values at some point.

I can't replicate the second problem. There have been some changes (https://github.com/biolab/orange3-timeseries/commit/1f56ce64274b352cb22835f8b6a3763d8a442b83) that affect the line you mention. However, they were not released on PyPI yet. Please try downloading the latest source code of the timeseries add-on and let us know if the problem persists.

janezd commented 2 years ago

@thocevar, @HannesLum: can one of you prepare a pull request that fixes the problem with colors on Windows? I'd do it myself, but I'm on macOS and it already works.

HannesLum commented 2 years ago

Right, the second problem was more of an artifact of the first one, so there's no real need to adress it, I think. I just wanted to mention it just in case and because I took some time trying to figure out what it was in the first place. Regarding the pull request, I would have to ask how to go about that. I cloned the repository, changed the line of code and what command do I have to enter now? Help would be much appreciated, as I usually just directly push my changes to my branch...

thocevar commented 2 years ago

You would need to fork the repository, clone it, make changes and push them to your fork. Afterwards, github should show you a "pull request" option to propose incorporating your changes into the repository it was forked from.

I can go ahead and make these minor fixes or wait for your contribution if you would like to go through the process.

HannesLum commented 2 years ago

Yes, I wanted to go through the process, thank you for the explanation! I hope everything is in order so far, but if not, I'll do what I can to fix it.

thocevar commented 2 years ago

Your pull request #5666 looks fine. You just need to accept the CLA agreement before we can merge your fix.

HannesLum commented 2 years ago

Oh right, i used the wrong author for commiting... my bad. My local configuration is a mess, it seems. The author I used couldn't accept the CLA as it wasn't a Github user. I had to redo the pull request. Sorry for the inconvenience, I still have a lot to learn when using git... but the second time it seems to have worked out somehow!