biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

K-Means widget hangs (intermittent, multiple number of clusters) #6855

Status: Open. stuart-cls opened this issue 3 months ago.

stuart-cls commented 3 months ago

What's wrong?

The k-means widget sometimes hangs when it runs with the option that fits multiple values of k (number of clusters) instead of a single fixed k.

The widget stays in its processing (progress) state indefinitely and never recovers; Orange has to be quit.

This happens on medium-sized datasets (~10k rows, 10-250 features) and larger. Clustering such datasets is normally quite fast unless this bug is triggered.

Once triggered, the bug persists for the rest of the user session (it survives quitting and restarting Orange) and usually can only be cleared by rebooting the system (or possibly by logging out).

I attached the py-spy profiler to the hung process and dumped the following (slightly redacted) call trace:

hung-k-means_edited.txt
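py-spy inspects the hung process from the outside, which is what makes it useful here. For an intermittent hang that is hard to catch live, a complementary option (a swapped-in technique, not what was used in this report) is the standard-library faulthandler watchdog: it writes every thread's Python stack to a file if a guarded block runs longer than a timeout. This is a minimal sketch, assuming you can wrap the suspect fitting call yourself in a local debugging session; hang_watchdog, the timeout, and the log path are hypothetical illustrative names, not Orange or Quasar APIs. It records Python frames only, so native (C/OpenMP) frames will not appear.

```python
import contextlib
import faulthandler
import os
import tempfile
import time


@contextlib.contextmanager
def hang_watchdog(timeout_s, log_path):
    """Hypothetical helper: if the body runs longer than timeout_s seconds,
    write the Python stack of every thread to log_path (once)."""
    with open(log_path, "w") as log:
        # faulthandler's watchdog thread writes straight to the file
        # descriptor, so the file must stay open until the dump is cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log)
        try:
            yield
        finally:
            # Disarm the watchdog whether the body finished or raised.
            faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stacks.txt")
    with hang_watchdog(0.2, path):
        time.sleep(0.6)  # stand-in for a k-means fit that stalls
    with open(path) as f:
        print(f.read())
```

Because the dump fires from a separate watchdog thread, it still captures stacks when the guarded call never returns; the file can then be attached to the issue just like the py-spy trace.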

How can we reproduce the problem?

Despite considerable effort, we haven't been able to create an .ows workflow file that reliably reproduces this problem.

What's your environment?

ales-erjavec commented 3 months ago

One of the libraries that appears in the profiler trace, threadpoolctl, has a warning about libomp (more here).

Do you see that warning in the terminal when running quasar?
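Rather than waiting for the warning to appear, one can ask threadpoolctl directly which OpenMP runtimes are loaded. This is a minimal sketch under two assumptions: that threadpool_info() returns one dict per native library with 'user_api', 'prefix' and 'filepath' keys (as in recent threadpoolctl releases), and that the combination of interest is LLVM OpenMP (libomp) plus Intel OpenMP (libiomp) in the same process, which to my understanding is the pairing threadpoolctl warns can cause crashes or deadlocks on Linux. The helper names openmp_runtimes and has_llvm_intel_openmp_conflict are hypothetical.

```python
from collections import defaultdict


def openmp_runtimes(info):
    """Hypothetical helper: group loaded OpenMP libraries by filename prefix
    (e.g. 'libgomp', 'libomp', 'libiomp'), listing their file paths."""
    runtimes = defaultdict(list)
    for lib in info:
        if lib.get("user_api") == "openmp":
            runtimes[lib.get("prefix", "unknown")].append(lib.get("filepath", "?"))
    return dict(runtimes)


def has_llvm_intel_openmp_conflict(info):
    """Hypothetical helper: True if both libomp and libiomp are loaded."""
    prefixes = openmp_runtimes(info)
    return "libomp" in prefixes and "libiomp" in prefixes


if __name__ == "__main__":
    try:
        # threadpool_info() only sees libraries that are already loaded,
        # so import scikit-learn's clustering code first (assumption: this
        # pulls in the same OpenMP runtime the k-means widget ends up using).
        import sklearn.cluster  # noqa: F401
        from threadpoolctl import threadpool_info
    except ImportError as exc:
        print(f"Skipping live check: {exc}")
    else:
        info = threadpool_info()
        for prefix, paths in openmp_runtimes(info).items():
            print(prefix, paths)
        print("libomp + libiomp loaded together:",
              has_llvm_intel_openmp_conflict(info))
```

Running this in the environment Quasar uses and posting the printed prefixes and paths would complement the conda list output requested below.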

Can you post the full output of conda info and conda list (or pip list) for the affected environments?

stuart-cls commented 3 months ago

Thanks for the pointer.

I don't see it in my environment (Debian 12, quasar-pip.txt). I've asked my colleagues to watch for it in their environments the next time they hit the bug. I don't currently have access to the conda environment where it was observed.