biolab / orange3

🍊 📊 💡 Orange: Interactive data analysis
https://orangedatamining.com

Rank: suspiciously slow on some computers #3129

Closed ajdapretnar closed 6 years ago

ajdapretnar commented 6 years ago
Orange version

3.15.dev

Expected behavior

Rank works reasonably fast on computers with sufficient RAM.

Actual behavior

Rank is slow on some machines. It worked fine on a PC with 2 GB RAM, but took forever on two machines with 8 GB RAM. All machines run Windows, but the problem is probably not OS-dependent.

Steps to reproduce the behavior

Try to rank blood-loneliness.tab with 19335 features.

Additional info (worksheets, data, screenshots, ...)

thocevar commented 6 years ago

This dataset has continuous features and a discrete class. The only feature scoring method available "out of the box" for this combination is ANOVA. Other methods have to discretize the features behind the scenes.
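
To illustrate what "discretizing behind the scenes" involves, here is a minimal NumPy sketch of equal-frequency binning for a single continuous column. It is not Orange's discretizer; the function name `equal_frequency_bins`, the `-1` code for missing values, and the sample values are assumptions made for illustration only.

```python
import numpy as np


def equal_frequency_bins(column, n_bins=4):
    """Map a continuous column to integer bin codes 0..n_bins-1.

    Hypothetical helper: NaN entries get the code -1 so they stay
    distinguishable from real bins.
    """
    column = np.asarray(column, dtype=float)
    known = ~np.isnan(column)
    codes = np.full(column.shape, -1, dtype=int)
    if not known.any():
        # The column is entirely NaN: there is nothing to compute cut points from.
        return codes
    # Interior cut points at the 1/n_bins, 2/n_bins, ... quantiles of known values.
    cuts = np.quantile(column[known], np.linspace(0, 1, n_bins + 1)[1:-1])
    codes[known] = np.digitize(column[known], cuts)
    return codes


values = [0.1, 0.4, 0.2, 0.9, np.nan, 0.7, 0.3, 0.8]
codes = equal_frequency_bins(values, n_bins=2)
all_missing = equal_frequency_bins([np.nan, np.nan, np.nan])
```

Note the explicit early return: a column with no known values has no quantiles, so any discretizer has to special-case it or propagate NaN cut points.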

There's a problem with feature "ABHD11": it is always NaN (in fact, there are 860 such features). This could potentially trip up imputation, discretization, or the feature scoring method.

We could adapt RemoveConstant for this purpose.
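
As a rough sketch of that idea, all-NaN columns could be filtered out before scoring. The code below uses plain NumPy and is not Orange's RemoveConstant; the helper name `drop_all_nan_columns` and the feature names other than "ABHD11" are hypothetical.

```python
import numpy as np


def drop_all_nan_columns(X, names):
    """Return X without columns that are NaN in every row, plus the kept names."""
    X = np.asarray(X, dtype=float)
    # A column is kept if at least one row has a known (non-NaN) value.
    keep = ~np.isnan(X).all(axis=0)
    kept_names = [name for name, k in zip(names, keep) if k]
    return X[:, keep], kept_names


# Toy data: the middle column mimics a feature that is always NaN.
X = np.array([[1.0, np.nan, 3.0],
              [2.0, np.nan, np.nan],
              [4.0, np.nan, 5.0]])
X_clean, kept = drop_all_nan_columns(X, ["GENE_A", "ABHD11", "GENE_B"])
```

Columns with only some missing values (such as the third one here) are kept, so imputation can still handle them downstream.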