biolab / orange3

🍊 :bar_chart: :bulb: Orange: Interactive data analysis
https://orangedatamining.com

SVM and/or Preprocess with sparse data (BoW): no way to get rid of warning "Input data is sparse, default preprocessing is to scale it" #6870

Open wvdvegte opened 2 months ago

wvdvegte commented 2 months ago

What's wrong?

When I try to classify text processed with Bag of Words using SVM, the SVM dialog box shows the warning "Input data is sparse, default preprocessing is to scale it" and does not perform classification. I expected that inserting Preprocess > Normalize Features > scale to σ^2 = 1 before SVM would apply scaling to the sparse BoW data, but the SVM widget still shows the same warning.

How can we reproduce the problem?

Apply SVM to text processed as BoW, together with a categorical variable by which the text can be classified. Then insert Preprocess with Normalize Features > scale to σ^2 = 1 before SVM.

What's your environment?

processo commented 2 months ago

You are right. Preprocess scaling leaves the data sparse. As an alternative, you can use the same method (the only one that works on sparse data) in Continuize.
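To illustrate why scaling to unit variance can leave the data sparse, here is a minimal sketch using NumPy and SciPy. It is not Orange's implementation. The function name `scale_sparse_to_unit_variance` and the toy matrix are made up for this example. The idea is that dividing each column by its standard deviation without subtracting the mean keeps every zero at zero, so the matrix stays sparse.

```python
import numpy as np
from scipy import sparse


def scale_sparse_to_unit_variance(X):
    """Divide each column of a sparse matrix by its standard deviation.

    Hypothetical helper for illustration only. No centering is done,
    so zeros stay zeros and the result remains sparse.
    """
    X = sparse.csr_matrix(X, dtype=float)
    # Column-wise variance via E[x^2] - (E[x])^2, computed on sparse data.
    mean = np.asarray(X.mean(axis=0)).ravel()
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    std[std == 0] = 1.0  # leave constant columns unchanged
    # Right-multiplying by a diagonal matrix rescales each column.
    return X @ sparse.diags(1.0 / std)


# Toy bag-of-words counts: 4 documents x 3 terms (made-up data).
bow = sparse.csr_matrix(np.array([
    [2, 0, 0],
    [0, 1, 0],
    [0, 0, 3],
    [4, 0, 0],
]))

scaled = scale_sparse_to_unit_variance(bow)
print(sparse.issparse(scaled))    # the result is still a sparse matrix
print(scaled.nnz == bow.nnz)      # no new non-zero entries were created
```

Subtracting the column mean as well would turn most zeros into non-zero values and make the matrix dense, which is why a sparse-friendly scaler skips that step.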

processo commented 1 month ago

I forgot to say that my workflow (Windows 10, Orange 3.37.0) does produce predictions despite the warning. So I could not reproduce that part of the problem.