biolab / orange3

🍊 Orange: Interactive data analysis
https://orangedatamining.com

SVM fails when mixing original and BoW attributes #3202

Closed ajdapretnar closed 5 years ago

ajdapretnar commented 6 years ago
Orange version

3.16.dev

Expected behavior

SVM works with election-tweets-2016.tab from Text add-on.

Actual behavior

SVM shows 'Fitting failed. Non-zero offset in normalization of sparse data.' This only happens when additional, non-bag-of-words attributes are present.
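The error message points at a general property of sparse storage, which the sketch below illustrates in plain Python. It is not Orange's code: `normalize_dense`, `normalize_sparse`, and the dict-based sparse column are hypothetical names and structures chosen only for illustration. Normalization computes `(x - offset) * factor`. With a non-zero offset every implicit zero would turn into `-offset * factor`, so the column could no longer be stored sparsely, and a sparse-aware normalizer can only refuse.

```python
def normalize_dense(values, offset, factor):
    """Dense column: every value is stored, so shifting by an offset is fine."""
    return [(v - offset) * factor for v in values]


def normalize_sparse(nonzeros, offset, factor):
    """Sparse column stored as {row_index: non-zero value}.

    Rows missing from the dict are implicit zeros. Scaling keeps them at zero,
    but a non-zero offset would change every one of them, so it is rejected.
    """
    if offset != 0:
        raise ValueError("Non-zero offset in normalization of sparse data.")
    return {row: v * factor for row, v in nonzeros.items()}


# Scaling alone keeps the column sparse.
scaled = normalize_sparse({0: 2.0, 3: 4.0}, offset=0, factor=0.5)

# A shift (for example centering) cannot be applied without densifying.
try:
    normalize_sparse({0: 2.0, 3: 4.0}, offset=1.0, factor=0.5)
    shift_rejected = False
except ValueError:
    shift_rejected = True
```

Under this reading, the failure would be triggered by any column whose normalization requires a shift, regardless of whether the learner itself supports sparse input.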

Steps to reproduce the behavior

Corpus → Data Table (select a couple of instances for performance) → Bag of Words (Count) → SVM.

Additional info (worksheets, data, screenshots, ...)

If you remove the original attributes (Favourite count, Retweet count, Latitude, Longitude, Is retweet and Language), SVM starts working. I think the issue might be that attributes from the original data and attributes from bag of words are handled differently.

See https://github.com/biolab/orange3-text/issues/312

janezd commented 5 years ago

@ajdapretnar, could you attach some simple data for easier replication/fixing? I, for instance, can't look into this because I don't know exactly how to prepare the data.

ajdapretnar commented 5 years ago

This should work. svm-fails.pkl.gz

Load the data with Corpus and pass it to SVM. If you use Select Columns to remove the first 7 attributes (those with normal names), SVM starts working. A potential cause is that these attributes weren't properly transformed to sparse, since they were already there before BoW construction.
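One way to narrow this down is to check which columns would need a non-zero offset when normalized. The sketch below assumes span (min-max) scaling, where the offset is the column minimum. Whether Orange's SVM uses exactly this scheme is not confirmed in this thread. The column names and values are made up, and `span_offset_and_factor` is a hypothetical helper, not an Orange function.

```python
def span_offset_and_factor(values):
    """Min-max scaling parameters, so that scaled = (x - offset) * factor."""
    lo, hi = min(values), max(values)
    factor = 1.0 / (hi - lo) if hi != lo else 1.0
    return lo, factor


# Hypothetical columns; values are invented for illustration only.
columns = {
    "Latitude": [46.05, 40.71, 51.51],     # original attribute, never zero
    "Retweet count": [12.0, 3.0, 250.0],   # original attribute
    "bow:great": [0.0, 2.0, 0.0],          # bag-of-words count, mostly zero
    "bow:wall": [1.0, 0.0, 0.0],           # bag-of-words count, mostly zero
}

# Columns whose minimum is not zero cannot be normalized in sparse form.
needs_offset = [
    name for name, values in columns.items()
    if span_offset_and_factor(values)[0] != 0
]
print(needs_offset)  # ['Latitude', 'Retweet count']
```

If this assumption holds, count-based bag-of-words columns would pass (their minimum is typically 0), while original attributes such as Latitude would not, which would match the observation that removing them makes SVM work.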

ajdapretnar commented 5 years ago
[Screenshot: screen shot 2019-03-01 at 09 42 31]