ajdapretnar closed this issue 5 years ago.
@ajdapretnar, could you attach a small dataset to make this easier to replicate and fix? I, for one, can't look into it because I don't know exactly how to prepare the data.
This should work: svm-fails.pkl.gz
Load the data with Corpus and pass it to SVM. If you use Select Columns to remove the first 7 attributes (those with regular names rather than word tokens), SVM starts working. A possible cause is that these attributes were not properly converted to sparse format, since they existed before the bag-of-words construction.
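The sparse-format hypothesis above ties into why centering and sparse storage don't mix. The following is a minimal pure-Python sketch. It assumes a sparse column stored as a `{row_index: value}` dict, a hypothetical representation chosen for illustration and not Orange's internals. It shows that subtracting the column mean turns every implicit zero into a non-zero value.

```python
# Hypothetical illustration, not Orange code: a sparse column is stored as
# {row_index: value}; every row missing from the dict is an implicit zero.

def nnz(values):
    """Count non-zero entries in a dense list."""
    return sum(1 for v in values if v != 0)


def to_dense(col, n_rows):
    """Expand a {row: value} sparse column into a dense list."""
    return [col.get(i, 0.0) for i in range(n_rows)]


def center(col, n_rows):
    """Subtract the column mean from every row, implicit zeros included."""
    dense = to_dense(col, n_rows)
    mean = sum(dense) / n_rows
    return [v - mean for v in dense], mean


# A word-count column: only 2 of 10 tweets contain the token.
word_counts = {1: 2.0, 7: 1.0}
centered, mean = center(word_counts, 10)
print(len(word_counts), nnz(centered), mean)  # 2 10 0.3
```

After centering, all 10 rows are non-zero, so the column is effectively dense. This is why sparse-aware scalers typically scale without centering.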
Orange version
3.16.dev
Expected behavior
SVM works with election-tweets-2016.tab from the Text add-on.
Actual behavior
SVM shows 'Fitting failed. Non-zero offset in normalization of sparse data.' This only happens when additional, non-bag-of-words attributes are present.
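The error message suggests a normalizer that refuses to shift sparse columns. Below is a minimal sketch of such a guard. It assumes normalization has the form `(x - offset) * factor`. The function `normalize_sparse` and the `{row: value}` column format are hypothetical and are not Orange's actual code. Under the hypothesis in this thread, a dense original attribute such as Retweet count gets a mean-based offset. Once that attribute sits in the same sparse table as the bag-of-words columns, the guard fires.

```python
def normalize_sparse(col, offset, factor):
    """Apply (x - offset) * factor to a sparse column stored as {row: value}.

    Only stored (non-zero) entries are visited, so this is correct only when
    offset == 0; a non-zero offset would turn every implicit zero into
    -offset * factor and make the column dense.
    """
    if offset != 0:
        raise ValueError("Non-zero offset in normalization of sparse data")
    return {row: value * factor for row, value in col.items()}


# Scale-only normalization of a bag-of-words count column is fine.
counts = {1: 2.0, 7: 1.0}
print(normalize_sparse(counts, offset=0, factor=0.5))  # {1: 1.0, 7: 0.5}

# Centering a column like "Retweet count" (offset = mean over 10 rows = 1.6)
# is rejected.
retweets = {0: 12.0, 3: 4.0}
try:
    normalize_sparse(retweets, offset=1.6, factor=1.0)
except ValueError as err:
    print(err)  # Non-zero offset in normalization of sparse data
```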
Steps to reproduce the behavior
Corpus - Data Table (select a few instances to keep it fast) - Bag of Words (Count) - SVM.
Additional info (worksheets, data, screenshots, ...)
Removing the original attributes (Favourite count, Retweet count, Latitude, Longitude, Is retweet and Language) makes SVM work. I think the issue might be that attributes from the original data are handled differently from those produced by Bag of Words.
See https://github.com/biolab/orange3-text/issues/312