This method is called by Learner._train_setup(). It checks that regression labels are not strings and that feature values (for both classification and regression) are not strings. However, it raises a NotImplementedError if the featureset is read in as dense rather than sparse. Here's a minimal test case:
>>> from skll.data import NDJReader
>>> from skll.learner import Learner
>>> fs1 = NDJReader.for_path("examples/boston/train/example_boston_features.jsonlines", sparse=False).read()
>>> l1 = Learner('LinearRegression')
>>> fs2 = NDJReader.for_path("examples/iris/train/example_iris_features.jsonlines", sparse=False).read()
>>> l2 = Learner('LogisticRegression')
>>> l1.train(fs1, grid_search=False)
...
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
664 # make sure that feature values are not strings
665 # we need to check this for both sparse and dense arrays
--> 666 for val in examples.features.data:
667 if isinstance(val, str):
668 raise TypeError("You have feature values that are strings. "
NotImplementedError: multi-dimensional sub-views are not implemented
>>> l2.train(fs2, grid_search=False)
....
~/work/skll/skll/learner/__init__.py in _check_input_formatting(self, examples)
664 # make sure that feature values are not strings
665 # we need to check this for both sparse and dense arrays
--> 666 for val in examples.features.data:
667 if isinstance(val, str):
668 raise TypeError("You have feature values that are strings. "
NotImplementedError: multi-dimensional sub-views are not implemented
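The failure occurs because, for a dense NumPy array, the .data attribute is a multi-dimensional memoryview, which Python cannot iterate over; for a sparse matrix, .data is already a 1-dimensional array of the stored values. The solution is to explicitly reshape the dense feature array into a 1-dimensional array before iterating over its .data attribute.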
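The proposed fix could be sketched roughly as follows. This is only an illustration, not SKLL's actual code: `iter_feature_values` and `check_no_string_features` are hypothetical helper names, the error message is shortened, and the sketch assumes dense features are a NumPy `ndarray` while sparse features are a SciPy-style matrix whose `.data` is 1-dimensional. It iterates over the flattened array itself rather than over its `.data` buffer, which sidesteps the memoryview entirely.

```python
import numpy as np


def iter_feature_values(features):
    """Yield every stored value from a dense or sparse feature matrix.

    Hypothetical helper for illustration; not part of SKLL.
    """
    if isinstance(features, np.ndarray):
        # Dense case: ``features.data`` is a multi-dimensional memoryview,
        # which cannot be iterated, so flatten the array to 1-D first.
        values = features.reshape(-1)
    else:
        # Sparse case (assumed SciPy-style matrix): ``.data`` is already
        # a 1-D array holding the explicitly stored values.
        values = features.data
    yield from values


def check_no_string_features(features):
    """Raise TypeError if any feature value is a string (hypothetical check)."""
    for val in iter_feature_values(features):
        if isinstance(val, str):
            raise TypeError("You have feature values that are strings.")


# A dense 2-D array now passes the check instead of raising NotImplementedError.
dense = np.arange(6.0).reshape(2, 3)
check_no_string_features(dense)
```

With this shape of check, the same loop handles both representations, so the string test itself does not need to be duplicated for the dense and sparse branches.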