It looks like setting `ngram_size` to anything greater than 1 computes the column dictionary correctly but fails to fill any values into the `_train_matrix`. The current unit tests don't cover this case, so we'll need to update them after fixing the error.
```python
>>> NgramVectorizer(ngram_size=1).fit_transform(data1)
<50x3210 sparse matrix of type '<class 'numpy.float32'>'
	with 7685 stored elements in Compressed Sparse Row format>

>>> NgramVectorizer(ngram_size=2).fit_transform(data1)
<50x11659 sparse matrix of type '<class 'numpy.float32'>'
	with 0 stored elements in Compressed Sparse Row format>
```
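A regression test could assert that the transformed matrix has nonzero stored elements whenever the corpus actually contains n-grams. As a sketch of what the expected count is, here is a stdlib-only helper that counts distinct n-grams per document; a correct vectorizer's row for a document should have one stored element per distinct n-gram found here (the helper and toy corpus are illustrative, not part of the library):

```python
from collections import Counter

def ngram_counts(docs, n):
    """Count n-grams per tokenized document. Each Counter's length is the
    number of distinct n-grams, i.e. the stored elements a correct
    vectorizer should produce for that row."""
    return [
        Counter(tuple(doc[i:i + n]) for i in range(len(doc) - n + 1))
        for doc in docs
    ]

# Toy corpus: every document has >= 2 tokens, so bigrams must exist.
docs = [["a", "b", "a", "b"], ["b", "c"]]
counts = ngram_counts(docs, 2)
total_stored = sum(len(c) for c in counts)
assert total_stored > 0
# The real test would compare this against something like
# NgramVectorizer(ngram_size=2).fit_transform(docs).nnz (exact API assumed).
```

The same check with `n=2` against the 50-document corpus above would have flagged the `0 stored elements` result immediately.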