Open acwooding opened 4 years ago
Hi Amy, I think that makes good sense. It would probably be easiest to code (though not most efficient) by just setting all the non-zero values to 1 at the end of the function when that option is selected. We'll happily accept a PR doing something like that if you are keen. Otherwise I'll push it onto my to-do stack (warning: it will be fairly low, and I'm not working on this project over the summer, so I'm thinking September for me).
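For anyone picking this up, here is a minimal sketch of the "set all non-zero values to 1" step on a SciPy sparse count matrix. The helper name `binarize_counts` is hypothetical and not part of this library; where it would be called inside the vectorizer is left open.

```python
import numpy as np
import scipy.sparse


def binarize_counts(matrix):
    """Return a copy of a sparse count matrix with every non-zero entry set to 1.

    Hypothetical helper for illustration only; not an existing library function.
    """
    # Work on a CSR copy so the caller's matrix is left untouched.
    result = matrix.tocsr().copy()
    # Drop explicitly stored zeros so they are not turned into ones.
    result.eliminate_zeros()
    # Every remaining stored value is non-zero, so overwrite them all with 1.
    result.data[:] = 1
    return result


counts = scipy.sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]]))
binary = binarize_counts(counts)
```

This touches only the stored values, so the sparsity pattern is preserved and no dense array is created.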
Wondering your thoughts on something like the Sklearn CountVectorizer option of `binary=True` for easy use in cases when counts don't matter... I know it's easy to do after the fact, but it could be a nice option to have where it makes sense (like `NgramVectorizer`). (Maybe it already exists, but I didn't see it). It would make it free to use as part of DocVectorizer as well. Otherwise I think you have to break the whole pipeline apart and "do it by hand" just to binarize a matrix.
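For reference, a short sketch of the scikit-learn behaviour being referred to. It only uses `sklearn.feature_extraction.text.CountVectorizer`; the toy documents are made up for illustration, and nothing here implies an equivalent option exists in this library yet.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy corpus; "banana" appears twice in the first document.
docs = ["apple banana banana", "banana cherry"]

# Default behaviour: entries are raw term counts.
count_matrix = CountVectorizer().fit_transform(docs)

# With binary=True, every non-zero count is reported as 1.
binary_vectorizer = CountVectorizer(binary=True)
binary_matrix = binary_vectorizer.fit_transform(docs)

# Column order follows the learned vocabulary (alphabetical here).
vocabulary = sorted(binary_vectorizer.vocabulary_, key=binary_vectorizer.vocabulary_.get)
```

In this example the only difference between the two matrices is the "banana" entry for the first document.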