TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

Possible Binary Option? #45

Open acwooding opened 4 years ago

acwooding commented 4 years ago

I'm wondering what you think about adding something like the binary=True option of scikit-learn's CountVectorizer, for easy use in cases where counts don't matter. I know it's easy to do after the fact, but it could be a nice option to have where it makes sense (like NgramVectorizer). (Maybe it already exists, but I didn't see it.) It would also make it free to use as part of DocVectorizer. Otherwise I think you have to break the whole pipeline apart and "do it by hand" just to binarize a matrix.
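For reference, here is a minimal sketch of the scikit-learn behaviour being described, next to the "after the fact" equivalent. The toy documents are made up for illustration, and nothing here uses the vectorizers library itself.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for this illustration.
docs = ["apple banana banana", "banana cherry cherry cherry"]

# Plain counts: each entry is how often a token occurs in a document.
counts = CountVectorizer().fit_transform(docs)

# binary=True: each entry is 1 if the token occurs at all, else 0.
binary = CountVectorizer(binary=True).fit_transform(docs)

# The "after the fact" route: threshold the count matrix by hand.
after_the_fact = (counts > 0).astype(counts.dtype)

print(counts.toarray())
print(binary.toarray())
print((binary != after_the_fact).nnz == 0)  # True when the two routes agree
```

The manual route is a one-liner on a standalone matrix, but it needs access to the intermediate count matrix, which is the inconvenience when the vectorizer sits inside a larger pipeline.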

jc-healy commented 3 years ago

Hi Amy, I think that makes good sense. It would probably be easiest to code (though not the most efficient) by just setting all the non-zero values to 1 at the end of the function when that option is selected. We'll happily accept a PR doing something like that if you are keen. Otherwise I'll push it onto my to-do stack (warning: it will be fairly low, and I'm not working on this project over the summer, so I'm thinking September for me).
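The "set all the non-zero values to 1 at the end" idea could look roughly like the sketch below. It assumes the vectorizer's output is a SciPy sparse matrix; the helper name binarize_counts is hypothetical and is not part of the vectorizers API.

```python
import numpy as np
from scipy.sparse import csr_matrix


def binarize_counts(matrix):
    """Return a copy of a sparse count matrix with every non-zero entry set to 1.

    Hypothetical helper for illustration only; not part of the vectorizers library.
    """
    result = csr_matrix(matrix, copy=True)  # copy so the caller's matrix is untouched
    result.eliminate_zeros()                # drop explicitly stored zeros so they do not become 1
    result.data[:] = 1                      # overwrite every remaining stored value in place
    return result


counts = csr_matrix(np.array([[0, 3, 1], [2, 0, 0]]))
binary = binarize_counts(counts)
print(binary.toarray())
```

In a vectorizer this would presumably run as the last step of the transform, guarded by something like an `if binary:` check on a new constructor option. It is simple rather than efficient, since the counts are still computed in full before being discarded.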