TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

Added a masking token for pruned vocabulary #47

Closed: cjweir closed 4 years ago

cjweir commented 4 years ago

It’s for both the Labelled Trees and Token Cooccurrence vectorizers.

lmcinnes commented 4 years ago

I think it would be good to let the user pass a mask string. If the mask string is None, no masking is done; otherwise the string passed in is used as the masking token. That lets the user choose whatever token they want.
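To make the suggestion concrete, here is a rough sketch of the behaviour I have in mind. It is not the code in this repo: `prune_tokens`, `min_occurrences`, and `mask_string` are hypothetical names used for illustration only, and pruning by a minimum count is just an assumed stand-in for however the vocabulary actually gets pruned.

```python
from collections import Counter


def prune_tokens(sequences, min_occurrences=2, mask_string=None):
    """Hypothetical helper: prune rare tokens, optionally masking them.

    If mask_string is None, pruned tokens are dropped entirely.
    Otherwise each pruned token is replaced by mask_string, so the
    positions of the surviving tokens are preserved.
    """
    # Count every token across all sequences.
    counts = Counter(token for seq in sequences for token in seq)
    # Assumed pruning rule: keep tokens seen at least min_occurrences times.
    keep = {token for token, n in counts.items() if n >= min_occurrences}

    if mask_string is None:
        # No masking: pruned tokens simply disappear.
        return [[t for t in seq if t in keep] for seq in sequences]
    # Masking: pruned tokens are replaced by the user's mask string.
    return [[t if t in keep else mask_string for t in seq] for seq in sequences]


docs = [["a", "b", "a"], ["a", "c"]]
unmasked = prune_tokens(docs)
masked = prune_tokens(docs, mask_string="[MASK]")
```

With masking on, the sequence lengths stay the same, which matters for anything window-based like the cooccurrence counts.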

cjweir commented 4 years ago

Good idea. I did that.

lmcinnes commented 4 years ago

I think you need to push your changes. Alternatively, I have a PR I can put in for what I did. It doesn't cover the label case yet, but I could fix that.