TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

InformationWeightTransform hates empty columns #101

Open jc-healy opened 2 years ago

jc-healy commented 2 years ago

We seem to crash kernels when we pass a sparse matrix with an empty column to InformationWeightTransformer's fit_transform.

It comes up when we've got a fixed token dictionary but our training data is missing some of our vocabulary:

vect = NgramVectorizer(token_dictionary=token_dictionary).fit_transform(eventid_list)
vect1 = InformationWeightTransformer().fit_transform(vect)
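
Until the crash itself is tracked down, one possible workaround is to strip all-zero columns from the sparse matrix before handing it to the transformer. The sketch below uses only NumPy and SciPy; the helper name `drop_empty_columns` is hypothetical and not part of the vectorizers library, and it has not been confirmed that empty columns are the only trigger for the crash.

```python
import numpy as np
import scipy.sparse as sp


def drop_empty_columns(matrix):
    """Return (matrix without all-zero columns, indices of the kept columns).

    Hypothetical helper for illustration; not part of the vectorizers API.
    """
    # Copy into CSC so the caller's matrix is left untouched.
    csc = sp.csc_matrix(matrix, copy=True)
    # Explicitly stored zeros should not count as column content.
    csc.eliminate_zeros()
    # In CSC format, consecutive indptr entries bracket each column's
    # stored values, so their difference is the per-column nonzero count.
    nnz_per_column = np.diff(csc.indptr)
    kept = np.flatnonzero(nnz_per_column > 0)
    return csc[:, kept].tocsr(), kept


# Toy count matrix whose middle column is empty, standing in for a
# vocabulary entry that never appears in the training data.
counts = sp.csr_matrix(np.array([[1, 0, 2],
                                 [0, 0, 3]]))
trimmed, kept = drop_empty_columns(counts)
```

Here `trimmed` would take the place of `vect` in the call above, and `kept` records which token dictionary columns survived, so the weighted output can be mapped back to the original vocabulary positions if the full width is needed downstream.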