JuliaAI / MLJText.jl

An MLJ extension for accessing models and tools related to text analysis
MIT License

change BagOfWordsTransformer to CountTransformer #20

Closed pazzo83 closed 2 years ago

pazzo83 commented 2 years ago

This PR renames the transformer for clarity. All three of the transformers we have right now are based on the "bag of words" concept: TF-IDF and BM25 apply additional weighting, but both are derived from the document-term matrix (DTM), which is just a count of each word in each document. The most basic form is therefore the raw DTM itself, which we can call the CountTransformer (the sklearn equivalent is the CountVectorizer).
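
To make the "raw DTM" idea concrete, here is a minimal plain-Python sketch of counting each word in each document. It is an illustration only, not the MLJText.jl implementation: the function name `document_term_matrix`, the lowercase whitespace tokenization, and the sorted vocabulary order are all assumptions made for this example.

```python
from collections import Counter


def document_term_matrix(docs):
    """Hypothetical helper: return (vocabulary, matrix) for a list of strings.

    matrix[i][j] is the number of times vocabulary[j] occurs in docs[i].
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted so that the column order is deterministic.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the cat saw the dog"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(dtm)    # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

TF-IDF and BM25 would start from these same counts and reweight them, which is why the count-only transformer is the most basic of the three.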

I think this is technically a breaking change, since we are changing the name of one of the models.