MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License
3.31k stars, 337 forks

Keywords are all lowercase #184

Closed by secsilm 8 months ago

secsilm commented 8 months ago

Hi, first of all, great project.

I would prefer the extracted keywords to keep their original casing instead of being all lowercase. How can I do that? Thanks.

MaartenGr commented 8 months ago

That is a result of the underlying tokenizer. You can find more about that here.

secsilm commented 8 months ago

> That is a result of the underlying tokenizer. You can find more about that here.

OK, I found it. It's a CountVectorizer config. Thanks.