MilaNLProc / contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
MIT License

Hugging Face Model for Embedding #54

Closed: dmolina2 closed this issue 3 years ago

dmolina2 commented 3 years ago

Description

Hey guys, I'm trying to use CTMs for topic modeling of answers to a survey. These texts are in Spanish, so I want to use a Spanish pre-trained HuggingFace model, as the repository says: "In general, our package should be able to support all the models described in the sentence transformer package and in HuggingFace."

Could you give an example of how to load and use a HuggingFace model for the embeddings? For example:

https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased

It would be great if I could use this model, since it works very well in other NLP tasks.

Thanks!

vinid commented 3 years ago

Hello @dmolina2! :)

Yes, you can! And it works with both CombinedTM and ZeroShotTM.

The only thing you need to do is to use the Spanish model when you create the embeddings:

qt = TopicModelDataPreparation("dccuchile/bert-base-spanish-wwm-uncased")
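A fuller sketch of the same idea, assuming the 2.x API of contextualized-topic-models (older releases used different constructor argument names). The raw survey answers go to the Spanish BERT model for the contextual embeddings, while a cleaned copy is used to build the bag of words. `simple_preprocess`, `train_spanish_ctm` and the tiny `SPANISH_STOPWORDS` set are hypothetical helpers written for this example, not part of the package, and `contextual_size=768` assumes the embedding size of a BERT-base model such as this one.

```python
import re
from typing import List, Sequence

# Hypothetical, deliberately tiny Spanish stopword list for illustration only;
# a real pipeline would use a complete list (e.g. NLTK's Spanish stopwords).
SPANISH_STOPWORDS = {"de", "la", "el", "en", "y", "que", "los", "las",
                     "un", "una", "es", "muy"}


def simple_preprocess(documents: Sequence[str]) -> List[str]:
    """Hypothetical helper: text for the bag of words (lowercase, letters only,
    no stopwords, no tokens shorter than 3 characters). Keeps one output per input."""
    cleaned = []
    for doc in documents:
        tokens = re.findall(r"[a-záéíóúüñ]+", doc.lower())
        kept = [t for t in tokens if t not in SPANISH_STOPWORDS and len(t) > 2]
        cleaned.append(" ".join(kept))
    return cleaned


def train_spanish_ctm(raw_docs: Sequence[str], n_topics: int = 10,
                      zero_shot: bool = True, epochs: int = 100) -> List[List[str]]:
    """Hypothetical helper: fit a CTM on raw_docs using the Spanish BERT model."""
    # Imported lazily so simple_preprocess works without the heavy dependency.
    from contextualized_topic_models.models.ctm import CombinedTM, ZeroShotTM
    from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

    qt = TopicModelDataPreparation("dccuchile/bert-base-spanish-wwm-uncased")
    training_dataset = qt.fit(
        text_for_contextual=list(raw_docs),        # unpreprocessed text -> BERT embeddings
        text_for_bow=simple_preprocess(raw_docs),  # cleaned text -> bag of words (same length)
    )
    model_cls = ZeroShotTM if zero_shot else CombinedTM
    # contextual_size must match the embedding size: 768 for a BERT-base model.
    ctm = model_cls(bow_size=len(qt.vocab), contextual_size=768,
                    n_components=n_topics, num_epochs=epochs)
    ctm.fit(training_dataset)
    return ctm.get_topic_lists(10)  # top 10 words per topic
```

Passing `zero_shot=False` switches to CombinedTM. The first call downloads the checkpoint from the HuggingFace Hub.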

You can refer to this link to check how these embeddings are created!
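As a hedged illustration of that step: as far as we can tell, TopicModelDataPreparation relies on the sentence-transformers library, which wraps a plain HuggingFace BERT checkpoint (one that is not already a sentence-transformers model) with a mean-pooling layer. The sentence embedding is then the average of the token vectors, ignoring padding positions. The sketch below shows only that pooling step on toy 2-dimensional vectors; `masked_mean_pool` is a hypothetical name, and the real vectors for this model are 768-dimensional.

```python
from typing import List, Sequence


def masked_mean_pool(token_embeddings: Sequence[Sequence[float]],
                     attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding tokens are skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Avoid dividing by zero when every position is padding.
    count = max(count, 1)
    return [v / count for v in total]


# Two real tokens and one padding token: the padding vector does not affect the result.
print(masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```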

dmolina2 commented 3 years ago

Thanks! Very clear!