MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

BERTopic (Can't retrieve unregistered extension attribute 'trf_data'. Did you forget to call the set_extension method?) #1551

Open FranValero97 opened 1 year ago

FranValero97 commented 1 year ago

Good morning. This is my code, taken from the following page: https://spacy.io/universe/project/bertopic. After running it, I get the following error: Can't retrieve unregistered extension attribute 'trf_data'. Did you forget to call the set_extension method?

How can I solve this error?

# Install the required libraries
!pip install spacy
!pip install bertopic
!pip install scikit-learn

# Download the English spaCy model (medium)
!python -m spacy download en_core_web_md

# Import the libraries
import spacy
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load the documents from the 20 Newsgroups dataset
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Load the English spaCy model (medium), excluding unnecessary components
nlp = spacy.load('en_core_web_md', exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])

# Create the BERTopic model with spaCy
topic_model = BERTopic(embedding_model=nlp)
topics, probs = topic_model.fit_transform(docs)

I have tried changing the spaCy version to ones between 3.3.0 and 3.4.0, but I still get the same error with every spaCy model I tried (sm, md, lg, trf).

MaartenGr commented 1 year ago

Which version of BERTopic are you currently using? Also, did you try installing BERTopic from the main branch? I believe there was a fix for this a while back. Although spaCy is supported as an embedding model, it is not something I would generally recommend. The models here are generally advised.
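
For anyone following along, here is a minimal sketch of both suggestions: reporting the installed versions, and using a sentence-transformers model instead of spaCy. The helper names `installed_versions` and `build_topic_model` are hypothetical, and the model name "all-MiniLM-L6-v2" is only an illustrative choice, not necessarily one of the models the comment above links to. It is not a confirmed fix for the 'trf_data' error.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def build_topic_model(model_name="all-MiniLM-L6-v2"):
    """Create a BERTopic model backed by a sentence-transformers model.

    The import is deferred so the version check still runs when BERTopic
    is not installed. The default model name is an assumption made for
    illustration only.
    """
    from bertopic import BERTopic

    return BERTopic(embedding_model=model_name)


# Report the versions that matter for this issue.
for package, found in installed_versions(["bertopic", "spacy"]).items():
    print(f"{package}: {found or 'not installed'}")
```

With BERTopic installed, `topic_model = build_topic_model()` followed by `topics, probs = topic_model.fit_transform(docs)` would replace the spaCy-based lines from the original post. Installing from the main branch is usually done with `pip install git+https://github.com/MaartenGr/BERTopic.git`.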