Kotik001 opened this issue 5 months ago
The .get_document_info function only works for the documents on which you fitted the model; it currently does not work with documents that were not part of the fitting process. You would have to create a similar dataframe manually.
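A minimal sketch of building such a dataframe by hand for unseen documents follows. It assumes a fitted BERTopic model where transform(docs) returns (topics, probabilities), topic_labels_ maps topic IDs to their default names, and get_topic(topic_id) returns a list of (word, score) pairs (or False for an unknown ID). The helper name document_info_for_new_docs and the chosen columns are hypothetical and only mimic part of what .get_document_info returns. It returns plain rows, so the result can be wrapped in pd.DataFrame(...).

```python
from typing import Any, Dict, List, Sequence


def document_info_for_new_docs(topic_model: Any, docs: Sequence[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: per-document topic info for documents NOT seen during fitting.

    Assumes topic_model behaves like a fitted BERTopic model:
      - transform(docs) -> (topics, probabilities)
      - topic_labels_   -> {topic_id: label}
      - get_topic(id)   -> [(word, score), ...] or False for unknown IDs
    """
    docs = list(docs)
    # Predict topics for the new documents; the model's stored fitting
    # assignments (topics_) are deliberately not used here.
    topics, _probs = topic_model.transform(docs)

    rows = []
    for doc, topic in zip(docs, topics):
        topic = int(topic)  # handles plain ints and NumPy integer types
        words = topic_model.get_topic(topic) or []  # False -> empty list
        rows.append({
            "Document": doc,
            "Topic": topic,
            "Name": topic_model.topic_labels_.get(topic, str(topic)),
            # Mimics the " - "-joined keyword string shown as Top_n_words (assumption).
            "Top_n_words": " - ".join(word for word, _score in words),
        })
    return rows
```

For example, with the model reloaded via BERTopic.load("testmodel") into loaded_model and the new texts in a hypothetical new_data DataFrame, pd.DataFrame(document_info_for_new_docs(loaded_model, new_data['filtered_words']), index=new_data.index) keeps new_data's index so it can be joined back onto new_data. Outlier reduction is a separate post-processing step, so new documents may still come back as topic -1; reduce_outliers can presumably be applied to those predictions as well.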
Hello! Thank you so much for your tool and library. It's awesome!
I apologise in advance for my basic (stupid) question, but I can't solve it.
First, I fit a model with .fit on a pandas DataFrame with 20,293 rows (texts).
Then I reduce the number of topics automatically to 15 and reduce outliers with the c-TF-IDF strategy.
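For reference, the fitting steps described above can be sketched as follows. This assumes BERTopic 0.15 or later, where reduce_topics(docs, nr_topics=...) updates the model in place (including topics_), reduce_outliers(docs, topics, strategy="c-tf-idf") returns new assignments, and update_topics(docs, topics=...) recomputes the topic representations. The function name fit_reduce_and_clean is hypothetical, and the model is passed in rather than constructed, so the sketch does not pin any embedding or clustering settings. Older BERTopic versions use different signatures.

```python
from typing import Any, List, Sequence


def fit_reduce_and_clean(topic_model: Any, docs: Sequence[str], nr_topics: int = 15) -> List[int]:
    """Hypothetical helper mirroring the issue's steps on a BERTopic-like model.

    Assumes BERTopic >= 0.15 method signatures; older versions differ.
    """
    docs = list(docs)
    # 1. Fit on the training documents (e.g. the rows of the pandas DataFrame).
    topic_model.fit(docs)
    # 2. Reduce the number of topics; in recent versions this updates topics_ in place.
    topic_model.reduce_topics(docs, nr_topics=nr_topics)
    # 3. Reassign outlier documents (topic -1) with the c-TF-IDF strategy.
    new_topics = topic_model.reduce_outliers(docs, topic_model.topics_, strategy="c-tf-idf")
    # 4. Recompute topic representations so they match the new assignments.
    topic_model.update_topics(docs, topics=new_topics)
    return list(new_topics)
```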
I have a resulting DataFrame with all documents assigned to topics:
result = topic_model.get_document_info(data['filtered_words'])
result_df = data.join(result, how='inner')
Then I save the model to use it with new data:
topic_model.save("testmodel", serialization="pytorch", save_ctfidf=True, save_embedding_model=sentence_model)
Now I'm trying to get the document info for the new data only, and I get an error.
I found that if I concatenate the new data with the old data and then use .get_document_info, it works, except that there is somehow one excess row, which I delete from the new data so that it does not cause a mismatch error. Let me know if you need more info or my whole data/code. Thanks!
piplist_toshare.txt
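The mismatch error is consistent with how .get_document_info appears to work: it pairs the documents you pass with the topic assignments stored during fitting (topics_), one per fitted document, rather than predicting topics for them. If that is the case, concatenating only "works" when the row count happens to equal the number of fitted documents, and the rows are then labeled with the fitted assignments in order, not with predictions for the new texts. A small guard along these lines makes the requirement explicit; it assumes topics_ holds one assignment per fitted document, and the name check_docs_match_fit is hypothetical.

```python
from typing import Any, Sequence


def check_docs_match_fit(topic_model: Any, docs: Sequence[str]) -> None:
    """Hypothetical guard to call before topic_model.get_document_info(docs).

    Assumes topic_model.topics_ holds one topic per document used in fitting,
    in the same order as those documents.
    """
    n_fitted = len(topic_model.topics_)
    n_given = len(docs)
    if n_given != n_fitted:
        raise ValueError(
            f"get_document_info expects the {n_fitted} documents used for fitting, "
            f"but got {n_given}; predict topics for new documents with transform() instead."
        )
```

Calling check_docs_match_fit(topic_model, data['filtered_words']) right before get_document_info would then fail fast with a readable message instead of a lower-level length-mismatch error.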