Closed frank0434 closed 3 years ago
Maybe you have duplicate sentences? Another approach might be to use a clustering algorithm (e.g. BTM / topicmodels) and apply textrank within each cluster.
Thanks for your reply. I did find some duplicated sentences. I tried subsetting the first 500 sentences: the minhash ran smoothly and textrank was very quick. I guess I need to spend a bit more time cleaning the data. Closing for now.
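For anyone landing here with the same problem, a minimal base R sketch of the cleaning step might look like the following. It drops sentences that are identical after lowercasing and whitespace normalisation, then keeps only the terminology rows of the surviving sentences. The data frames and the column names (`textrank_id`, `sentence`, `lemma`) are made up for illustration and are not taken from the original data.

```r
# Hypothetical input: one row per sentence (illustrative data only)
sentences <- data.frame(
  textrank_id = 1:5,
  sentence = c("The cat sat.", "the cat  sat. ", "A dog barked.",
               "The cat sat.", "Birds sing."),
  stringsAsFactors = FALSE
)

# Hypothetical input: one row per (sentence, term) pair
terminology <- data.frame(
  textrank_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  lemma = c("cat", "sit", "cat", "sit", "dog", "bark",
            "cat", "sit", "bird", "sing"),
  stringsAsFactors = FALSE
)

# Normalise before comparing: collapse whitespace runs, trim, lowercase
key <- tolower(trimws(gsub("[[:space:]]+", " ", sentences$sentence)))

# Keep the first occurrence of every distinct normalised sentence
sentences_unique <- sentences[!duplicated(key), , drop = FALSE]

# Keep only the terms belonging to the surviving sentences
terminology_unique <- terminology[
  terminology$textrank_id %in% sentences_unique$textrank_id, , drop = FALSE
]
```

The deduplicated `terminology_unique` would then be what gets passed on to the candidate selection step. Whether this alone removes the duplicated bucket hashes depends on the data, so treat it as a first thing to try rather than a guaranteed fix.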
Thanks for this great package.
I have a question about the `minhash` function in `textrank_candidates_lsh`. I want to rank 56K+ sentences. The time cost seems unbearable when using `textrank_sentences` directly, so I followed the instructions in the vignette and tried to reduce the number of sentences. But it seems the minhash generates duplicated bucket hashes, which cause the `merge` function to fail. Is there something I can try?
Thank you in advance for any feedback.