dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License
1.38k stars, 305 forks

PreProcessing #87

Closed by pratikghanwat7 2 years ago

pratikghanwat7 commented 3 years ago

Hello, are you doing any kind of preprocessing on the input text, such as stopword removal, tokenization, lemmatization, or any other text cleaning?

dmmiller612 commented 3 years ago

In the basic setup, I am not doing any preprocessing. In some research I did over a year ago, I looked into it using different spaCy operations, but the results were largely inconclusive (I also didn't spend a lot of time on it). With the current library, preprocessing could be added through a custom SentenceHandler.
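For illustration, here is a minimal sketch of what such a custom handler might look like. It is a standalone class, not the library's own code: the name `CleaningSentenceHandler`, the `clean` helper, and the regex-based sentence split are all hypothetical, and it assumes the handler is called with the raw text plus `min_length` and `max_length` bounds and returns a list of sentences.

```python
import re
from typing import List


class CleaningSentenceHandler:
    """Hypothetical custom handler that cleans text before splitting it.

    Assumption: the summarizer calls the handler with the raw body and
    min_length / max_length bounds, and expects a list of sentences back.
    """

    def __init__(self, lowercase: bool = False):
        self.lowercase = lowercase

    def clean(self, text: str) -> str:
        # Strip URLs, then collapse runs of whitespace into single spaces.
        text = re.sub(r"https?://\S+", "", text)
        text = re.sub(r"\s+", " ", text).strip()
        return text.lower() if self.lowercase else text

    def process(self, body: str, min_length: int = 40, max_length: int = 600) -> List[str]:
        # Naive split after sentence-ending punctuation; a real handler
        # would more likely use a proper sentence segmenter.
        sentences = re.split(r"(?<=[.!?])\s+", self.clean(body))
        # Keep only sentences whose character length is inside the bounds.
        return [s for s in sentences if min_length < len(s) < max_length]

    def __call__(self, body: str, min_length: int = 40, max_length: int = 600) -> List[str]:
        return self.process(body, min_length, max_length)


handler = CleaningSentenceHandler()
text = (
    "Visit https://example.com now.   "
    "This is a second sentence that is long enough to keep. Short one."
)
sentences = handler(text, min_length=20)
print(sentences)
```

To try it with the library, the handler would presumably be passed in when the model is built, e.g. `Summarizer(sentence_handler=handler)`. Whether a duck-typed class is enough or it needs to subclass the library's `SentenceHandler` is an assumption worth checking against the installed version.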

kingabzpro commented 3 years ago

I have used your model. It works perfectly for short sentences, but it tends to break down on larger documents. I would like to give you more insight, but I am busy with a project right now and will come back with a detailed explanation later.