Open Korb opened 3 months ago
I implemented this over 9 years ago and don't recall the details.
I browsed the code to review the algorithm.
A sentence's importance is calculated by assigning a score to each word in the sentence and summing the scores. A word's score is based on its frequency throughout the document (more frequent words score higher). The score of long sentences is reduced, to compensate for the higher totals they would otherwise get simply from containing more words.
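To make that description concrete, here is a rough sketch of this kind of scoring. It is not the actual code: the function names, the naive sentence splitter and tokenizer, and especially the length penalty (dividing by the word count raised to a hypothetical `length_exponent`) are assumptions for illustration only.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase the sentence and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def score_sentences(text, length_exponent=0.5):
    """Return (score, sentence) pairs, highest score first.

    length_exponent is a hypothetical knob: 0 means no length penalty,
    1 means the score is the average word frequency.
    """
    sentences = split_sentences(text)
    # Word score = number of occurrences in the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))
    scored = []
    for s in sentences:
        words = tokenize(s)
        if not words:
            continue
        raw = sum(freq[w] for w in words)  # sum of the word scores
        # Reduce the score of long sentences (assumed form of the penalty).
        scored.append((raw / len(words) ** length_exponent, s))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


text = "Cats purr. Cats sleep and cats purr often. Dogs bark."
for score, sentence in score_sentences(text):
    print(f"{score:.2f}  {sentence}")
```

With the default exponent the longer sentence about cats still ranks first; with `length_exponent=1.0` the short "Cats purr." overtakes it, which shows how much the ranking depends on how strongly long sentences are penalized.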
At the moment, the information provided is not enough to understand what exactly is meant by "the important content", or what criteria are used to find that content in the text of web pages.