angeloskath / php-nlp-tools

Natural Language Processing Tools in PHP
Do What The F*ck You Want To Public License

Get the first sentence? #61

Closed: LordPachelbel closed 6 years ago

LordPachelbel commented 6 years ago

I'm working on an events calendar, and for each event I need to automatically populate <meta name="description"> and <meta property="og:description"> tags from the event description text because the database doesn't have a field for users to enter meta data separately.

Rather than just truncate the text at an arbitrary number of characters, I would like to extract the first sentence from each description. Can this library be used to do that?

Because doing something like

$sentences = explode('.', $the_string);
$first_sentence = $sentences[0];

won't work for sentences that end with ! or ? or ?!, nor will it work if the first sentence contains things like Mr., Dr., i.e., e.g., etc.

angeloskath commented 6 years ago

Sure it can. See the tokenizers documentation for a crude example. You don't have to follow a similar rule-based strategy (although for your problem I would recommend it); you could even train a NaiveBayes classifier to split the sentences.

Keep in mind that NlpTools is a library that provides tools to build your own solutions, so there is no SentenceTokenizer by default.
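
For reference, here is a minimal sketch of the rule-based strategy recommended above, written in plain PHP rather than against the NlpTools API. The function name `first_sentence` and the abbreviation list are assumptions made for illustration; neither comes from the library or from this thread. The rule is: a run of `.`, `!` or `?` followed by whitespace (or the end of the text) ends the sentence, unless it is a single period directly after a known abbreviation.

```php
<?php

/**
 * Hypothetical helper: return the first sentence of $text using simple rules.
 * The default abbreviation list is illustrative only; extend it for your data.
 */
function first_sentence(
    string $text,
    array $abbreviations = ['mr', 'mrs', 'ms', 'dr', 'prof', 'i.e', 'e.g', 'etc', 'vs']
): string {
    $text = trim($text);

    // Find every run of terminal punctuation that is followed by whitespace
    // or by the end of the string, together with its byte offset.
    if (!preg_match_all('/[.!?]+(?=\s|$)/', $text, $matches, PREG_OFFSET_CAPTURE)) {
        return $text; // no terminal punctuation: treat the whole text as one sentence
    }

    foreach ($matches[0] as [$punct, $offset]) {
        if ($punct === '.') {
            // A single period may belong to an abbreviation such as "Dr." or "e.g.".
            $parts = preg_split('/\s+/', substr($text, 0, $offset));
            $lastWord = strtolower(ltrim(end($parts), '("\'['));
            if (in_array($lastWord, $abbreviations, true)) {
                continue; // not a sentence boundary, keep scanning
            }
        }
        // First real boundary: return everything up to and including the punctuation.
        return substr($text, 0, $offset + strlen($punct));
    }

    return $text; // every candidate was an abbreviation
}
```

Calling `first_sentence($the_string)` would then replace the `explode('.', ...)` approach from the question. The sketch still has blind spots: an abbreviation missing from the list (for example "p.m.") ends the sentence early, and a sentence that genuinely ends in "etc." is skipped over.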