Open benlk opened 8 years ago
Other things not detected:
- `.”` with a curly double quote (and single)
- words without a normal space
Note that `. ` (a period followed by ordinary whitespace) is detected, because it matches `/\.\s+/`.
The regex should check that the letter after the whitespace is uppercase. That's a good indication of the start of a sentence, in English at least.
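A rough sketch of what such a check could look like, written in Python for illustration rather than in Largo's PHP. The names `SENTENCE_END` and `trim_sentences` are hypothetical and are not part of Largo. The pattern assumes a sentence ends with `.`, `!` or `?`, optionally followed by closing quotes, and that the next sentence starts with an ASCII uppercase letter after at least one space or non-breaking space.

```python
import re

# Sentence end: terminal punctuation plus any closing quotes (straight or curly),
# but only when followed by whitespace (including U+00A0, the non-breaking space),
# an optional opening quote, and an ASCII uppercase letter.
# Assumption: English text; non-ASCII capitals are not handled here.
SENTENCE_END = re.compile(r'[.!?][”’"\']*(?=[\s\u00a0]+[“‘"\']?[A-Z])')


def trim_sentences(text, n):
    """Return the first n sentences of text (hypothetical helper, not Largo's API)."""
    count = 0
    for match in SENTENCE_END.finditer(text):
        count += 1
        if count == n:
            # Cut right after the punctuation and any closing quotes.
            return text[:match.end()]
    # Fewer than n boundaries were found, so return everything.
    return text


sample = "One.\u00a0 Two. three still two.” Four."
print(trim_sentences(sample, 1))
print(trim_sentences(sample, 2))
```

The uppercase heuristic is still only a heuristic: an abbreviation followed by a capitalized word, such as "Dr. Smith", would be counted as a sentence boundary.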
This should probably include tests for `largo_trim_sentences`.
And at the end of this project, we might write a "Things programmers assume about sentences" post, which should include:
https://github.com/INN/Largo/blob/2db3552e3a523e44d92f64bb88ceaa173d48a26c/inc/post-tags.php#L520-L564
The sequence period, non-breaking space, space (`.&nbsp; `) can occur when users insert two spaces after a period at the end of a sentence.
If Largo is trying to determine an excerpt `n` sentences long, this period-space-space sequence will not be detected as the end of a sentence. Here's a five-sentence-long 2-'sentence' excerpt: