Open jakepoz opened 6 years ago
Yes, I agree - sentences in quotes that end in an exclamation or question mark and are followed by a so-called speech tag should never be split, even if the speech tag begins with a proper noun, as in your two examples with Harry, which segtok currently oversplits.
On Mon, Aug 13, 2018, at 19:28, Jake Poznanski wrote:
> We are seeing a few issues with segtok being over-eager to split quoted sentences with names directly after the quoted section.
> Ex.
> "Good morning," said Harry.
> "Good morning?" asked Harry.
> "Good morning!" exclaimed Harry.
> All of those split correctly. However, consider the following variations:
> "Good morning," Harry said. -> Correct
> "Good morning?" Harry asked. -> Splits into two.
> "Good morning!" Harry exclaimed. -> Splits into two.
View it on GitHub: https://github.com/fnl/segtok/issues/16
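For illustration only, the rule described in the reply (do not split after a quoted sentence ending in `?` or `!` when a speech tag follows) could be sketched as a post-processing merge over a naive splitter. This is not segtok's implementation: `SPEECH_VERBS`, `BOUNDARY`, `SPEECH_TAG`, and `split_sentences` are hypothetical names, the verb list is a small assumed sample, and only straight double quotes are handled.

```python
import re

# Assumed, deliberately small sample of speech verbs (hypothetical).
SPEECH_VERBS = ("said", "asked", "exclaimed", "replied", "shouted", "whispered")

# Naive boundary: terminal punctuation, optionally followed by a closing
# quote, then whitespace, then an uppercase letter or an opening quote.
BOUNDARY = re.compile(r'(?:(?<=[.?!]")|(?<=[.?!]))\s+(?=["A-Z])')

# A segment that closes a quotation with "?" or "!".
QUOTE_END = re.compile(r'[?!]"$')

# A segment that starts with one or more capitalized names and a speech verb.
SPEECH_TAG = re.compile(
    r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*\s+(?:%s)\b" % "|".join(SPEECH_VERBS)
)


def split_sentences(text):
    """Split naively, then re-join a quoted ?/! sentence with its speech tag."""
    merged = []
    for segment in BOUNDARY.split(text.strip()):
        if merged and QUOTE_END.search(merged[-1]) and SPEECH_TAG.match(segment):
            # The previous segment ended a quote and this one is a speech tag:
            # treat both as one sentence instead of two.
            merged[-1] += " " + segment
        else:
            merged.append(segment)
    return merged


if __name__ == "__main__":
    for sentence in split_sentences(
        '"Good morning?" Harry asked. "Good morning!" Harry exclaimed.'
    ):
        print(sentence)
```

With this heuristic the two oversplit examples each come back as a single sentence, while a quote followed by an ordinary sentence (one that does not start with a name plus a listed verb) is still split.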