We only want to filter n-grams that envelop a pause, keeping the ones that contain a pause but have no letters after it.
The previous implementation was almost perfect. The only issue I fixed in this PR is that patterns like "\n \n \n" or "{letter} \n \n" used to get filtered and now don't, because they contain nothing after the beginning of the pause.
Same idea as the "." / "," pull request.
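
For reviewers, here is a minimal sketch of the rule described above, not the actual code in this PR. It assumes an n-gram is a sequence of string tokens, that the pause token is "\n", and that "letters" means tokens for which `str.isalpha()` is true. The names `PAUSE_TOKENS`, `envelops_pause`, and `filter_ngrams` are hypothetical and only serve the illustration.

```python
from typing import Iterable, List, Sequence

# Assumption: a pause is represented by the newline token.
# The real project may use a different or larger set of pause tokens.
PAUSE_TOKENS = frozenset({"\n"})


def envelops_pause(ngram: Sequence[str], pause_tokens=PAUSE_TOKENS) -> bool:
    """Return True if a letter token appears after the first pause token."""
    seen_pause = False
    for token in ngram:
        if token in pause_tokens:
            seen_pause = True
        elif seen_pause and token.isalpha():
            # A letter follows the beginning of the pause: the n-gram envelops it.
            return True
    return False


def filter_ngrams(ngrams: Iterable[Sequence[str]]) -> List[Sequence[str]]:
    """Keep only the n-grams that do not envelop a pause."""
    return [ngram for ngram in ngrams if not envelops_pause(ngram)]
```

Under these assumptions, `["a", "\n", "b"]` is filtered, while `["\n", "\n", "\n"]` and `["a", "\n", "\n"]` are kept, which matches the behaviour change described in this PR.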
I also updated some comments.