Closed mghill closed 3 years ago
Good catch! Let me think about how to fix that
I'm not going to lie: I'm not sure how to fix this. I think code spans would have to be detected, removed prior to sentence splitting, and then reinserted when fixing. Maybe @azu can take a look, since he wrote both sentence-splitter and textlint?
I confirm that sentence-splitter split ?
in the URL string.
It seems sentence-splitter's a bug, but it may be difficult that support all URL-like strings.
Can you create an issue to https://github.com/azu/sentence-splitter ?
Workaround: use code `URL`
or link <URL>
The Backoffice application will be available at:
https://{Application Load Balancer DNS NAME}/login?continue=/backoffice
📝 Actually https://{Application Load Balancer DNS NAME}/
is not URL. so it is reasonable that wrap it as code or link, or some tag.
@azu in the example, I had already used the code URL
workaround & that also did not work. We will go forward with wrapping it in tags, but I still think it is a bug, right?
I still think it is a bug, right?
Ah, I figure out.
textlint-rule-one-sentence-per-line should use splitAST
instead of split
.
splitAST
treat code
as code node and it does not split the code?code
.
const text = getSource(node) // <= convert node to plain text
const split = splitter.split(text) // <= create sentences from text
→
const split = splitter.splitAST(node) // <= create sentences from TxtNode
Sorry, put this down when it wasn't working - just came back to it.
I absolutely do not have JS/TS skill, but fiddled with it in my local env and can kinda hack a work with another line, in /lib/index.js
:
var temp_split = splitter.splitAST(node);
var split = temp_split.children; // if there is only one sentence in the line, return early it's following sembr
There was an issue with the type returned by splitAST. There's probably a better solution (rather than a first-glance one) and I don't want to mess up a well written JS project with a fairly blind/uninformed PR. Your advice helped, thank you.
The example I came across was in a markdown doc, where textlint-rule-one-sentence-per-line was finding issue in the following snippet:
It wanted to split on the
?
character in the example URL. Further testing of normal prose in code spans also showed it was not skipped.I would expect a rule to ignore things that are addresses or paths, and to ignore things in
code spans
. This could either be added as a default, or a settings option could be supported in .textlintrc either to toggle this behaviour or in a general case to permit pattern matched skipping.