Rule does not ignore certain cases of URLs/paths

IQTLabs / textlint-rule-one-sentence-per-line

A fixable textlint rule to add a line break after each sentence

Apache License 2.0

1 stars 3 forks source link

Rule does not ignore certain cases of URLs/paths #3

Closed mghill closed 3 years ago

mghill commented 3 years ago

The example I came across was in a markdown doc, where textlint-rule-one-sentence-per-line was finding issue in the following snippet:

The Backoffice application will be available at:
https://{Application Load Balancer DNS NAME}/login?continue=/backoffice

It wanted to split on the ? character in the example URL. Further testing of normal prose in code spans also showed it was not skipped.

I would expect a rule to ignore things that are addresses or paths, and to ignore things in code spans. This could either be added as a default, or a settings option could be supported in .textlintrc either to toggle this behaviour or in a general case to permit pattern matched skipping.

Benjamin-Lee commented 3 years ago

Good catch! Let me think about how to fix that

Benjamin-Lee commented 3 years ago

I'm not going to lie: I'm not sure how to fix this. I think code spans would have to be detected, removed prior to sentence splitting, and then reinserted when fixing. Maybe @azu can take a look, since he wrote both sentence-splitter and textlint?

azu commented 3 years ago

https://sentence-splitter.netlify.app/#%23The%20Backoffice%20application%20will%20be%20available%20at%3A%0Ahttps%3A%2F%2Fexample.com%2Flogin%3Fcontinue%3D%2Fbackoffice%0A

I confirm that sentence-splitter split ? in the URL string. It seems sentence-splitter's a bug, but it may be difficult that support all URL-like strings. Can you create an issue to https://github.com/azu/sentence-splitter ?

Workaround: use code `URL` or link <URL>

The Backoffice application will be available at: https://{Application Load Balancer DNS NAME}/login?continue=/backoffice

📝 Actually https://{Application Load Balancer DNS NAME}/ is not URL. so it is reasonable that wrap it as code or link, or some tag.

mghill commented 3 years ago

@azu in the example, I had already used the code URL workaround & that also did not work. We will go forward with wrapping it in tags, but I still think it is a bug, right?

azu commented 3 years ago

I still think it is a bug, right?

Ah, I figure out.

textlint-rule-one-sentence-per-line should use splitAST instead of split. splitAST treat code as code node and it does not split the code?code.

https://github.com/IQTLabs/textlint-rule-one-sentence-per-line/blob/7db33c0565908f30adc405793e1fabb5852d5d3b/src/index.js#L46-L47

      const text = getSource(node) // <= convert node to plain text
      const split = splitter.split(text) // <= create sentences from text

→

      const split = splitter.splitAST(node) // <= create sentences from TxtNode

mghill commented 3 years ago

Sorry, put this down when it wasn't working - just came back to it. I absolutely do not have JS/TS skill, but fiddled with it in my local env and can kinda hack a work with another line, in /lib/index.js:

var temp_split = splitter.splitAST(node);
var split = temp_split.children; // if there is only one sentence in the line, return early it's following sembr

There was an issue with the type returned by splitAST. There's probably a better solution (rather than a first-glance one) and I don't want to mess up a well written JS project with a fairly blind/uninformed PR. Your advice helped, thank you.