errata-ai / Microsoft

A Vale-compatible implementation of the Microsoft Writing Style Guide.
https://github.com/errata-ai/vale
MIT License

OxfordComma missing support for multiword items #30

Open tcmetzger opened 4 years ago

tcmetzger commented 4 years ago

The current regex for detecting missing Oxford Commas seems to only detect sentences where the last item in the list before the "and"/"or" consists of one single word.

For example: "Save your file to a hard drive, an external drive or OneDrive." Vale seems to not detect the missing comma after 'drive' (example from https://docs.microsoft.com/en-us/style-guide/punctuation/commas). However, if the sentence is "Save your file to a hard drive, OneDrive or an external drive", Vale will correctly detect the missing comma.

tcmetzger commented 4 years ago

I would suggest adding whitespace (\s) to the regex: '(?:[^,]+,){1,}\s[\w\s]+\s(and|or)\s'. Does that make sense? If so, I'd be happy to open a pull request with this amendment!

in: '(?:[^,]+,){1,}\s[\w\s]+\s(and|or)\s'
out: '(?:[^,]+,){1,}\s\w+\s(and|or)'
(This would already include changes from #29.)
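To make the difference concrete, here is a minimal sketch that runs both patterns against the two sentences from the issue description. It uses Python's `re` module as a stand-in for Vale's regex engine, which is an assumption; the names `CURRENT`, `PROPOSED`, and `flags` are hypothetical and exist only for this illustration.

```python
import re

# Patterns copied from the comment above. Python's `re` stands in for
# Vale's own regex engine here (an assumption of this sketch).
CURRENT = r"(?:[^,]+,){1,}\s\w+\s(and|or)"
PROPOSED = r"(?:[^,]+,){1,}\s[\w\s]+\s(and|or)\s"


def flags(pattern: str, sentence: str) -> bool:
    """Return True if the pattern reports a missing Oxford comma."""
    return re.search(pattern, sentence) is not None


# Last item before the conjunction is multiword ("an external drive").
multiword_last = "Save your file to a hard drive, an external drive or OneDrive."
# Last item before the conjunction is a single word ("OneDrive").
single_word_last = "Save your file to a hard drive, OneDrive or an external drive"

for sentence in (multiword_last, single_word_last):
    # Columns: flagged by CURRENT, flagged by PROPOSED, sentence
    print(flags(CURRENT, sentence), flags(PROPOSED, sentence), sentence)
```

In this sketch the existing pattern flags only the single-word case, while the proposed pattern flags both.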

jdkato commented 4 years ago

While it's true that we currently only check for a specific case, the problem with trying to be more generic is that you'll get a lot of false positives.

I ran your suggested pattern on this file and saw these (among other) results:

This explicitness on your part, which is up to you to maintain with discipline, will save you lots of refactor headaches and footguns down the line.

Remember: one of the most important roles for source code is to communicate clearly, not only to you, but your future self and other code collaborators, what your intent is.

But again, other than our intuitions and sensibilities, there doesn't appear to be objective and clear measure of what constitutes "accidents" or prevention thereof.

This isn't to say that the rule can't be improved, but I think adding \s is too generic.
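The false positives can be reproduced with a short script. This is only a sketch: it assumes Python's `re` module behaves like Vale's engine for this pattern, and the names `PROPOSED`, `EXISTING`, and `false_positives` are hypothetical. The three sentences are the ones quoted above; each already uses commas correctly, so any match is a false alarm.

```python
import re

# Proposed pattern from earlier in the thread, plus the existing one for
# comparison. Python's `re` is assumed to be a fair proxy for Vale's engine.
PROPOSED = re.compile(r"(?:[^,]+,){1,}\s[\w\s]+\s(and|or)\s")
EXISTING = re.compile(r"(?:[^,]+,){1,}\s\w+\s(and|or)")

false_positives = [
    "This explicitness on your part, which is up to you to maintain with "
    "discipline, will save you lots of refactor headaches and footguns "
    "down the line.",
    "Remember: one of the most important roles for source code is to "
    "communicate clearly, not only to you, but your future self and other "
    "code collaborators, what your intent is.",
    "But again, other than our intuitions and sensibilities, there doesn't "
    "appear to be objective and clear measure of what constitutes "
    "\"accidents\" or prevention thereof.",
]

for sentence in false_positives:
    proposed_hit = PROPOSED.search(sentence) is not None
    existing_hit = EXISTING.search(sentence) is not None
    # None of these sentences contains a comma-separated list of items.
    print(f"proposed={proposed_hit} existing={existing_hit}")
```

Because `[\w\s]+` accepts any run of words after a comma, an ordinary clause followed by "and" or "or" is enough to trigger the proposed pattern.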

tcmetzger commented 4 years ago

I see your point; this regex could certainly use some more work. Are there any regex experts on this project? I'll also keep tweaking and testing.