amperser / proselint

A linter for prose.
http://proselint.com
BSD 3-Clause "New" or "Revised" License

refactor: clean up checks #1155

Open Nytelife26 opened 3 years ago

Nytelife26 commented 3 years ago

It has come to my attention that a lot of checks within proselint are dubious at best, or misguided. For instance:

Et cetera. I feel it may be necessary to refactor these checks and their categorizations, with a formal review, to make maintenance easier in future and to maintain a better linguistic ecosystem.

suchow commented 3 years ago

Good ideas, and I share many of your concerns. However, there are several distinct issues that you've bundled here in #1155 and they should be broken down into smaller issues that can be discussed and completed independently. Here are some possible standalone issues:

suchow commented 3 years ago

Also, note that cursing.nfl defaults to off, probably for exactly the reason that it produces far too many false alarms in its current state to be useful:

https://github.com/amperser/proselint/blob/372ebf0253ddbf3c404e2f44bf1519fd6510b6ce/proselint/.proselintrc#L13

As a more general point, we've been wary of any checks that attempt to categorically ban words. The only time that's seemed like a good idea so far is for needless variants, where the determination has already been made for us that the word is not needed.
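To illustrate the default-off behaviour mentioned above, here is a minimal sketch of how a JSON config of per-check booleans can be read and overridden. The excerpt is hypothetical: only the fact that cursing.nfl defaults to off comes from this thread; the other values, the `EXAMPLE_RC` string, and the helper names `enabled_checks` and `with_overrides` are assumptions for illustration, not proselint's actual API.

```python
import json

# Hypothetical excerpt in the style of a .proselintrc file.
# Only "cursing.nfl": false is taken from the discussion; the other
# entries and their values are assumptions for illustration.
EXAMPLE_RC = """
{
  "checks": {
    "cursing.nfl": false,
    "cursing.nword": true,
    "uncomparables.misc": true
  }
}
"""


def enabled_checks(rc_text):
    """Return the sorted names of checks whose flag is true."""
    config = json.loads(rc_text)
    checks = config.get("checks", {})
    return sorted(name for name, is_on in checks.items() if is_on)


def with_overrides(rc_text, overrides):
    """Return a new config dict with user overrides layered on the defaults."""
    config = json.loads(rc_text)
    merged = dict(config.get("checks", {}))
    merged.update(overrides)
    return {**config, "checks": merged}


print(enabled_checks(EXAMPLE_RC))
```

A user who wants the noisy check anyway could opt in with something like `with_overrides(EXAMPLE_RC, {"cursing.nfl": True})`, which leaves the shipped defaults untouched.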

Nytelife26 commented 3 years ago

they should be broken down into smaller issues that can be discussed and completed independently.

Strong suggestion, actually. Ultimately I just wanted to put this down as an RFC to get people's thoughts prior to doing any real work.

but there may be some vestiges of an earlier organizational scheme.

I believe so, as that's what we saw with the split between dfw.uncomparables (which didn't exist) and uncomparables.misc. I'll check through them if a cleanup like this does occur.

Determining the right categorization scheme for checks and groups of related checks

That would definitely be the right thing to do going forward, I think. I foresee it making maintenance quite a lot easier, and it will help people understand the general scope of these checks better.

Improving the archaism check to distinguish archaic vs. modern senses of the same character string.

Honestly, that likely falls under the same problem we discussed regarding flag-based parsing.

Improving or perhaps deleting [cursing.nword and cursing.nfl].

I would suggest improving them rather than deleting them altogether. In principle, many of these are genuinely words that should be avoided in most contexts, and if we can tighten the error margin considerably and make the checks more definitive, they may very well be suitable for our usage.

Crafting a principled approach to determining what makes a check dubious or misguided and applying that approach consistently across all of proselint, both retrospectively and going forward, perhaps defining it in a policy document.

I would be more than happy to do this. Ultimately it would be good to concretely define and lay out our process for making these decisions and the criteria required for linguistic constructs. For part of this, we could use something similar to my language suitability evaluation framework.

Making sure that all the messages are informative.

That would be quite an easy fix, too. Perhaps one best placed in the same restructure as a categorization evaluation.

cursing.nfl defaults to off

I wasn't aware of that, actually - thanks for the tip. I'll be sure to consider .proselintrc and our ability to set defaults in future.

we've been wary of any checks that attempt to categorically ban words

That's for the best; things like that can get authoritarian or out of hand quite quickly. It's nice to see these things taken as seriously as they should be. It'll be easier for us to make those decisions once a framework is in place.

suchow commented 3 years ago

@Nytelife26 Thanks for the response, we're on the same page on every point :)