Nytelife26 opened this issue 3 years ago · Status: Open
Good ideas, and I share many of your concerns. However, there are several distinct issues that you've bundled here in #1155 and they should be broken down into smaller issues that can be discussed and completed independently. Here are some possible standalone issues:
- Determining the right categorization scheme for checks and groups of related checks.
- Improving the archaism check to distinguish archaic vs. modern senses of the same character string.
- Improving or perhaps deleting the `nword.py` and `cursing.nfl` checks.
- Crafting a principled approach to determining what makes a check dubious or misguided and applying that approach consistently across all of proselint, both retrospectively and going forward, perhaps defining it in a policy document.
- Making sure that all the messages are informative.

Also, note that `cursing.nfl` defaults to off, probably for exactly the reason that it produces far too many false alarms in its current state to be useful.
As a more general point, we've been wary of any checks that attempt to categorically ban words. The only time that's seemed like a good idea so far is for needless variants, where the determination has already been made for us that the word has no need.
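For contrast, here is a minimal, self-contained sketch (not proselint's actual implementation; the variant/preferred word pairs are illustrative assumptions) of what a needless-variants check looks like in spirit: rather than banning a word outright, it maps each variant to the preferred form that has already been determined for it.

```python
import re

# Illustrative sketch only, not proselint's real check. These variant ->
# preferred pairs are assumed examples; proselint ships its own lists.
PREFERRED = {
    "judgement": "judgment",
    "acknowledgement": "acknowledgment",
}


def needless_variants(text):
    """Return (variant, preferred, offset) for each needless variant found.

    Unlike a categorical ban, every hit comes with a concrete replacement,
    because the determination that the variant is needless has already
    been made.
    """
    results = []
    for variant, preferred in PREFERRED.items():
        for match in re.finditer(r"\b%s\b" % variant, text, re.IGNORECASE):
            results.append((variant, preferred, match.start()))
    return sorted(results, key=lambda hit: hit[2])
```

The key design point is that the check's output is a suggestion, not a prohibition: each flagged span carries the replacement the reader should use instead.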
> they should be broken down into smaller issues that can be discussed and completed independently.

Strong suggestion, actually. Ultimately I just wanted to put this down as an RFC to get people's thoughts prior to doing any real work.
> but there may be some vestiges of an earlier organizational scheme.

I believe so, as that's what we saw with the split between `dfw.uncomparables` (which didn't exist) and `uncomparables.misc`. I'll check through them if a cleanup like this does occur.
> Determining the right categorization scheme for checks and groups of related checks

That would definitely be the right thing to do going forward, I think. I foresee it making maintenance quite a lot easier, and it will help people understand the general scope of these checks better overall.
> Improving the archaism check to distinguish archaic vs. modern senses of the same character string.

Honestly, that likely falls under the same problem we discussed relating to flag-based parsing.
> Improving or perhaps deleting [`cursing.nword` and `cursing.nfl`].

I would suggest improving them rather than deleting them altogether. Principally speaking, many of these are genuinely words that should be avoided in most contexts, and if we can tighten the error margin considerably and make the checks more definitive, they may well be suitable for our usage.
> Crafting a principled approach to determining what makes a check dubious or misguided and applying that approach consistently across all of proselint, both retrospectively and going forward, perhaps defining it in a policy document.

I would be more than happy to do this. Ultimately, it would be good to concretely define and lay out our process for making these decisions, along with the criteria required for linguistic constructs. For part of this, we could use something similar to my language suitability evaluation framework.
> Making sure that all the messages are informative.

That would be quite an easy fix, too. Perhaps it is best placed in the same restructure as the categorization evaluation.
> `cursing.nfl` defaults to off

I wasn't aware of that, actually; thanks for the tip. I'll be sure to consider `.proselintrc` and our ability to set defaults in future.
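For reference, a sketch of how opting back in might look, assuming the JSON `.proselintrc` format proselint uses, where a `checks` table maps check names to booleans (the `max_errors` value here is just an illustrative placeholder):

```json
{
  "max_errors": 1000,
  "checks": {
    "cursing.nfl": true
  }
}
```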
> we've been wary of any checks that attempt to categorically ban words

That's for the best; things like that can get authoritarian or out of hand quite quickly. It's nice to see these things taken as seriously as they should be. It'll be easier for us to make those decisions once a framework is in place.
@Nytelife26 Thanks for the response, we're on the same page on every point :)
It has come to my attention that a lot of checks within proselint are dubious at best, or misguided at worst. For instance, the `cursing.nfl` check: some of its entries are just numbers, and others have many variations included, almost like a poorly designed censoring system, in contrast with using regex. Et cetera. I feel it may be necessary to refactor these checks and their categorization, with a formal review, to make maintenance easier in future and also to maintain a better linguistic ecosystem.
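To illustrate the contrast being drawn here, a hypothetical sketch (using a deliberately mild stand-in word rather than anything from the real lists): enumerating every spelling by hand scales badly, while a single regex covers the variant space directly.

```python
import re

# Hypothetical illustration with a mild stand-in word ("damn" and some
# leetspeak/doubled-letter variants); the real cursing checks use their
# own word lists.

# Enumerated approach (the one criticized above): every variant listed by
# hand, like a censoring system.
VARIANTS = ["damn", "d4mn", "daamn", "damnn"]


def check_enumerated(text):
    """Return spans of any hand-listed variant."""
    spans = []
    for word in VARIANTS:
        for match in re.finditer(re.escape(word), text, re.IGNORECASE):
            spans.append(match.span())
    return sorted(spans)


# Regex approach: one pattern absorbs leetspeak substitutions and repeated
# letters, so new variants don't each need a new list entry.
PATTERN = re.compile(r"\bd+[a4]+m+n+\b", re.IGNORECASE)


def check_regex(text):
    return sorted(match.span() for match in PATTERN.finditer(text))
```

The regex version also stays maintainable: tightening or loosening what counts as a variant means editing one pattern, not auditing a long word list.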