PrincetonUniversity / blocklint

MIT License

Expand the default block-list #19

Open JakeSummers opened 10 months ago

JakeSummers commented 10 months ago

Good Morning!

This is a pretty nifty package. I would be interested in starting to use it.

One current limitation of this tool is that the default block-list is quite short:

https://github.com/PrincetonUniversity/blocklint/blob/386b45c72150c41a16f0c14c202191120a0d753e/blocklint/main.py#L71-L74

This tool would be significantly more useful if it came packaged with a more extensive block-list. Right now, I would need to build my own block-list and get it through code review (which I anticipate will be difficult).

The README cites alexjs as inspiration:

https://github.com/PrincetonUniversity/blocklint/blob/386b45c72150c41a16f0c14c202191120a0d753e/README.md?plain=1#L8

I took a quick look, and it seems like alexjs comes with a very comprehensive block-list via the retext-equality npm package. The full block-list is here: https://github.com/retextjs/retext-equality/tree/main/data/en
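As a rough illustration of what reusing that data might involve, here is a minimal Python sketch that turns a list of entries into a term → alternatives mapping. It assumes each entry has already been loaded (for example from YAML) into a dict with `inconsiderate` and `considerate` lists of strings; that is a simplified, assumed shape, not necessarily the exact retext-equality schema, and `build_blocklist` is a hypothetical helper.

```python
from typing import Dict, List


def build_blocklist(entries: List[dict]) -> Dict[str, List[str]]:
    """Map each inconsiderate term to its suggested alternatives.

    ASSUMPTION: each entry is a dict with "inconsiderate" and "considerate"
    lists of strings. This is a simplified, hypothetical shape, not the
    exact retext-equality schema.
    """
    blocklist: Dict[str, List[str]] = {}
    for entry in entries:
        alternatives = entry.get("considerate", [])
        for term in entry.get("inconsiderate", []):
            # Lower-case keys so later lookups can be case-insensitive, and
            # merge alternatives when a term appears in several entries.
            merged = blocklist.setdefault(term.lower(), [])
            for alt in alternatives:
                if alt not in merged:
                    merged.append(alt)
    return blocklist


# Hand-written entries for illustration only (not copied from retext-equality).
entries = [
    {"inconsiderate": ["boogeyman"], "considerate": ["boogeymonster"]},
    {"inconsiderate": ["Blacklist"], "considerate": ["blocklist"]},
    {"inconsiderate": ["blacklist"], "considerate": ["denylist"]},
]
print(build_blocklist(entries))
# → {'boogeyman': ['boogeymonster'], 'blacklist': ['blocklist', 'denylist']}
```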

They also provide acceptable alternatives (with sources :) ) so that you can create output like this:

example.md
   1:5-1:14  warning  `boogeyman` may be insensitive, use `boogeymonster` instead                boogeyman-boogeywoman  retext-equality
  1:42-1:48  warning  `master` / `slaves` may be insensitive, use `primary` / `replica` instead  master-slave           retext-equality
  1:69-1:75  warning  Don’t use `slaves`, it’s profane                                           slaves                 retext-profanities
  2:52-2:54  warning  `he` may be insensitive, use `they`, `it` instead                          he-she                 retext-equality
  2:61-2:68  warning  `cripple` may be insensitive, use `person with a limp` instead             gimp                   retext-equality

⚠ 5 warnings

Source
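For reference, a message in roughly that shape can be built from a match position and a list of alternatives. The `format_warning` helper below is hypothetical (it is not alex's or blocklint's code), and the column convention, where the end column is the start column plus the length of the term, is inferred from the sample output above.

```python
from typing import List


def format_warning(line: int, column: int, term: str, alternatives: List[str]) -> str:
    """Build an alex-style warning for `term` found at (line, column).

    Columns are 1-based; the end column is column + len(term), which is the
    convention the sample output appears to use (e.g. 1:5-1:14 for "boogeyman").
    """
    end = column + len(term)
    suggestion = ", ".join(f"`{alt}`" for alt in alternatives)
    return (
        f"{line}:{column}-{line}:{end}  warning  "
        f"`{term}` may be insensitive, use {suggestion} instead"
    )


print(format_warning(2, 52, "he", ["they", "it"]))
# → 2:52-2:54  warning  `he` may be insensitive, use `they`, `it` instead
```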

It would be awesome if we could do the following:

  1. Copy the data from https://github.com/retextjs/retext-equality/tree/main/data/en into this repo
  2. Use that as the default block-list
  3. Add support for suggesting alternatives

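Steps 2 and 3 could look something like the minimal sketch below, assuming the copied data has already been turned into a term → alternatives mapping. `DEFAULT_BLOCKLIST`, `find_issues`, and the sample terms are hypothetical; blocklint's real matching is more involved than a single whole-word regex.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical default block-list: term -> suggested alternatives.
DEFAULT_BLOCKLIST: Dict[str, List[str]] = {
    "blacklist": ["blocklist", "denylist"],
    "boogeyman": ["boogeymonster"],
}


def find_issues(
    text: str, blocklist: Dict[str, List[str]]
) -> List[Tuple[int, int, str, List[str]]]:
    """Return (line, column, term, alternatives) for each whole-word match.

    Lines and columns are 1-based; matching is case-insensitive, and the
    block-list keys are assumed to be lower-case.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, blocklist)) + r")\b", re.IGNORECASE
    )
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            term = match.group(1).lower()
            issues.append((line_no, match.start() + 1, term, blocklist[term]))
    return issues


sample = "Add the host to the Blacklist.\nBeware the boogeyman."
for issue in find_issues(sample, DEFAULT_BLOCKLIST):
    print(issue)
# → (1, 21, 'blacklist', ['blocklist', 'denylist'])
# → (2, 12, 'boogeyman', ['boogeymonster'])
```

Because of the `\b` anchors, a word such as `blacklisted` is not flagged here; whether a default list should catch such derived forms is a policy choice.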
troycomi commented 10 months ago

It's a good point, but I have a few reservations. I made this tool unopinionated largely on purpose, so others could customize it as needed. Suggesting alternatives may be within scope, though it would be a larger change. Here are my concerns with the full alexjs list:

Overall, I think including all the inconsiderate words would add bloat when the goal is checking source code specifically. Someone who uses slurs in their code probably won't care if this tool complains. What I mostly wanted to catch was legacy usage of terms like blacklist or master. For markdown, I'd also run alexjs to catch offensive phrases and language.
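Catching legacy terms in source code is harder than in prose, because they tend to hide inside identifiers such as `blackList`, `black_list`, or `BLACK-LIST`. The sketch below builds one pattern per term that tolerates an optional `_` or `-` between component words and ignores case; `variant_pattern` is a hypothetical helper illustrating the problem, not blocklint's actual regexes.

```python
import re
from typing import Sequence


def variant_pattern(parts: Sequence[str]) -> "re.Pattern[str]":
    """Match a term written as one word, camelCase, snake_case, or kebab-case.

    `parts` are the component words, e.g. ("black", "list"). Between parts an
    optional "_" or "-" is allowed; IGNORECASE covers camelCase and UPPER_CASE.
    """
    body = r"[_-]?".join(re.escape(part) for part in parts)
    return re.compile(body, re.IGNORECASE)


pattern = variant_pattern(("black", "list"))
for identifier in ["blacklist", "blackList", "black_list", "BLACK-LIST", "blocklist"]:
    print(identifier, bool(pattern.search(identifier)))
# → True for the first four identifiers, False for "blocklist"
```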

Here's what I'd propose.

  1. make a blocklint config file that includes all single-word entries in the alexjs database
  2. check if linting with that list drastically increases runtime (it may, my regexes are fairly complex)
  3. add a --strict switch that uses the strict config file from step 1

That way, users can either pass the strict switch or copy the file from GitHub and modify it as they see fit. I'd be open to a PR for adding a reason, but that would require a lot of rewrites.
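The three proposed steps can be sketched roughly as follows. `--strict` is the switch proposed in this thread, not an existing blocklint option; the word lists, `is_single_word`, `lint`, and `main` are hypothetical stand-ins, and the whole-word regex is deliberately simpler than blocklint's.

```python
import argparse
import re
import time
from typing import List, Optional

# Hypothetical lists; a real strict list would be generated from the alexjs data.
DEFAULT_WORDS = ["blacklist", "whitelist", "master", "slave"]
ALEX_TERMS = ["boogeyman", "master", "slave", "man hours", "he", "cripple"]


def is_single_word(term: str) -> bool:
    """Step 1: keep only entries made of letters (no spaces or hyphens)."""
    return re.fullmatch(r"[A-Za-z]+", term) is not None


STRICT_WORDS = sorted(set(DEFAULT_WORDS) | {t for t in ALEX_TERMS if is_single_word(t)})


def lint(lines: List[str], words: List[str]) -> int:
    """Count whole-word, case-insensitive hits of any word in `words`."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return sum(len(pattern.findall(line)) for line in lines)


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="blocklint-sketch")
    # Step 3: opt-in switch that swaps in the larger word list.
    parser.add_argument("--strict", action="store_true", help="use the strict word list")
    args = parser.parse_args(argv)
    words = STRICT_WORDS if args.strict else DEFAULT_WORDS

    # Synthetic input, large enough for timing differences to show up.
    lines = ["the master node", "a boogeyman appears"] * 10_000

    # Step 2: measure what the larger list costs at runtime.
    start = time.perf_counter()
    hits = lint(lines, words)
    elapsed = time.perf_counter() - start
    print(f"{len(words)} words, {hits} hits, {elapsed:.3f}s")
    return hits
```

For example, comparing the output of `main([])` with `main(["--strict"])` gives a first rough answer to the runtime question in step 2.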