get-woke / woke

Detect non-inclusive language in your source code.
https://docs.getwoke.tech
MIT License

Community-driven rule list #67

Closed (johnbent closed this issue 3 years ago)

johnbent commented 3 years ago

Is your feature request related to a problem? Please describe.

We are looking for a single tool to scan our code for problematic language. I think there are at least three types of problematic language that can be scanned for:

  1. Non-inclusive terms
     a. Awesome. Seems like woke is perfect for this.
     b. Corporations should be able to share this effort, both to maintain the evolving word list and suggested replacements and to maintain the scanning tools.
  2. Corporate terms
     a. Such as the names of unannounced products that developers might be thinking about but shouldn't mention in comments.
     b. Seems like woke is perfect for this, since we can pass exact word lists to it.
  3. Vulgarity and slurs
     a. Developers shouldn't, but sometimes they put swear words in their comments.
     b. I would hope it never happens, but you could imagine people putting slurs into their comments as well.
     c. I suppose we could handle vulgarity and slurs the way we handle corporate terms, but this seems much better as a community initiative.

Describe the solution you'd like

A maintained word list for vulgarity and slurs.

Describe alternatives you've considered

We could do it ourselves, but we want to share this with the community and have confidence that we are using appropriate word lists.

Additional context

It seems like a fair number of folks are building similar word lists, such as INI and inclusivelint. I think we should share a word list.

caitlinelfring commented 3 years ago

I've been considering a feature for woke that would let you include multiple "lists" to "extend" your ruleset, without those lists being enabled by default. They could be bundled with woke or maintained by the community on GitHub or another remote location. Something along the lines of:

extends:
  - default
  - https://github.com/get-woke/rulesets.git/vulgarity.yaml
  - file:///home/me/myruleset.yaml

I came across https://github.com/hashicorp/go-getter the other day which would be interesting to use, but I haven't had the chance to dig into it much. This would give the ability for the community to maintain lists without forcing rules on all users. Thanks for bringing this up!
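To make the proposed `extends` idea concrete, here is a rough Python sketch of how such a list might be resolved. It is not woke's implementation and does not use go-getter. The function names `classify_source` and `merge_rulesets` are hypothetical, and both the rule shape (a dict with `name` and `alternatives`) and the "later lists override earlier ones by rule name" policy are assumptions made purely for illustration. Fetching and parsing the files is left out.

```python
from urllib.parse import urlparse


def classify_source(entry: str) -> tuple[str, str]:
    """Classify one `extends` entry as builtin, remote, or local file.

    Hypothetical helper: the three kinds mirror the three entries in the
    example config above, not any documented woke behaviour.
    """
    parsed = urlparse(entry)
    if parsed.scheme in ("http", "https"):
        return ("remote", entry)
    if parsed.scheme == "file":
        return ("file", parsed.path)
    # No URL scheme: treat it as the name of a bundled ruleset.
    return ("builtin", entry)


def merge_rulesets(rulesets: list[list[dict]]) -> list[dict]:
    """Merge rulesets in order; a later rule replaces an earlier rule
    with the same name (assumed policy, for illustration only)."""
    merged: dict[str, dict] = {}
    for rules in rulesets:
        for rule in rules:
            merged[rule["name"]] = rule
    return list(merged.values())


# The `extends` list from the example config, written as a Python list.
extends = [
    "default",
    "https://github.com/get-woke/rulesets.git/vulgarity.yaml",
    "file:///home/me/myruleset.yaml",
]
sources = [classify_source(entry) for entry in extends]
```

Under these assumptions, a team could layer a private corporate-terms list over a community-maintained one, while users who list only `default` would see no change.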

johnbent commented 3 years ago

https://github.com/retextjs/retext-equality/tree/main/data/en is by far the largest publicly maintained word list I've found.

caitlinelfring commented 3 years ago

Closing in favor of #104