A flexible, smart word filter to prevent profanity or whatever else suits your taste.
npm i whoolso-word-filter
const { filterWords } = require('whoolso-word-filter');
This package gives you access to a function called filterWords(), which takes a config object as its sole argument.
To keep it simple, the function returns an array with all the filtered words/phrases found inside a given string.
The config object allows you to configure the filter as much as possible. Take the following example:
const configObj = {
  wordsToFilter,
  stringToCheck,
  lengthThreshold: 3,
  leetAlphabet1: textToLeetAlphabet1,
  leetAlphabet2: textToLeetAlphabet2,
  shortWordLength: 3,
  shortWordExceptions
};
const foundWords = filterWords(configObj);
Argument Definitions:
wordsToFilter: The array of words you want to filter. To filter a phrase such as 'bad person', you'll have to write it without spaces: 'badperson'. If you add a word such as 'idiot', it is not necessary to add 'idiots', because it contains the root of the word, which is what interests us. It's not necessary to add the leet versions of your word either (ex. '1d1ot'). If you want to be really strict, it'd be a good idea to add misspelled versions of the word (ex. 'stupid' could be intentionally misspelled as 'stupd').
// Suppose we want to filter some political terms. Our wordsToFilter array could be something like this:
const wordsToFilter = [
'gop',
'gerrymander',
'republican',
'republikan',
'rpublican',
'rpblican',
'rpublicn',
'repblicn',
'lefty',
'lfty',
'lftwing'
];
Also, make sure all the words you add are lowercase. The filter converts the string you want to check to lowercase, so every entry in wordsToFilter must be lowercase too.
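The two rules above (no spaces, all lowercase) can be applied to a list automatically. This is only a minimal sketch: normalizeWords is a hypothetical helper and is not part of whoolso-word-filter.

```javascript
// Hypothetical helper (not part of whoolso-word-filter): prepares a word list
// so that every entry is lowercase, contains no whitespace, and appears once.
function normalizeWords(words) {
  const cleaned = words
    .map((word) => word.toLowerCase().replace(/\s+/g, ''))
    .filter((word) => word.length > 0);
  return [...new Set(cleaned)];
}

const normalizedWords = normalizeWords(['GOP', 'Bad Person', 'lefty', 'Lefty']);
console.log(normalizedWords); // [ 'gop', 'badperson', 'lefty' ]
```

The result can then be passed as wordsToFilter in the config object.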
stringToCheck: The string you want to check for filtered words.
const stringToCheck = `I am a political comment whose unique goal is to say the word republican.`;
lengthThreshold: The maximum length of the space-separated syllables in which you want to find words. 'I am here to say the word r e p u b l i c a n' will be caught as 'republican' if the value of lengthThreshold is at least 1. If it's 2, 're pu bl ic an' will be caught as well, and so on. The larger you set this option, the more prone you will be to false positives, so I wouldn't suggest using a number larger than 3, but that depends on your needs.
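To get a feel for why a larger lengthThreshold leads to more false positives, here is a rough sketch of the idea. It is an illustration only, under the assumption that short fragments are simply joined together. joinShortTokens is a hypothetical function and is not the package's actual implementation.

```javascript
// Illustrative sketch only: NOT the package's actual implementation.
// Joins runs of consecutive tokens whose length is <= lengthThreshold.
function joinShortTokens(text, lengthThreshold) {
  const result = [];
  let run = '';
  for (const token of text.split(/\s+/).filter(Boolean)) {
    if (token.length <= lengthThreshold) {
      run += token; // extend the current run of short tokens
    } else {
      if (run) result.push(run); // a long token ends the current run
      run = '';
      result.push(token);
    }
  }
  if (run) result.push(run);
  return result;
}

console.log(joinShortTokens('say the word r e p u b l i c a n', 1));
// [ 'say', 'the', 'word', 'republican' ]
console.log(joinShortTokens('say the word re pu bl ic an', 2));
// [ 'say', 'the', 'word', 'republican' ]
console.log(joinShortTokens('say the word re pu bl ic an', 3));
// [ 'saythe', 'word', 'republican' ]
```

With a threshold of 3, the harmless words 'say' and 'the' are also glued together, which is the kind of side effect that can produce false positives.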
leetAlphabet1 and leetAlphabet2: The function performs two leet translations on the text, one per alphabet. Given the wordsToFilter array shown above, both 'I am here to say the word republic@n' and 'I am here to say the word republ1c4n' will be caught as 'republican'.
leetAlphabet1 and leetAlphabet2 must have the following format:
const textToLeetAlphabet1 = {
A: '@',
B: '8',
C: '(',
D: 'D',
E: '3',
F: 'F',
G: '6',
H: '#',
I: '!',
J: 'J',
K: 'K',
L: '1',
M: 'M',
N: 'N',
O: '0',
P: 'P',
Q: 'Q',
R: 'R',
S: '$',
T: '6',
U: 'U',
V: 'V',
W: 'W',
X: 'X',
Y: 'Y',
Z: '2'
};
const textToLeetAlphabet2 = {
A: '4',
B: '8',
C: '(',
D: '<|',
E: '€',
F: 'PH',
G: '9',
H: '|-|',
I: '1',
J: 'J',
K: 'K',
L: '|',
M: '|\\/|',
N: '|\\|',
O: '0',
P: '|2',
Q: 'Q',
R: 'R',
S: '5',
T: '+',
U: '|_|',
V: '/',
W: '//',
X: '><',
Y: `'/`,
Z: '2'
};
If you require a more advanced leet filter, translate the string's leet before passing it inside the filter's config object.
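As a rough idea of what such a pre-translation step could look like, here is a minimal sketch. decodeLeet and customAlphabet are hypothetical names used only for this example; they are not part of whoolso-word-filter. The sketch assumes an alphabet in the same letter-to-symbol format shown above.

```javascript
// Hypothetical pre-processing helper, not part of whoolso-word-filter.
// Reverses a letter -> leet-symbol map and replaces each symbol found in the
// text with its lowercase letter. Longer symbols are replaced first so that
// a multi-character symbol is not consumed piecemeal by a shorter one.
function decodeLeet(text, leetAlphabet) {
  const pairs = Object.entries(leetAlphabet)
    // Skip entries that map a letter to itself (ex. J: 'J').
    .filter(([letter, symbol]) => symbol.toLowerCase() !== letter.toLowerCase())
    .sort(([, a], [, b]) => b.length - a.length);

  let decoded = text;
  for (const [letter, symbol] of pairs) {
    decoded = decoded.split(symbol).join(letter.toLowerCase());
  }
  return decoded;
}

// A small custom alphabet used only for this example.
const customAlphabet = { A: '/-\\', E: '3', I: '1', O: '0', S: '5' };

console.log(decodeLeet('r3publ1c/-\\n', customAlphabet)); // republican
```

If two letters share the same symbol, the first matching entry wins, so ambiguous alphabets may need extra handling. The decoded string can then be used as stringToCheck.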
shortWordLength: The maximum length, in characters, of the words the filter treats as short words.
// Setting this option to 3 means 'In my case, short words are those with a length of 3 characters or fewer.'
shortWordExceptions: If some words are longer than shortWordLength but must still be treated as short words, add them here. 'meth' is a good example: say you want to filter drug names, but a string containing the word 'something' returns 'meth'. You could set shortWordLength to 4 to solve this, but that could cause other false positives. shortWordExceptions is the solution for these cases:
const shortWordExceptions = ['meth'];
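One way to spot candidates for shortWordExceptions is to check which of your words appear inside harmless words, as 'meth' does inside 'something'. This is only a sketch: findEmbeddedWords is a hypothetical helper, not part of the package, and the list of harmless words is made up for the example.

```javascript
// Hypothetical helper: lists the filter words that occur inside harmless
// words, which makes them candidates for shortWordExceptions.
function findEmbeddedWords(wordsToFilter, harmlessWords) {
  return wordsToFilter.filter((word) =>
    harmlessWords.some((harmless) => harmless !== word && harmless.includes(word))
  );
}

console.log(findEmbeddedWords(['meth', 'republican'], ['something', 'method', 'public']));
// [ 'meth' ]
```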
Consider the following config object, where the leet alphabets are the same as the ones shown above.
const wordsToFilter = ['uneducated', 'republican', 'meth'];
const shortWordExceptions = ['meth'];
const configObj = {
  wordsToFilter,
  stringToCheck,
  lengthThreshold: 3,
  leetAlphabet1: textToLeetAlphabet1,
  leetAlphabet2: textToLeetAlphabet2,
  shortWordLength: 3,
  shortWordExceptions
};
configObj.stringToCheck = `They are something else, uneducated republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
configObj.stringToCheck = `They are something else, un$edu'ca[]\`te()d" republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
It doesn't matter how many spaces you use; the word will be detected anyway.
configObj.stringToCheck = `They are something else, u n e d u c a t e d republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
configObj.stringToCheck = `They are something else, UNEDUCATED republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
Alternating between upper and lowercase gives the same result, ex. 'uNeDuCatEd'.
configObj.stringToCheck = `They are something else, un3duc@t3d republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
configObj.stringToCheck = `They are something else, uneeduuuucaa@t333ed republicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
configObj.stringToCheck = `They are something elseuneeeduucatedrepublicans`;
console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]
If none of the words are found, an empty array will be returned.
configObj.stringToCheck = `They are good people and I love them`;
console.log(filterWords(configObj)); // []
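The examples above reassign configObj.stringToCheck before every call. If you'd rather not mutate a shared object, a small wrapper could build a fresh config per call. This is a sketch under assumptions: createChecker is a hypothetical helper, and fakeFilterWords is a stand-in that exists only so the example runs without the package installed.

```javascript
// Hypothetical wrapper: `filterWordsFn` is expected to be the filterWords
// function exported by whoolso-word-filter. It is passed in as a parameter
// so the wrapper itself has no dependency on the package.
function createChecker(filterWordsFn, baseConfig) {
  return function check(stringToCheck) {
    // Copy the base config so the shared object is never mutated.
    return filterWordsFn({ ...baseConfig, stringToCheck });
  };
}

// Stand-in used only to make this example runnable without the package.
// It does plain substring matching and is NOT how the real filter works.
const fakeFilterWords = ({ wordsToFilter, stringToCheck }) =>
  wordsToFilter.filter((word) => stringToCheck.toLowerCase().includes(word));

const check = createChecker(fakeFilterWords, {
  wordsToFilter: ['uneducated', 'republican']
});
console.log(check('They are something else, UNEDUCATED republicans'));
// [ 'uneducated', 'republican' ]
console.log(check('They are good people and I love them')); // []
```

In real use you would pass the package's filterWords and your own config object instead of the stand-in.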
Note: If you want to filter words that contain two identical consecutive letters (ex. 'dumbass'), it's advisable to also add the versions with only one letter to wordsToFilter (ex. 'dumbas') to make sure the filter is able to catch them.
This is because duplicated letters are removed when checking for leet. If someone writes something like 'hello dumba$$hoe', combining leet in a word with two consecutive letters and then adding another word without spaces, the string will pass unless you have added the version of the word with only one letter, 'dumbas' in this case.
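Adding the single-letter versions by hand is easy to forget, so they could be generated instead. This is a minimal sketch: withCollapsedDoubles is a hypothetical helper and is not part of whoolso-word-filter.

```javascript
// Hypothetical helper, not part of whoolso-word-filter: for every word that
// contains repeated consecutive letters, also add the collapsed variant.
function withCollapsedDoubles(words) {
  const expanded = new Set(words);
  for (const word of words) {
    // Replace any run of the same character with a single occurrence.
    expanded.add(word.replace(/(.)\1+/g, '$1'));
  }
  return [...expanded];
}

console.log(withCollapsedDoubles(['dumbass', 'idiot']));
// [ 'dumbass', 'idiot', 'dumbas' ]
```

Words without repeated letters are left as they are, and the original spellings are kept alongside the collapsed ones.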