Lietsaki / whoolso-word-filter

A flexible, smart word filter to prevent profanity or whatever suits your taste.

whoolso-word-filter

Installing

npm i whoolso-word-filter

const { filterWords } = require('whoolso-word-filter');

Using the filter

This package exports a function called filterWords(), which takes a config object as its sole argument. To keep things simple, the function returns an array containing every word or phrase from wordsToFilter that was found inside stringToCheck.

Building the config object

The config object is where you set every option the filter accepts. Take the following example:

const configObj = {
  wordsToFilter,
  stringToCheck,
  lengthThreshold: 3,
  leetAlphabet1: textToLeetAlphabet1,
  leetAlphabet2: textToLeetAlphabet2,
  shortWordLength: 3,
  shortWordExceptions
};

const foundWords = filterWords(configObj);

Argument Definitions:

// Suppose we want to filter some political terms. Our wordsToFilter array could be something like this:

const wordsToFilter = [
  'gop',
  'gerrymander',
  'republican',
  'republikan',
  'rpublican',
  'rpblican',
  'rpublicn',
  'repblicn',
  'lefty',
  'lfty',
  'lftwing'
];

Make sure all the words you add are lowercase. The filter converts the string you want to check to lowercase, so every entry in wordsToFilter must be lowercase too.
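To guard against mixed-case entries slipping into the list, the array can be normalized before it goes into the config object. The following is a minimal sketch; normalizeWordList is a hypothetical helper written for this example and is not part of whoolso-word-filter.

```javascript
// Hypothetical helper (not part of whoolso-word-filter): trims, lowercases,
// and de-duplicates a word list before it is used as wordsToFilter.
function normalizeWordList(words) {
  const seen = new Set();
  for (const word of words) {
    const normalized = String(word).trim().toLowerCase();
    if (normalized.length > 0) seen.add(normalized);
  }
  // A Set keeps insertion order, so the first spelling of each word wins.
  return [...seen];
}

const wordsToFilter = normalizeWordList(['GOP', 'Gerrymander', ' lefty ', 'gop']);
console.log(wordsToFilter); // [ 'gop', 'gerrymander', 'lefty' ]
```

The result can then be passed as wordsToFilter in the config object.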

const stringToCheck = `I am a political comment whose unique goal is to say the word republican.`;

leetAlphabet1 and leetAlphabet2 must have the following format:

const textToLeetAlphabet1 = {
  A: '@',
  B: '8',
  C: '(',
  D: 'D',
  E: '3',
  F: 'F',
  G: '6',
  H: '#',
  I: '!',
  J: 'J',
  K: 'K',
  L: '1',
  M: 'M',
  N: 'N',
  O: '0',
  P: 'P',
  Q: 'Q',
  R: 'R',
  S: '$',
  T: '7',
  U: 'U',
  V: 'V',
  W: 'W',
  X: 'X',
  Y: 'Y',
  Z: '2'
};

const textToLeetAlphabet2 = {
  A: '4',
  B: '8',
  C: '(',
  D: '<|',
  E: '€',
  F: 'PH',
  G: '9',
  H: '|-|',
  I: '1',
  J: 'J',
  K: 'K',
  L: '|',
  M: '|\\/|',
  N: '|\\|',
  O: '0',
  P: '|2',
  Q: 'Q',
  R: 'R',
  S: '5',
  T: '+',
  U: '|_|',
  V: '/',
  W: '//',
  X: '><',
  Y: `'/`,
  Z: '2'
};

If you require a more advanced leet filter, translate the leet in your string back to plain text yourself before passing the string in the filter's config object.
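One way to do that pre-translation is a small token-replacement pass. This is only a sketch under stated assumptions: translateLeet and extraLeetMap are hypothetical names invented for this example, and the map (leet token to plain letter) is illustrative, not something the package provides.

```javascript
// Hypothetical pre-processing step (not part of whoolso-word-filter):
// replaces leet tokens with plain letters before the string reaches the filter.
// extraLeetMap maps leet token -> plain letter and is an illustrative assumption.
const extraLeetMap = {
  '|-|': 'h',
  '|_|': 'u',
  '><': 'x',
  '4': 'a',
  '3': 'e',
  '0': 'o'
};

function translateLeet(input, leetMap) {
  // Try longer tokens first so multi-character tokens win at each position.
  const tokens = Object.keys(leetMap).sort((a, b) => b.length - a.length);
  let output = '';
  let i = 0;
  while (i < input.length) {
    const token = tokens.find((t) => input.startsWith(t, i));
    if (token) {
      output += leetMap[token];
      i += token.length;
    } else {
      output += input[i];
      i += 1;
    }
  }
  return output;
}

console.log(translateLeet('|-|3ll0 w0rld', extraLeetMap)); // hello world
```

The translated string would then be used as stringToCheck. Note that a map like this also rewrites legitimate digits (for example '3 apples' becomes 'e apples'), so it is best applied only to the copy of the text that is being checked.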

// Setting shortWordLength to 3 means 'In my case, short words are those with a length of 3 or fewer characters.'

const shortWordExceptions = ['meth'];

Examples

Consider the following configObj, where the leet alphabets are the same as the ones shown above.

const wordsToFilter = ['uneducated', 'republican', 'meth'];
const shortWordExceptions = ['meth'];

const configObj = {
  wordsToFilter,
  stringToCheck,
  lengthThreshold: 3,
  leetAlphabet1: textToLeetAlphabet1,
  leetAlphabet2: textToLeetAlphabet2,
  shortWordLength: 3,
  shortWordExceptions
};
configObj.stringToCheck = `They are something else, uneducated republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

configObj.stringToCheck = `They are something else, un$edu'ca[]\`te()d" republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

No matter how many spaces are inserted between the letters, the word will still be detected.

configObj.stringToCheck = `They are something else, u n e d u c a t e d republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

configObj.stringToCheck = `They are something else, UNEDUCATED republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

Alternating between uppercase and lowercase gives the same result, e.g. 'uNeDuCatEd'.

configObj.stringToCheck = `They are something else, un3duc@t3d republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

configObj.stringToCheck = `They are something else, uneeduuuucaa@t333ed republicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

configObj.stringToCheck = `They are something elseuneeeduucatedrepublicans`;

console.log(filterWords(configObj)); // [ 'uneducated', 'republican' ]

If none of the words in wordsToFilter are found, an empty array is returned.

configObj.stringToCheck = `They are good people and I love them`;

console.log(filterWords(configObj)); // []
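The examples above reuse one config object by reassigning stringToCheck. An alternative is to wrap that pattern in a small function that copies the config for each call. This is a sketch only: makeChecker is a hypothetical wrapper, and naiveFilter is a stand-in used so the snippet runs on its own. It does a plain substring match and does not reproduce the real filter's leet, spacing, or symbol handling.

```javascript
// Hypothetical wrapper: builds a reusable checker from a base config.
// filterFn is expected to behave like filterWords (config object in, array out).
function makeChecker(baseConfig, filterFn) {
  return function check(stringToCheck) {
    // Copy the config so the shared base object is never mutated.
    const found = filterFn({ ...baseConfig, stringToCheck });
    return { isClean: found.length === 0, found };
  };
}

// Stand-in for filterWords so this sketch is self-contained; it only does a
// plain, case-insensitive substring match and is NOT how the real filter works.
function naiveFilter({ wordsToFilter, stringToCheck }) {
  const text = stringToCheck.toLowerCase();
  return wordsToFilter.filter((word) => text.includes(word));
}

const check = makeChecker({ wordsToFilter: ['uneducated', 'republican'] }, naiveFilter);
console.log(check('They are good people and I love them')); // { isClean: true, found: [] }
console.log(check('UNEDUCATED republicans').found); // [ 'uneducated', 'republican' ]
```

In real use, the full configObj and the imported filterWords would be passed in place of the abbreviated config and naiveFilter.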

Note: If you want to filter words that contain a doubled letter (e.g. 'dumbass'), it's advisable to also add the version with only one of those letters to wordsToFilter (e.g. 'dumbas'). This is because the filter removes duplicated letters when checking for leet. If someone writes something like 'hello dumba$$hoe', which uses leet for the doubled letter and then attaches another word without a space, the text can slip through unless the single-letter version ('dumbas' in this case) is also in the list.
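The single-letter variants can be generated rather than typed by hand. The sketch below assumes two hypothetical helpers, collapseRepeatedLetters and withCollapsedVariants, neither of which is part of whoolso-word-filter.

```javascript
// Hypothetical helpers (not part of whoolso-word-filter): for every word that
// contains repeated consecutive letters, also add the collapsed spelling.
function collapseRepeatedLetters(word) {
  // (.)\1+ matches a character followed by one or more copies of itself.
  return word.replace(/(.)\1+/g, '$1');
}

function withCollapsedVariants(words) {
  const result = new Set();
  for (const word of words) {
    result.add(word);
    // For words without repeated letters this is the same string,
    // and the Set ignores the duplicate.
    result.add(collapseRepeatedLetters(word));
  }
  return [...result];
}

console.log(withCollapsedVariants(['dumbass', 'meth'])); // [ 'dumbass', 'dumbas', 'meth' ]
```

The expanded array can be used directly as wordsToFilter, keeping both the original and the collapsed spelling in the list.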