TwiN / go-away

Library for detecting profanities in Go
MIT License
177 stars 43 forks

Filter only the exact words #38

Open karthikrajanuber opened 2 years ago

karthikrajanuber commented 2 years ago

Describe the bug

We need to filter only the exact words provided in the profanities list. E.g., for French (fr), "con" is a profane word. It should be filtered only when it appears as an exact word; 'contactes' should not be filtered.

Is there a way to do this? I don't want to use false positives because we are maintaining multiple languages.

Please suggest

What do you see?

No response

What do you expect to see?

No response

List the steps that must be taken to reproduce this issue

We need to filter only the exact words provided in the profanities list. E.g., for French (fr), "con" is a profane word. It should be filtered only when it appears as an exact word; 'contactes' should not be filtered.

Is there a way to do this? I don't want to use false positives because we are maintaining multiple languages.

Please suggest

Version

No response

Additional information

No response

TwiN commented 2 years ago

Have you tried using WithSanitizeSpaces(false)?

karthikrajanuber commented 2 years ago

Hi @TwiN, yes, I tried using WithSanitizeSpaces(false) -> it removes the spaces in the string. But it is still not working.

[Screenshot 2022-08-24 at 12:55:17 PM]

[Screenshot 2022-08-24 at 12:54:13 PM]

I tried modifying the code but was not able to get it working. The problem is that instead of doing a substring search, I need to search for the exact word.

TwiN commented 2 years ago

Ah, you're right. We'd need to add a new WithExactWord(bool) option.

karthikrajanuber commented 2 years ago

I got this working with the code below:

    // Fragment of the censoring loop. s, word, censored, originalIndexes,
    // currentIndex, searchWord and wordPlacement are declared outside this snippet.
    for currentIndex != -1 {
        // When currentIndex is 0, search for the word followed by a space.
        if currentIndex == 0 {
            searchWord = word + " "
            wordPlacement = 1
        }
        // Intended to detect s ending with " "+word: search for the word preceded by a space.
        if len([]rune(" "+word))+strings.LastIndex(s, " "+word) == len([]rune(s)) {
            searchWord = " " + word
            wordPlacement = 2
        }
        // A string without spaces is a single token: search for the bare word.
        str := strings.Split(s, " ")
        if len(str) == 1 {
            if foundIndex := strings.Index(s[currentIndex:], strings.TrimSpace(searchWord)); foundIndex != -1 {
                for i := 0; i < len([]rune(strings.TrimSpace(searchWord))); i++ {
                    runeIndex := g.indexToRune(string(censored), currentIndex+foundIndex+i)
                    censored[originalIndexes[runeIndex]] = '*'
                }
                currentIndex += foundIndex + len([]rune(strings.TrimSpace(searchWord)))
            } else {
                break
            }
        }

        if foundIndex := strings.Index(s[currentIndex:], searchWord); foundIndex != -1 {
            // wordPlacement 0: censor all but the last rune of searchWord.
            if wordPlacement == 0 {
                for i := 0; i < len([]rune(searchWord))-1; i++ {
                    runeIndex := g.indexToRune(string(censored), currentIndex+foundIndex+i)
                    censored[originalIndexes[runeIndex]] = '*'
                }
                currentIndex += foundIndex + len([]rune(searchWord))
            }
            // wordPlacement 1: censor the word but not the trailing space.
            if wordPlacement == 1 {
                for i := 0; i < len([]rune(strings.TrimSpace(searchWord))); i++ {
                    runeIndex := g.indexToRune(string(censored), currentIndex+foundIndex+i)
                    censored[originalIndexes[runeIndex]] = '*'
                }
                currentIndex += foundIndex + len([]rune(strings.TrimSpace(searchWord)))
            }
            // wordPlacement 2: skip the leading space (i starts at 1), then censor the word.
            if wordPlacement == 2 {
                for i := 1; i < len([]rune(strings.TrimSpace(searchWord)))+1; i++ {
                    runeIndex := g.indexToRune(string(censored), currentIndex+foundIndex+i)
                    censored[originalIndexes[runeIndex]] = '*'
                }
                currentIndex += foundIndex + len([]rune(strings.TrimSpace(searchWord)))
            }
            wordPlacement = 0
        } else {
            break
        }
    }
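For comparison, exact-word censoring can also be sketched without tracking where in the string the word sits (start, middle, end). The following is a standalone illustration using only the standard library, not go-away's implementation: `isWordRune` and `censorExactWords` are hypothetical names, a word is assumed to be a run of letters or digits, and none of the library's sanitization steps (leet speak, accents, special characters) are applied.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isWordRune reports whether r can be part of a word. (Hypothetical helper.)
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r)
}

// censorExactWords replaces each whole-word, case-insensitive occurrence of a
// listed profanity with asterisks and leaves longer words that merely contain
// it untouched. (Hypothetical helper, not the library's API.)
func censorExactWords(s string, profanities []string) string {
	banned := make(map[string]bool, len(profanities))
	for _, p := range profanities {
		banned[strings.ToLower(p)] = true
	}

	runes := []rune(s) // work on runes so multi-byte characters stay intact
	start := -1        // index of the first rune of the current word, or -1
	for i := 0; i <= len(runes); i++ {
		if i < len(runes) && isWordRune(runes[i]) {
			if start == -1 {
				start = i
			}
			continue
		}
		// Reached a separator or the end of input: close the current word.
		if start != -1 {
			if banned[strings.ToLower(string(runes[start:i]))] {
				for j := start; j < i; j++ {
					runes[j] = '*'
				}
			}
			start = -1
		}
	}
	return string(runes)
}

func main() {
	fmt.Println(censorExactWords("con, contactes le con", []string{"con"}))
	// prints "***, contactes le ***"
}
```

Because each word is closed at any non-letter, non-digit rune or at the end of input, the start, middle, and end positions are all handled by the same branch.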