FrostCo / AdvancedProfanityFilter

A browser extension to filter profanity from webpages
GNU General Public License v3.0

Word Whitelist #66

Closed: DJKOT2 closed this issue 6 years ago

DJKOT2 commented 6 years ago

Hello. I just installed your extension (Firefox 54 x64, Windows 7 x64). I've tried every option with various words in various languages, but nothing happens. Just plain NOTHING. What could be wrong?

richardfrost commented 6 years ago

For security and practical reasons, I really only support the latest versions of Firefox and Chrome. Is there a reason why you can't update to the latest version? I definitely encourage it, and so does Mozilla (the foundation that produces Firefox). They go so far as to encourage using another up-to-date browser instead of an older version of Firefox.

I'm not saying I refuse to dig deeper, but if it can be solved simply by updating your browser, that is definitely the best way to solve it.

DJKOT2 commented 6 years ago

Hi Richard, I've heard that modern Firefox versions no longer support Classic Theme Restorer.

http://addons.mozilla.org/ru/firefox/addon/classicthemerestorer/

This plug-in is really important to me. Chrome is too simplified for me; I prefer a classic interface with lots of buttons and features. I'm not really into that mobile-shmobile stuff where everything is as dumb-simple as possible. Maybe there is an old version somewhere?

richardfrost commented 6 years ago

I just tested it on 54.0.1 and it seems to be working for me, so I'm not sure what is going on with it. Does the options page for the extension work? For example, can you add a new word, export your config, etc.?

DJKOT2 commented 6 years ago

I can record a screencast of how it behaves. The options page appears to be working correctly, as far as I can tell.

richardfrost commented 6 years ago

That would be helpful if you could include one! Thanks for letting me know about the options page. Have you modified the options, or are they still the defaults?

DJKOT2 commented 6 years ago

I've tried adding some new words, and that seemed to work, but still nothing happens, even with the old words. No words are being censored, neither the old ones nor the new ones.

And I can't remove the words I've added.

richardfrost commented 6 years ago

Do any of the words you added have special characters? That is something I still haven't quite figured out how to deal with. If it isn't too inconvenient, you could try restoring the defaults in the "Config" tab. If you do have words that use special characters, I can probably help you find a way to include them for now.

DJKOT2 commented 6 years ago

What kind of special characters? Do other (non-English) languages count?

richardfrost commented 6 years ago

Possibly? It depends on whether they conflict with regular expression syntax. If you want, you could either send a screenshot of your word list or just export your settings and share them here.

DJKOT2 commented 6 years ago

ok, I'll try something tomorrow morning

richardfrost commented 6 years ago

Sounds good, thanks for being willing to provide troubleshooting help!

DJKOT2 commented 6 years ago

For some reason, after I reset the settings to defaults, it worked. Sadly, though, it only works with English words.

richardfrost commented 6 years ago

I believe that JavaScript's regular expression implementation doesn't support Unicode characters. There is a library that does, though, which I might be able to include here. Since I only speak English, I don't really have a way to test it. Do you have an example word that isn't working? It would be nice to add support for other languages if you would be willing to help me with it.
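A likely culprit here is the `\b` word-boundary assertion rather than Unicode support as a whole: in JavaScript, `\b` is defined in terms of `\w`, which covers only ASCII letters, digits, and underscore, so a pattern like `\bтест\b` can never match. The TypeScript sketch below contrasts that with a boundary built from Unicode property escapes (`\p{L}`, `\p{N}`) plus lookbehind/lookahead, which need ES2018 and the `u` flag. Whether the extension actually builds its patterns with `\b` is an assumption, and `naiveExact`/`unicodeExact` are hypothetical names.

```typescript
// Why a pattern that works for English silently fails for Cyrillic:
// in JavaScript, \b is defined via \w, which is ASCII-only ([A-Za-z0-9_]).
// Words are assumed to contain no regex metacharacters in this sketch.

// Hypothetical naive matcher using \b.
function naiveExact(word: string): RegExp {
  return new RegExp("\\b" + word + "\\b", "i");
}

// Hypothetical Unicode-aware matcher (ES2018+): a "word character" is any
// letter, digit or underscore in any script, checked with lookarounds.
function unicodeExact(word: string): RegExp {
  const wordChar = "[\\p{L}\\p{N}_]";
  return new RegExp(`(?<!${wordChar})${word}(?!${wordChar})`, "iu");
}

console.log(naiveExact("test").test("this is a Test")); // true
console.log(naiveExact("тест").test("это Тест")); // false: no ASCII word chars, so no \b anywhere
console.log(unicodeExact("тест").test("это Тест")); // true
console.log(unicodeExact("тест").test("тесты")); // false: followed by the letter "ы"
```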

DJKOT2 commented 6 years ago

The language I'm trying to censor is Russian, so the words use only Cyrillic letters.

We can use any word as an example, such as "Тест" ("Test"). It really doesn't matter, since Cyrillic letters aren't working anyway.

Sure, I can help you improve your extension, but how might I be useful? I don't know how to code )))

richardfrost commented 6 years ago

Thanks for the information! In the next couple of days I will try to add support for Unicode, which should allow this (and just about every other language) to work. You could help by testing those changes when I'm done and letting me know whether they work for you.

DJKOT2 commented 6 years ago

ok. I can do that )))

Thanks.

DJKOT2 commented 6 years ago

Hi, any updates?

richardfrost commented 6 years ago

Sorry, I have been swamped with things to do, but I'm hoping to get to it this weekend. I will let you know how it goes!

On Thu, Mar 8, 2018 at 8:14 AM, DJKOT2 wrote:

Hi, any updates?

DJKOT2 commented 6 years ago

ok, thanks

richardfrost commented 6 years ago

Well, I did have some time to work on it this last weekend, and I found some interesting things. It looks like in Chrome and modern Firefox it works with the censor and substitute methods, but not with remove. I will try to fix remove today. Can you check whether either of those methods makes a difference for you?

Here is a working config (you can export and save your config before importing this one so you can go back):

{
  "censorCharacter": "*",
  "censorFixedLength": 0,
  "defaultSubstitutions": [
    "censored",
    "expletive",
    "filtered"
  ],
  "disabledDomains": [],
  "filterMethod": 1,
  "globalMatchMethod": 1,
  "preserveFirst": false,
  "preserveLast": false,
  "showCounter": true,
  "substitutionMark": true,
  "words": {
    "Тесты": {
      "matchMethod": 0,
      "words": []
    }
  }
}

I was testing here.
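For readers following along, here is a rough sketch of what a "censor" method might do with the `censorCharacter`, `censorFixedLength`, `preserveFirst`, and `preserveLast` keys in the config above. The semantics are guesses from the key names, not the extension's documented behavior; in particular, treating `censorFixedLength: 0` as "keep the original length" is an assumption, and `censorWord` is a hypothetical helper.

```typescript
// Hypothetical "censor" method driven by the config keys above.
// ASSUMED semantics (inferred from key names only):
//   censorCharacter:    character used to hide letters
//   censorFixedLength:  0 = keep the word's length, N > 0 = always N characters
//   preserveFirst/Last: leave the first/last character visible
interface CensorOptions {
  censorCharacter: string;
  censorFixedLength: number;
  preserveFirst: boolean;
  preserveLast: boolean;
}

function repeatChar(ch: string, n: number): string {
  let out = "";
  for (let i = 0; i < n; i++) out += ch;
  return out;
}

function censorWord(word: string, opts: CensorOptions): string {
  // charAt/length count UTF-16 units; fine for Cyrillic, which is in the BMP.
  const first = opts.preserveFirst ? word.charAt(0) : "";
  const last = opts.preserveLast && word.length > 1 ? word.charAt(word.length - 1) : "";
  const total = opts.censorFixedLength > 0 ? opts.censorFixedLength : word.length;
  const hidden = Math.max(total - first.length - last.length, 0);
  return first + repeatChar(opts.censorCharacter, hidden) + last;
}

const defaults: CensorOptions = {
  censorCharacter: "*",
  censorFixedLength: 0,
  preserveFirst: false,
  preserveLast: false,
};

console.log(censorWord("Тесты", defaults)); // prints *****
console.log(censorWord("Тесты", { ...defaults, preserveFirst: true, preserveLast: true })); // prints Т***ы
```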

DJKOT2 commented 6 years ago

thanks, I'll check it out

DJKOT2 commented 6 years ago

OK, is there an option to import/export to a file? Where is this config stored?

richardfrost commented 6 years ago

If you open the extension's options page, it's there in the "Config" tab. There is just a large text box that serves for both exporting and importing the config. I would recommend clicking the Export button first; then you can copy your current config from that text box and save it to a file somewhere. Then try pasting the above config into that same text box and clicking Import. It will overwrite your current config with whatever is in the text box.

You should be able to open the extension's settings by either opening the plugins page, or clicking on the icon and selecting "Options".

DJKOT2 commented 6 years ago

OK, one more question: is it possible to create exclusion words that should never be censored under any circumstances?

richardfrost commented 6 years ago

There is currently no way to "whitelist" (never censor) words. It is something I could consider adding, but in most cases I have found it's enough to change the word's "Matching Method" from partial to exact. If you have an example, I'd be happy to look at it and consider adding a whitelist/exclude list for words.

DJKOT2 commented 6 years ago

Does exact matching take punctuation and case into account, or is it just the word as-is, separated by spaces on both sides? Slavic languages are more complex than other European languages when it comes to word forms, genitives, grammatical cases, etc., so one curse word can have about 12 or maybe even 36 (!) variations, not counting typos, misspellings, deliberate misspellings and whatnot...

Some long words actually contain shorter words that the filter can interpret as cursing, even though they aren't. Sounds kinda crazy, doesn't it?

I guess you need some examples for this.

richardfrost commented 6 years ago

That does sound pretty crazy! I'll say it again: I definitely don't have much experience with this. "Exact matching" ignores case (as all filtering methods in this extension do), but unlike a partial match (which matches part of a word), it will not censor a longer word that merely contains the filtered word. Exact match should also handle punctuation next to the word, but depending on the character that can get messy too. I hope to have that more polished in the next release. If you have an example involving punctuation, I can take a look. From what you describe, it sounds like a very complex problem, and dealing with up to 36 variations of a word is crazy!
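The difference between the two matching methods can be sketched as follows. This is an illustration under assumptions, not the extension's code: the mode names `"exact"`/`"partial"` and `countMatches` are hypothetical (the config stores match methods as numbers), and "exact" is modeled as "not touching another letter, digit, or underscore on either side", so neighboring spaces and punctuation still allow a match. It uses ES2018 regex features.

```typescript
// Partial vs. exact matching, sketched with hypothetical names.
type MatchMethod = "exact" | "partial";

// Any letter, digit or underscore (in any script) counts as a word character.
const WORD_CHAR = "[\\p{L}\\p{N}_]";

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countMatches(word: string, method: MatchMethod, text: string): number {
  const body = escapeRegExp(word);
  // "exact": no word character directly before or after, so spaces and
  // punctuation are fine neighbors. "partial": match anywhere.
  const pattern =
    method === "exact" ? `(?<!${WORD_CHAR})${body}(?!${WORD_CHAR})` : body;
  // g = all matches, i = case-insensitive, u = Unicode mode (needed for \p{...})
  const found = text.match(new RegExp(pattern, "giu"));
  return found ? found.length : 0;
}

const sample = "Test, testing, TEST!";
console.log(countMatches("test", "partial", sample)); // 3
console.log(countMatches("test", "exact", sample)); // 2 ("testing" is skipped)
```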

If you do have an example, that would be nice. Do you think a whitelist/exclude list would be helpful in this case? With so many variations, it sounds like it would be very difficult to get everything set up.

DJKOT2 commented 6 years ago

A whitelist would certainly be helpful in some cases, especially for other languages.

Okay, I'll try to think of some examples that aren't so obscene... :)

Let me try to explain this with common everyday words. For example, take these two words:

ход (pronounced "khod")
пароход (pronounced "parokhod")

The first one, "ход", can be translated as "running" or "movement"...

But the second one, "пароход", translates as "steamboat".

These two words share the same three letters, "ход", but their meanings are quite different.

And there are tons of examples in Russian of words that share a certain sequence of letters but have drastically different meanings:

мышь - камыш (a mouse - a reed)

длинный - подлинный (long - genuine)

шип - шипеть (a spike - to hiss)

And many, many more... Hope this gives a better idea.
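A quick way to see which of these pairs actually collide under a plain partial (substring) match is to test each one directly. As written, мышь/камыш does not collide, because камыш shares only "мыш" (without the soft sign); a filter entry for the shorter stem "мыш" would hit both. The `partialHit` helper is a hypothetical stand-in for partial matching, not the extension's implementation.

```typescript
// The pairs above: [short word, longer unrelated word]
const pairs: Array<[string, string]> = [
  ["ход", "пароход"], // movement / steamboat
  ["мышь", "камыш"], // mouse / reed
  ["длинный", "подлинный"], // long / genuine
  ["шип", "шипеть"], // spike / to hiss
];

// A plain case-insensitive substring test, i.e. a "partial" match.
function partialHit(entry: string, text: string): boolean {
  return text.toLowerCase().indexOf(entry.toLowerCase()) !== -1;
}

pairs.forEach(([entry, other]) => {
  console.log(`${entry} -> ${other}: ${partialHit(entry, other)}`);
});
// ход -> пароход: true
// мышь -> камыш: false
// длинный -> подлинный: true
// шип -> шипеть: true
```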

richardfrost commented 6 years ago

Sorry it's taken me so long to get back to this; it's been a very busy couple of weeks. Unfortunately, after looking into it, a whitelist is not very easy to add and would require quite a few changes that could decrease overall performance. I will keep thinking it through and might come up with another way to implement it, or I may end up adding a top-level option to enable the whitelist so it doesn't adversely affect those not using it.

I appreciate those examples you gave above, and can see how a whitelist could be useful. I'm curious, do you think the whitelist would grow larger than the normal filter list? Do you think it would be possible to make adjustments to the regular filter list using the provided match methods to accomplish some of this?

Here's a brief overview.

For instance:

If you wanted to match long but not genuine, you could add the word длинный with the matching method of exact.

I'm just wondering from a use case if we did have a working whitelist, do you think it would be very large? Would it be hard to maintain? Like I said, I will keep playing around with it and see if I can find a new way to implement it. Thanks again for all your time in discussing this with me.

DJKOT2 commented 6 years ago

If you wanted to match long but not genuine, you could add the word длинный with the matching method of exact.

True, but I'd have to add all the cases and plural forms: длинный, длинного, длинному, длинным, длинном, длинные, длинными.

That's six more words, if I remember my grammar correctly.
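One way to cover all of those forms with a single entry, without also hitting подлинный, would be to require the entry to start a word but let it run to the end of the word. The sketch below is a hypothetical "stem" match, not one of the extension's existing matching methods, and it needs ES2018 regex features. The trade-off is that a stem also catches compounds that begin with it, such as длинноногий ("long-legged").

```typescript
// Letters, digits and underscore in any script count as word characters.
const WORD_CHAR = "[\\p{L}\\p{N}_]";

// Hypothetical "stem" matcher: the stem must start a word (no word character
// right before it) and then absorbs any trailing letters, i.e. the ending.
// The stem is assumed to contain no regex metacharacters.
function stemMatcher(stem: string, flags: string = "iu"): RegExp {
  return new RegExp(`(?<!${WORD_CHAR})${stem}\\p{L}*`, flags);
}

const forms = ["длинный", "длинного", "длинному", "длинным", "длинном", "длинные", "длинными"];
const longStem = stemMatcher("длинн");

console.log(forms.every((f) => longStem.test(f))); // true: every listed form starts with the stem
console.log(longStem.test("подлинный")); // false: "длинн" is preceded by the letter "о"
console.log("Это подлинный и длинный текст".match(stemMatcher("длинн", "giu"))); // only "длинный"
```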

richardfrost commented 6 years ago

OK, I think I might have a way to do this. Would you like the whitelist to be per-word or global (for all words)? Do you have any preference?

DJKOT2 commented 6 years ago

Hmm... I don't know yet. Maybe global is better :)

richardfrost commented 6 years ago

Alright, I have a prototype working that I'd love some feedback on:

https://github.com/richardfrost/AdvancedProfanityFilter/archive/word_whitelist.zip

If you download it, extract it, and then load it as a temporary add-on in Firefox (more info), you should be able to test it.

I would suggest disabling the other version (the official one) while testing this. You can export your config from the current one from the "Config" tab, and then import it to the temporary one.

To test the whitelist functionality, you can add an entry to your config directly (probably the one you exported): click Export, which populates the text box with your current config, make your changes, and then click Import. Here is an example that adds "assassin", "pass", "passing", and "password" to the whitelist:

...
...
  "substitutionMark": true,
  "whiteList": [
    "assassin",
    "pass",
    "passing",
    "password"
  ],
  "words": {
...
...

I don't plan on building an interface for it until it is all working. Basically, you just add a whiteList key whose value is an array of words that should never be filtered. If you have a large list of pre-compiled words, let me know and I can quickly convert them into a JSON array for you.
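One possible way a global whitelist could work with partial matching is sketched below: find each partial match, grow it outward to the full word that contains it, and leave it alone if that whole word is on the whitelist. This is an illustration under assumptions, not the prototype's actual code; `filterText` and `FilterConfig` are hypothetical, the config shape is assumed from the `whiteList` and `words` keys shown above, and matches are simply replaced with `*`. Using a lookup object keeps the per-match whitelist check constant-time, which bears on the performance concern raised earlier.

```typescript
// Hypothetical global whitelist, assuming the config shape above:
// a top-level "whiteList" array plus the existing "words" object.
interface FilterConfig {
  whiteList?: string[];
  words: { [word: string]: unknown };
}

// Letters, digits and underscore in any script count as word characters (ES2018+).
// charAt works on UTF-16 units, so astral characters act as boundaries here.
const WORD_CHAR_RE = new RegExp("[\\p{L}\\p{N}_]", "u");

function isWordChar(ch: string): boolean {
  return WORD_CHAR_RE.test(ch);
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function repeatChar(ch: string, n: number): string {
  let out = "";
  for (let i = 0; i < n; i++) out += ch;
  return out;
}

function filterText(text: string, config: FilterConfig, censorChar = "*"): string {
  // Lowercased whitelist in a plain object for O(1) lookups per match.
  const allowed: { [word: string]: boolean } = {};
  (config.whiteList || []).forEach((w) => {
    allowed[w.toLowerCase()] = true;
  });
  const isAllowed = (w: string) => Object.prototype.hasOwnProperty.call(allowed, w);

  let result = text;
  Object.keys(config.words).forEach((word) => {
    // Partial, case-insensitive match of each filter word.
    const re = new RegExp(escapeRegExp(word), "giu");
    result = result.replace(re, (match: string, offset: number, whole: string) => {
      // Grow the match outward to the whole word that contains it.
      let start = offset;
      let end = offset + match.length;
      while (start > 0 && isWordChar(whole.charAt(start - 1))) start--;
      while (end < whole.length && isWordChar(whole.charAt(end))) end++;
      const enclosing = whole.slice(start, end).toLowerCase();
      return isAllowed(enclosing) ? match : repeatChar(censorChar, match.length);
    });
  });
  return result;
}

const demoConfig: FilterConfig = { whiteList: ["пароход"], words: { "ход": {} } };
console.log(filterText("Пароход дал ход", demoConfig)); // prints: Пароход дал ***
```

A per-word whitelist would follow the same pattern, with the `allowed` lookup built from each filter entry's own list instead of one global array.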

Some of my design choices:

DJKOT2 commented 6 years ago

Hi, thanks, I'll check this out in a few days :)

richardfrost commented 6 years ago

Don't want to rush you, but I'm curious if you have had a chance to take a look at this yet?

DJKOT2 commented 6 years ago

Hi, sorry for taking so long. I'll check this as soon as I have some free time; I was very busy last month and had lots of things to do.

richardfrost commented 6 years ago

No worries. I definitely understand being busy. Just whenever you get some time, I'd love to get your feedback on it.

richardfrost commented 6 years ago

I'm going to go ahead and close this now. It's been a long time, but if this is still an issue, let me know.

DJKOT2 commented 6 years ago

I just haven't had time to test it out.

richardfrost commented 6 years ago

Yep, life gets busy sometimes. I understand! Good luck, and if you do get time and want to revisit this let me know.