FrostCo / AdvancedProfanityFilter

A browser extension to filter profanity from webpages
GNU General Public License v3.0
159 stars, 26 forks

Filtering images. Something needs to be done. #532

Open: tonyram57 opened this issue 6 months ago

tonyram57 commented 6 months ago

Your extension is really working great, and I can read articles and forums without profanity. I know we talked about this before, but the images are really annoying me. For the last four days I have been encountering a lot of profanity in images, basically the f word every time. So instead of focusing on the mute audio feature, which has never once worked, you really need to start trying to get filtering to work on images, even if it's a beta feature that needs work. Thanks, and happy new year.

tonyram57 commented 5 months ago

The f word in images is really ticking me off. I am encountering it nearly every day, and 90% of the time it is in images.

richardfrost commented 5 months ago

Sorry @tonyram57, unfortunately I haven't had much time to look into this yet. It's on my list, but I can't promise when I'll get to it. If you could share a couple of these images with me, it would help me see how well it might work.

Using Tesseract, you can see what the results might look like. Its library is probably the best freely available one that could be included in the extension. Here are some examples:

Best (still not perfect)

[2 screenshots of OCR results]

Partially worked

[4 screenshots of OCR results]

Total failure

[5 screenshots of OCR results]

I'm curious to know how some of your examples fare, but based on these results I'm not holding my breath that it will be super reliable. You can either test them yourself on that website, or share the images with me and I can give it a try.

tonyram57 commented 5 months ago

Thanks for the examples, but images like those almost never contain profanity. You haven't been on social media, which is ironic, because I recently met some really good people there who got me through some things. Anyway, the three images I'll share all have a similar structure that looks easier to detect. I will send some clean examples later.

tonyram57 commented 5 months ago

Here is a type of image that is used often: [example image]

tonyram57 commented 5 months ago

[example image 2]

tonyram57 commented 5 months ago

[example image 3]

I will send more as I find them.

tonyram57 commented 5 months ago

[screenshot]

richardfrost commented 5 months ago

Thanks for sending these examples over @tonyram57. Here are the results running through the OCR process:

Good

[OCR result screenshot]

Pretty good

[OCR result screenshot]

Okay

[OCR result screenshot]

Failed

[OCR result screenshot]

As you can see, it picks up quite a few wrong characters/words. Those likely wouldn't cause too many false positives when filtering, because they usually aren't whole words.
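To illustrate why stray OCR characters rarely cause false positives, here is a minimal TypeScript sketch that only counts a blocked word when it appears as a whole word in the recognized text. The helpers `escapeRegExp` and `findWholeWordMatches` are hypothetical and not part of the extension's code, and the `\b` boundary only understands ASCII word characters, which is a simplifying assumption.

```typescript
// Hypothetical helper (not from AdvancedProfanityFilter): escape regex
// metacharacters so each blocked word is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the blocked words that appear as whole words
// in the OCR text. Fragments such as "hel1o" or "damnation" do not match,
// because \b requires a word boundary on both sides (ASCII-only).
function findWholeWordMatches(ocrText: string, blockedWords: string[]): string[] {
  const normalized = ocrText.toLowerCase();
  return blockedWords.filter((word) => {
    const pattern = new RegExp(`\\b${escapeRegExp(word.toLowerCase())}\\b`);
    return pattern.test(normalized);
  });
}

// Noisy OCR-style output with garbage characters mixed in.
const sample = "WHAT THE HELL |~ damnation ;: hel1o";
console.log(findWholeWordMatches(sample, ["hell", "damn"])); // → ["hell"]
```

The trade-off is that OCR errors inside a real word (for example a "1" read in place of an "l") also cause a miss, so whole-word matching lowers false positives at the cost of some false negatives.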

The other part of this issue is that so far all of this is focused on recognizing text. Once the text is recognized, there won't be many options for what to do with it, because it will not be possible to alter/filter the image itself. Options I can think of so far:

  1. Block/hide the image
  2. Blur the image and allow it to be shown by hovering/clicking/tapping on it
  3. Replace the image with the filtered OCR text (may not be very helpful, given that the OCR results for these images contain lots of extra garbage characters)

Would you still think it is worth it with one of those 3 options?
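Each of the three options above comes down to a different treatment of the `<img>` element once its text has been flagged. A minimal sketch, assuming a hypothetical `ImageAction` setting and `planImageAction` helper (neither exists in the extension), that describes what to apply without touching the DOM; the 20px blur radius is an arbitrary assumption.

```typescript
// Hypothetical setting mirroring the three options listed above.
type ImageAction = "hide" | "blur" | "replaceWithText";

interface ImagePlan {
  style: Record<string, string>; // inline CSS to apply to the <img>
  revealOnHover: boolean;        // whether hovering/tapping should undo the style
  replacementText?: string;      // filtered OCR text to show instead of the image
}

function planImageAction(action: ImageAction, filteredOcrText: string): ImagePlan {
  switch (action) {
    case "hide":
      // Option 1: block/hide the image entirely.
      return { style: { display: "none" }, revealOnHover: false };
    case "blur":
      // Option 2: blur it (radius is an assumption) and let the user reveal it.
      return { style: { filter: "blur(20px)" }, revealOnHover: true };
    case "replaceWithText":
      // Option 3: hide the image and show the filtered OCR text in its place.
      return { style: { display: "none" }, revealOnHover: false, replacementText: filteredOcrText };
  }
}
```

In a content script, a plan could be applied with `Object.assign(img.style, plan.style)`, and the hover reveal with a `mouseenter` listener that clears `img.style.filter`. Keeping the decision separate from the DOM work makes it easy to switch options from a setting.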

Another potential issue is speed/performance. These sample images have been taking between 0.5 and 3 seconds to process, which could really hurt performance on pages with heavy image use. I won't know how bad it is until I can play around with it in the actual extension.
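At 0.5 to 3 seconds per image, one common way to keep an image-heavy page responsive is to cap how many OCR jobs run at once instead of starting them all together. A minimal sketch, assuming each job is wrapped as a function returning a Promise; `runWithLimit` is a hypothetical name, and the OCR call itself is replaced by stand-in timers.

```typescript
// Hypothetical helper: run async jobs with at most `limit` in flight,
// returning results in the same order as the input jobs.
async function runWithLimit<T>(jobs: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job to start (safe: JS runs one callback at a time)

  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  // Start up to `limit` workers (at least one); each pulls jobs until none remain.
  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage with stand-in jobs (a real version would call an OCR library here).
const fakeJobs = [300, 100, 200].map((ms) => () =>
  new Promise<number>((resolve) => setTimeout(() => resolve(ms), ms)));
runWithLimit(fakeJobs, 2).then((times) => console.log(times)); // → [300, 100, 200]
```

A low limit trades total processing time for a page that stays usable while images are being scanned; the right value would have to be found by testing in the actual extension.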

tonyram57 commented 5 months ago

Thanks for all your hard work. I did not realize how difficult filtering images would be. I thought it would be as simple as detecting the text and then blocking the image. I forgot that the background of the image itself can be misread as text, which can cause false positives.

It's up to you. I say go for it if you have the time, but make it a beta feature that is off by default and must be turned on, with a warning that it will impact performance and may not work right. I am still going to visit social media, where I encounter it the most. I made some real friends there whom I connect with and who got me through hard times. I am not going to let the bad apples spoil it for me. But anyway, thanks for all your work.

richardfrost commented 5 months ago

@tonyram57 Yeah, it is a hard problem to solve. It reminds me of this funny comic: https://xkcd.com/1425/, which illustrates the issue pretty well. There are some things that seem hard but turn out to be very easy, and others that seem simple (like this one) but turn out to be almost impossible.

I'll keep it on the list to at least try with an integration in the code and see how it goes when I have some time.

I'm glad you found a good group of people though, even if you have to put up with seeing some things you don't want to see.

tonyram57 commented 5 months ago

Thanks. The worst part is that about 90% of the profanity I see in images is "f off" and "f you". Those seem to be very popular in images on social media, which is annoying.

richardfrost commented 5 months ago

Yeah, that is annoying. Well, sorry I can't do anything with the filter yet. Maybe one day we'll be able to help with that issue.