WEEE-Open / skeeelled

An e-learning platform for the modern age
https://weee-open.github.io/skeeelled

Add moderation step for images #103

Open e-caste opened 2 years ago

e-caste commented 2 years ago

The current pipeline focuses purely on language-based moderation, so it ignores images entirely (we will filter them out based on our custom enhanced Markdown format, ref #97).

The base64-encoded images provided by users should also be checked for NSFW content. Find a pre-trained model for this task; we don't need to build a dataset of our own for this.

Note also that users could upload large or very large images, so the images should be automatically rescaled before they reach the model (likely CNN-based), to a resolution that allows a reasonable inference time.

This could be a starting point: https://github.com/SashiDo/content-moderation-image-api
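
As a rough sketch of the decoding and rescaling step described above: the snippet below decodes a base64 payload (with or without a data-URI prefix) and computes the dimensions an image should be scaled down to before inference. The names decode_image and target_size, the 224-pixel cap, and the 10 MiB limit are all assumptions made for illustration, not decisions taken in this issue. The actual pixel resampling is left to whichever image library the pipeline ends up using.

```python
import base64
import binascii
from typing import Tuple

MAX_SIDE = 224  # hypothetical cap; the chosen model's input size should decide this
MAX_BYTES = 10 * 1024 * 1024  # hypothetical upload limit (10 MiB)


def decode_image(payload: str) -> bytes:
    """Decode a base64 image, accepting an optional data-URI prefix."""
    if payload.startswith("data:"):
        # Drop everything up to the comma, e.g. "data:image/png;base64,".
        _, _, payload = payload.partition(",")
    # Reject oversized uploads before decoding; base64 text is ~4/3 the raw size.
    if len(payload) * 3 // 4 > MAX_BYTES:
        raise ValueError("image too large")
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return dimensions whose longer side is at most max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # small images are left alone, never upscaled
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Checking the payload length before decoding keeps a very large upload from being fully decoded into memory just to be rejected.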


e-caste commented 2 years ago

About point 1:
the nsfw-detector library (TensorFlow-based) seems to work quite well with random images found online. TODO: test it against this dataset: https://github.com/alex000kim/nsfw_data_scraper (just run the Docker commands docker build . -t docker_nsfw_data_scraper and docker run -v $(pwd):/root/nsfw_data_scraper docker_nsfw_data_scraper scripts/runall.sh).

See the test notebook here: https://colab.research.google.com/drive/1y_w0t1ncJwN3vt4xNt0s_EIK_EHggO43?usp=sharing And the test data on Google Drive here: https://drive.google.com/drive/folders/1r_FKqwFpVnr29CtYYJAYTo0txAnjFNVy?usp=sharing
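
A possible way to turn the classifier's output into a moderation decision, as a minimal sketch: it assumes the model returns per-category probabilities under the names drawings, hentai, neutral, porn and sexy (this should be verified against the library's actual output), and the is_nsfw helper and the 0.7 threshold are hypothetical values that would need tuning on the test data.

```python
from typing import Dict

# Assumed category names; check them against what the classifier really returns.
UNSAFE_CATEGORIES = ("hentai", "porn", "sexy")
THRESHOLD = 0.7  # hypothetical cut-off, to be tuned on the test dataset


def is_nsfw(scores: Dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Flag an image when the summed unsafe probabilities reach the threshold."""
    # Missing categories count as 0.0 so a partial score dict does not crash.
    unsafe = sum(scores.get(name, 0.0) for name in UNSAFE_CATEGORIES)
    return unsafe >= threshold
```

Summing the unsafe categories instead of checking each one separately avoids missing an image whose probability mass is split between, say, porn and sexy.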

FereshtehFeiz commented 1 year ago

I'm working on point 4 (extracting the text from images) and found the Python script below, which uses OCR to detect the text and save it to a txt file: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/

e-caste commented 1 year ago

Hugging Face just released huggingface.js, so this step could be delegated entirely to their APIs, directly from the browser (the API usage cost still needs to be clarified). https://github.com/huggingface/huggingface.js/blob/main/packages/inference/README.md