StractOrg / stract

web search done right
https://stract.com
GNU Affero General Public License v3.0
2.14k stars, 49 forks

Add Safe search #74

Closed: psm-2 closed this issue 1 year ago

mikkeldenker commented 1 year ago

Good point! I don't have time to implement it today or tomorrow, but I will put it on my todo list for next week. I'll just put some notes here for myself.

Create a dataset by searching for NSFW and SFW content and storing the clean body text of the responses. Around 1000 samples per class is probably a good start.
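A minimal sketch of how the collected samples could be held, assuming each response body has already been cleaned to plain text. `Label`, `Dataset` and its methods are hypothetical names for illustration, not Stract code; the per-class cap of 1000 mirrors the suggestion above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Label {
    Sfw,
    Nsfw,
}

/// Hypothetical container that keeps at most `per_class` cleaned bodies per label.
struct Dataset {
    per_class: usize,
    samples: Vec<(Label, String)>,
    counts: HashMap<Label, usize>,
}

impl Dataset {
    fn new(per_class: usize) -> Self {
        Dataset { per_class, samples: Vec::new(), counts: HashMap::new() }
    }

    /// Stores a cleaned body; returns false if it is blank or its class is already full.
    fn add(&mut self, label: Label, clean_body: &str) -> bool {
        let body = clean_body.trim();
        let count = self.counts.entry(label).or_insert(0);
        if body.is_empty() || *count >= self.per_class {
            return false;
        }
        *count += 1;
        self.samples.push((label, body.to_string()));
        true
    }

    /// True once both classes have reached the target size.
    fn is_complete(&self) -> bool {
        [Label::Sfw, Label::Nsfw]
            .iter()
            .all(|label| self.counts.get(label).copied().unwrap_or(0) >= self.per_class)
    }
}

fn main() {
    // 1000 samples per class, as suggested in the note (assumed target).
    let mut dataset = Dataset::new(1000);
    dataset.add(Label::Sfw, "cleaned body text of an SFW result");
    dataset.add(Label::Nsfw, "cleaned body text of an NSFW result");
    println!("{} samples, complete: {}", dataset.samples.len(), dataset.is_complete());
}
```

Capping each class keeps the dataset balanced, so the classifier's class priors stay close to 50/50.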

Train a Bernoulli naive Bayes classifier on the text, with TF-IDF vectors as features. Use rust-phf to create a perfect hash map from term => termid at compile time.
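A minimal sketch of the Bernoulli naive Bayes step, assuming class 0 = SFW and class 1 = NSFW. The Bernoulli model only looks at whether each feature is non-zero, so a TF-IDF vector thresholded at zero reduces to term presence; the sketch therefore scores term presence directly, with Laplace smoothing. To stay self-contained it builds an ordinary `HashMap` at runtime in place of the compile-time rust-phf map. `tokenize`, `Vocabulary`, `BernoulliNb` and the toy training data are hypothetical, not Stract's implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercased alphanumeric tokens (a simplistic tokenizer, assumed for the sketch).
fn tokenize(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
}

/// term => termid. Built at runtime here; the note proposes a compile-time rust-phf map.
struct Vocabulary {
    term_to_id: HashMap<String, usize>,
}

impl Vocabulary {
    fn build(docs: &[&str]) -> Self {
        let mut term_to_id = HashMap::new();
        for doc in docs {
            for term in tokenize(doc) {
                let next_id = term_to_id.len();
                term_to_id.entry(term).or_insert(next_id);
            }
        }
        Vocabulary { term_to_id }
    }

    /// Ids of known terms present in `text` (binary features; unknown terms are skipped).
    fn present_ids(&self, text: &str) -> HashSet<usize> {
        tokenize(text)
            .filter_map(|t| self.term_to_id.get(&t).copied())
            .collect()
    }
}

/// Bernoulli naive Bayes over term presence, Laplace-smoothed. Class 0 = SFW, 1 = NSFW.
struct BernoulliNb {
    vocab: Vocabulary,
    /// log P(c) + sum over all terms of log P(term absent | c).
    base: [f64; 2],
    /// log P(term present | c) - log P(term absent | c), indexed by term id.
    delta: [Vec<f64>; 2],
}

impl BernoulliNb {
    fn train(docs: &[(&str, usize)]) -> Self {
        let texts: Vec<&str> = docs.iter().map(|(text, _)| *text).collect();
        let vocab = Vocabulary::build(&texts);
        let v = vocab.term_to_id.len();
        // Documents per class, and per class the number of documents containing each term.
        let mut doc_count = [0usize; 2];
        let mut term_docs = [vec![0usize; v], vec![0usize; v]];
        for (text, label) in docs {
            doc_count[*label] += 1;
            for id in vocab.present_ids(text) {
                term_docs[*label][id] += 1;
            }
        }
        let mut base = [0.0; 2];
        let mut delta = [Vec::with_capacity(v), Vec::with_capacity(v)];
        for c in 0..2 {
            base[c] = (doc_count[c] as f64 / docs.len() as f64).ln();
            for id in 0..v {
                // Laplace smoothing: P(present | c) = (docs in c with term + 1) / (docs in c + 2).
                let p = (term_docs[c][id] as f64 + 1.0) / (doc_count[c] as f64 + 2.0);
                base[c] += (1.0 - p).ln();
                delta[c].push(p.ln() - (1.0 - p).ln());
            }
        }
        BernoulliNb { vocab, base, delta }
    }

    /// Log joint score per class; absent-term factors are already folded into `base`.
    fn scores(&self, text: &str) -> [f64; 2] {
        let mut scores = self.base;
        for id in self.vocab.present_ids(text) {
            for c in 0..2 {
                scores[c] += self.delta[c][id];
            }
        }
        scores
    }

    fn is_nsfw(&self, text: &str) -> bool {
        let [sfw, nsfw] = self.scores(text);
        nsfw > sfw
    }
}

fn main() {
    // Toy stand-in for the ~1000-samples-per-class dataset (0 = SFW, 1 = NSFW).
    let docs: [(&str, usize); 4] = [
        ("rust compiler release notes and cargo tutorial", 0),
        ("weather forecast for copenhagen this weekend", 0),
        ("explicit adult content nsfw videos", 1),
        ("nsfw adult explicit gallery", 1),
    ];
    let model = BernoulliNb::train(&docs);
    println!("{}", model.is_nsfw("adult nsfw gallery")); // true
    println!("{}", model.is_nsfw("cargo tutorial for rust")); // false
}
```

Folding every "term absent" factor into `base` ahead of time means scoring a page only touches the distinct terms it contains, rather than the whole vocabulary.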

Filtering logic