ThioJoe / YT-Spammer-Purge

Allows you to easily scan for and delete scam comments using several methods.
GNU General Public License v3.0
4.57k stars, 389 forks

Filtering: (Long comments should be excluded in auto-smart mode) #823

Open gcol33 opened 2 years ago

gcol33 commented 2 years ago

Filter Mode

Auto-Smart Mode

Select the Problem

False Positive

(Optional) If 'Other', Enter Very Short Description

No response

Spammer Example / Sample

https://www.youtube.com/watch?v=Abprpm7Q3M0&lc=Ugy42vFBYF3bhgn_NZF4AaABAg

Video / Post Link

https://www.youtube.com/watch?v=Abprpm7Q3M0&lc=Ugy42vFBYF3bhgn

(Optional) Additional Info / Context

This is the first time I have tried this tool, and I love it. I noticed that longer comments (like this one) sometimes get flagged as false positives.

If I had to guess, it could also be because of the first line: "PLAYER HOUSING CAN TRANSFORM WOW"

Even so, it would be good to take comment length into account and weight it in, since spammers often write short messages.

P.S. Your video about installing the .json file is no longer up to date. Google changed their layout quite a bit, and I had to spend some time figuring out how to create the file.

Firecul commented 2 years ago

I mean long =/= not spam. This looks like a false positive yes but there are still long comments that are spam that do get caught regularly.

gcol33 commented 2 years ago

> I mean long =/= not spam. This looks like a false positive yes but there are still long comments that are spam that do get caught regularly.

The comment has over 9000 characters. I have never seen a spam comment with even half that many characters, so I don't think you are correct, but if you can show me a single example of such a message, I'd gladly be convinced otherwise.

Since this is a filter-based approach, the length of a message is positively correlated with its probability of detection: the more text a comment contains, the more chances it has to match a filter. I feel like this could be taken into account. I see a few options:

1) A cut-off length after which the message is not considered spam

2) A weighted approach where the message is considered not spam if only a small part of the message is suspect

This can be refined further. For example, this message was already listed as unsure, so options 1 and 2 could be applied only when a message is flagged as unsure.
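A rough sketch of how options 1 and 2 could be combined with the "unsure" refinement. This is not taken from the YT-Spammer-Purge code: `FilterResult`, `keep_flag`, and both thresholds are made-up names and values for illustration, and it assumes the filter can report how many characters its matches covered.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_SPAM_LENGTH = 4500        # option 1: cut-off length in characters
MIN_SUSPECT_FRACTION = 0.05   # option 2: minimum share of suspect characters


@dataclass
class FilterResult:
    text: str           # full comment text
    suspect_chars: int  # characters covered by filter matches (assumed available)
    unsure: bool        # True if the filter itself was not confident


def keep_flag(result: FilterResult) -> bool:
    """Return True if the comment should stay flagged as spam."""
    # Confident matches are left alone; only "unsure" ones are re-checked.
    if not result.unsure:
        return True
    length = len(result.text)
    if length == 0:
        return True
    # Option 1: very long comments are not treated as spam.
    if length > MAX_SPAM_LENGTH:
        return False
    # Option 2: drop the flag if only a small part of the text is suspect.
    return result.suspect_chars / length >= MIN_SUSPECT_FRACTION


if __name__ == "__main__":
    title = "PLAYER HOUSING CAN TRANSFORM WOW"
    long_comment = FilterResult(title + " " + "x" * 9000, len(title), unsure=True)
    print(keep_flag(long_comment))
```

With these example values, a 9000-character comment whose only suspect part is its first line would have its "unsure" flag dropped, while a short comment that is mostly suspect text would stay flagged. Whether a hard cut-off is safe depends on Firecul's point that long spam comments do exist, so the weighted check alone may be the more conservative choice.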