Closed Firecul closed 2 years ago
@Firecul Is there a mode that looks at comment frequency/intervals? I would assume that someone trying to obfuscate any text (not just phone numbers) by splitting it into separate lines would leave all the comments within a certain timeframe.
Is there a mode that looks at comment frequency/intervals?
No, there isn't. That might work, but I'm not sure what data is available through the API. It would probably need more processing than normal, though, to keep track of comment times when sorting through possibly tens of thousands of comments.
The YouTube API actually does provide the time a comment was posted down to the second, but I'm not sure if it would be worth doing
@ThioJoe @Firecul Yeah, it would require some extra work, but there is a relatively easy solution.
You would have to iterate through the comments and create an inverted index by author (basically, a dictionary where the keys are the name of the author and the values are a list of the indices of the comments made by that author, such as {"author1" : [2, 3], "author2" : [5] }). Then, iterate through the inverted index to find authors who have at least n comments and then calculate the frequency/intervals for those comments only.
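A minimal sketch of the inverted-index idea described above, not the tool's actual code. The comment layout (dicts with "author" and "published" keys), the function names, and the thresholds min_comments and max_gap_seconds are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def build_author_index(comments):
    """Map each author to the list positions of that author's comments."""
    index = defaultdict(list)
    for position, comment in enumerate(comments):
        index[comment["author"]].append(position)
    return index


def burst_authors(comments, min_comments=3, max_gap_seconds=60):
    """Return {author: gaps} for authors who posted at least min_comments
    comments with every consecutive gap at or below max_gap_seconds.

    Both thresholds are hypothetical defaults, not values from the project.
    """
    flagged = {}
    for author, positions in build_author_index(comments).items():
        if len(positions) < min_comments:
            continue  # skip authors with too few comments to analyse
        times = sorted(comments[i]["published"] for i in positions)
        gaps = [
            (later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])
        ]
        if gaps and all(gap <= max_gap_seconds for gap in gaps):
            flagged[author] = gaps
    return flagged
```

Only authors that pass the comment-count check have their timestamps sorted and compared, so the extra work stays proportional to the number of repeat commenters rather than to every comment.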
The only other solutions that come to mind would be to (1) sort comments by author and chronologically, then join each author's comments into a single string before running it through the filter, or (2) check if a comment is a fraction of a telephone number by using something like is_number_fraction = not any(char.isalpha() for char in comment)
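For comparison, the two alternatives could be sketched roughly as follows. This is only an illustration: the "author", "published", and "text" keys and the join_by_author helper are assumed names, not part of the existing scanner.

```python
from collections import defaultdict


def is_number_fraction(comment):
    """Option (2): True when the text contains no alphabetic characters."""
    return not any(char.isalpha() for char in comment)


def join_by_author(comments):
    """Option (1): concatenate each author's comments in chronological order.

    The resulting strings could then be passed to an existing filter.
    """
    grouped = defaultdict(list)
    for comment in sorted(comments, key=lambda c: c["published"]):
        grouped[comment["author"]].append(comment["text"])
    return {author: "".join(texts) for author, texts in grouped.items()}
```

Note that is_number_fraction is also True for comments made up only of punctuation or emojis, and for an empty string, so in practice it would probably need to be paired with a check that at least one digit is present.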
I think adding all those checks would increase the processing time and slow down the scanning considerably.
Yeah, I mean, I don't think (1) and (2) are viable solutions as (1) would take too long and (2) would return False in almost all cases. The reason I propose the solution using the inverted index is that it would detect more spammy behavior than just the example shown here. Obviously, if that overcomplicates things, users can still find this type of spam using regex.
I would just look for a + or two numbers. If true, check to see if the next comment by the user has just numbers. This could help find the spam comments while having minimal impact on scanning.
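One possible reading of this heuristic, sketched under assumptions: "two numbers" is taken to mean at least two digits anywhere in the comment, "just numbers" means only digits once whitespace is removed, and the comments arrive in chronological order as dicts with "author" and "text" keys. All names here are hypothetical.

```python
import re

# A "+" sign, or at least two digits anywhere in the text (assumed reading).
TRIGGER = re.compile(r"\+|\d.*\d", re.DOTALL)


def looks_like_start(text):
    """True if the comment could be the first piece of a split number."""
    return TRIGGER.search(text) is not None


def is_digits_only(text):
    """True if the comment is non-empty and holds only digits and whitespace."""
    return "".join(text.split()).isdigit()


def find_split_numbers(comments):
    """Return (previous_index, current_index) pairs where an author's comment
    looks like a number start and their next comment is digits only."""
    last_by_author = {}  # author -> index of that author's previous comment
    hits = []
    for i, comment in enumerate(comments):
        prev = last_by_author.get(comment["author"])
        if (
            prev is not None
            and looks_like_start(comments[prev]["text"])
            and is_digits_only(comment["text"])
        ):
            hits.append((prev, i))
        last_by_author[comment["author"]] = i
    return hits
```

This makes a single pass and stores one index per author, which matches the goal of keeping the impact on scanning small.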
@RacerDelux Yes, something like (2) I posted above would be able to look for both those patterns. It would also work if there are spaces, punctuation marks, emojis, etc. between numbers. (Would also detect 2 or more numbers and work even if the numbers are written like ⑧, ⁸, 8, etc.)
@Firecul Are you using regex to search for telephone numbers? I wonder if using a pattern of "+" + country code would work. (Assuming that all telephone numbers contain a country code.)
@Rairye I didn't use regex; I just came across it when finding other spam. If they are smart they will use country codes, since leaving them out greatly restricts potential victims. They need to make it as simple as possible to fall for.
@Firecul Ah, alright. I just read through the code of that mode and it looks like it uses regex dictionaries. If scammers almost always include the country code, then maybe that pattern could be added in the mode. (Edit: Or manually add country codes as spam words.)
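A rough sketch of what a "+" plus country code pattern might look like. It assumes only that country calling codes are one to three digits and do not start with 0. The pattern and function name are illustrative and are not taken from the mode's regex dictionaries.

```python
import re

# "+" followed by an optional space and a 1-3 digit code not starting with 0.
COUNTRY_CODE_PREFIX = re.compile(r"\+\s?[1-9]\d{0,2}")


def has_country_code_prefix(text):
    """True if the text contains something shaped like '+<country code>'."""
    return COUNTRY_CODE_PREFIX.search(text) is not None
```

On its own this pattern is loose: it also matches ordinary text such as "2+2=4", so it would likely need to be combined with other signals rather than used as a standalone spam test.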
This is basically covered by the detection of spam threads as a whole in 2.15, so I'm going to close this unless it becomes necessary again.
Filter Mode
Auto-Smart Mode
Select the Problem
A type of spammer is not detected at all
(Optional) If 'Other', Enter Very Short Description
Splitting the number across several messages can't be detected currently
Spammer Example / Sample
You can find the comments as replies here https://www.youtube.com/watch?v=RQKQNxLeq1c&lc=UgwwK9EwblTpbPSLrWl4AaABAg
Video / Post Link
https://www.youtube.com/watch?v=RQKQNxLeq1c&lc=UgwwK9EwblTpbPSLrWl4AaABAg
(Optional) Additional Info / Context
No response