ashvinnihalani closed this issue 2 years ago.
I'm not sure whether support for foreign languages is implemented, though I'd think the foreign words could just be added to the spam detection word list 🤔 +1 on adding support for foreign languages if it hasn't been added yet.
I feel like making a folder with filter lists and checking the spam words against those would be way better than translating messages, as Google Translate can sometimes change the original message's meaning.
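A rough sketch of the folder-of-filter-lists idea, assuming a hypothetical layout with one text file per language (e.g. `filter_lists/es.txt`) holding one term per line. The names `load_filter_lists` and `matching_terms` are illustrative and not part of the project. Matching is plain case-insensitive substring search, so no translation request is involved.

```python
from __future__ import annotations

from pathlib import Path


def load_filter_lists(folder: Path) -> dict[str, set[str]]:
    """Load one term list per language from `<folder>/<lang>.txt` files.

    Assumed (hypothetical) format: one term per line; blank lines and
    lines starting with '#' are skipped. Terms are stored lowercased.
    """
    lists: dict[str, set[str]] = {}
    for path in sorted(folder.glob("*.txt")):
        terms: set[str] = set()
        for raw in path.read_text(encoding="utf-8").splitlines():
            term = raw.strip().lower()
            if term and not term.startswith("#"):
                terms.add(term)
        lists[path.stem] = terms  # e.g. 'es.txt' -> key 'es'
    return lists


def matching_terms(comment: str, lists: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return {language: sorted matching terms} for every list with a hit.

    Uses case-insensitive substring matching, so 'robux' also matches
    'freerobux'.
    """
    text = comment.lower()
    hits: dict[str, list[str]] = {}
    for lang, terms in lists.items():
        found = sorted(term for term in terms if term in text)
        if found:
            hits[lang] = found
    return hits
```

Because every comment is checked against every list, each added language costs one more set scan per comment, which is cheap compared with a network call.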
@EthanHindmarsh I created a PR with foreign language support. Review/testing would be appreciated, especially because I will need to spin up a Windows VM to test properly.
@UnknownCrafts How many filter lists can you realistically keep? It is my understanding that bots cycle through thousands of reply combinations, find one that works, and then propagate that one. We can't keep a dictionary of every possible language combo. Are you worried about false positives spiking?
Also, side note: if the bots are truly automated, then they would be using Google Translate to begin with, right? They are trying to drive traffic to their site, and YouTube's translate feature probably uses Google Translate. That way, when English speakers (the majority of YouTube's audience) click the translate button, it gives the best English translation.
My only worry was false positives rising, but I understand that we can't just keep on adding filter lists. I guess Google Translate is a good option, but again, I worry that false positives might rise because of it.
A decent idea would be to use Google Translate to detect the language and compare the comment against the spam list for that language.
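A sketch of the detect-then-compare idea. Instead of calling Google Translate for detection, this swaps in a local heuristic: it looks up the Unicode script of each letter (via `unicodedata.name`) and picks the per-script list for the most common one. That avoids network requests and quotas, but it only tells scripts apart (Cyrillic vs. Latin), not languages that share a script (Spanish vs. German). `SCRIPT_LISTS` and its terms are placeholders, not a real spam list.

```python
from __future__ import annotations

import unicodedata
from collections import Counter

# Placeholder per-script term lists (hypothetical, not a real spam list).
SCRIPT_LISTS: dict[str, set[str]] = {
    "CYRILLIC": {"бесплатно"},  # Russian for "free"
    "LATIN": {"free robux"},
}


def dominant_script(text: str) -> str:
    """Return the most common script among the letters of `text`.

    The script is taken as the first word of each letter's Unicode name,
    e.g. 'LATIN', 'CYRILLIC', 'ARABIC', 'CJK'. Returns 'UNKNOWN' if the
    text contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def is_spam(comment: str, lists: dict[str, set[str]] = SCRIPT_LISTS) -> bool:
    """Check the comment only against the list for its dominant script."""
    terms = lists.get(dominant_script(comment), set())
    text = comment.casefold()
    return any(term.casefold() in text for term in terms)
```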
Sending a bunch of requests to the Google Translate API for every message would not be a great solution though 🤔 It would definitely slow down the process greatly.
Google Translate has quotas as well. We shouldn't be forcing users to balance so many quotas.
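One way to cut the number of requests, sketched under two assumptions: ASCII-only comments are treated as English and never sent for translation, and identical comments (bots tend to repost a reply once it works) are translated once and then served from an in-memory cache. `translate_fn` is a hypothetical stand-in for whatever translation call would be used. The ASCII shortcut would miss non-English text typed without accents, e.g. a Spanish comment written in plain ASCII.

```python
from __future__ import annotations

from typing import Callable


class CachedTranslator:
    """Call a translation function at most once per distinct non-ASCII comment.

    `translate_fn` is a hypothetical stand-in for the real API call.
    ASCII-only text is assumed to be English and returned unchanged.
    The cache is unbounded in this sketch.
    """

    def __init__(self, translate_fn: Callable[[str], str]) -> None:
        self._translate_fn = translate_fn
        self._cache: dict[str, str] = {}
        self.api_calls = 0  # number of real translation requests made

    def __call__(self, text: str) -> str:
        if text.isascii():
            return text  # skip the request entirely
        if text not in self._cache:
            self.api_calls += 1
            self._cache[text] = self._translate_fn(text)
        return self._cache[text]
```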
@ashvinnihalani Are there timestamps? Also, are there cases of specific foreign languages getting through or cases of non-ASCII text getting through?
I don't think translation is a good idea because (in addition to it being resource-intensive) the methods that spammers might use to evade spam filters in English are not necessarily the same in other languages.
As stated by @ThioJoe in #477, this is not possible.
Yeah, it's not really feasible for me to create filters for every single language. You'd be better off entering your own search terms using one of the other filtering modes.
@ashvinnihalani Also, this is more of a discussion than an issue. @ThioJoe Please move this to the discussions page with the ideas tag, thanks.
Yeah, ThioJoe can add a couple of scam words, either in his spam-lists repo or even directly in the Python script: words like "robux" and "vbucks" in other languages.
Yes, and Google Translate is also not the best at translating certain languages, if you get what I mean.
@ThioJoe I can do this in your YT-Spam-Lists repo, if required.
So, a couple of follow-up comments:
1) If you add specific foreign words, what's to stop the spammers from learning to avoid those specifically? YouTube has a built-in feature to block words, and scammers get around it by putting random accents on letters or something similar.
2) The rationale behind using Google Translate is not to translate foreign comments on non-English channels, but rather to target people using foreign languages to evade spam filters on English channels. These people are probably using Google Translate to translate their spam comments to begin with. As people have mentioned, the spam-filtering requirements for native speakers of other languages may be different, so while this may be helpful in that scenario, it's not the primary goal.
3) In order to mitigate the slowdown while waiting for API requests, we can implement the following:
With all of these improvements, I think the slowdown will be negligible. Thoughts, @KendallDoesCoding @ThioJoe @UnknownCrafts?
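The accent trick mentioned above (random accents on letters to get past word filters) can be partly countered by normalizing text before matching. A minimal sketch using Python's `unicodedata`: NFKD decomposition separates base letters from combining accent marks (and folds fullwidth letters to plain ones), and the marks are then dropped. It does not help with letters that have no decomposition, such as `ø` or `ł`, or with look-alike characters from other scripts such as Cyrillic `а`. `strip_accents` and `contains_term` are illustrative names.

```python
from __future__ import annotations

import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition.

    'frée röbux' -> 'free robux'; fullwidth 'ｆｒｅｅ' -> 'free'.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def contains_term(comment: str, terms: set[str]) -> bool:
    """Case- and accent-insensitive substring check against a term list."""
    normalized = strip_accents(comment).casefold()
    return any(strip_accents(term).casefold() in normalized for term in terms)
```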
It really wouldn't make a difference because very little of the filter even looks for whole words. I'm just going to close this because tbh I don't really intend to implement any kind of translation functionality.
If there is a pattern of a certain type of spammer in another language I'll take a look and see what I can do. But I'd need actual specific examples
Fair, but I guess there should be a note somewhere in the README saying this only works for English spam comments.
I don't believe that the application handles foreign languages very well.
If you look at the recent Linus Tech Tips video, there are several instances of foreign-language spam comments getting through.
A simple solution would be to use the googletrans Python package to translate the comment text before running the filter.
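A sketch of the translate-then-filter pipeline proposed here. The translator is passed in as a function rather than hard-coding `googletrans`, because googletrans is an unofficial Google Translate client whose API may differ between versions; check the installed version's documentation for the exact call. `filter_translated` and `english_terms` are hypothetical names, and matching is case-insensitive substring search on the English text.

```python
from __future__ import annotations

from typing import Callable, Iterable


def filter_translated(
    comments: Iterable[str],
    translate_to_english: Callable[[str], str],
    english_terms: set[str],
) -> list[str]:
    """Return the original comments whose English translation contains a term.

    `translate_to_english` is injected; in practice it might wrap googletrans
    (an assumption; its call signature depends on the installed version).
    """
    terms = [term.casefold() for term in english_terms]
    flagged = []
    for comment in comments:
        english = translate_to_english(comment).casefold()
        if any(term in english for term in terms):
            flagged.append(comment)  # keep the original text, not the translation
    return flagged
```

Injecting the translator also makes the filter testable offline with a fake lookup table instead of live API calls.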