ThioJoe / YT-Spammer-Purge

Allows you to easily scan for and delete scam comments using several methods.
GNU General Public License v3.0
4.57k stars 389 forks

Filtering: Enhancement: Ensure comment language matches channel language, or certain specified languages. #887

Closed (rcmaehl closed 1 year ago)

rcmaehl commented 2 years ago

Filter Mode

Regex Search

Select the Problem

Other (add details below)

(Optional) If 'Other', Enter Very Short Description

Ensure comment language matches the channel language

Spammer Example / Sample

Mishan 🅥 7 days ago Don't translate😡 ເຈົ້າຖືກສາບແຊ່ງເພາະວ່າມັນຖືກແປຖ້າເຈົ້າບໍ່ທໍາລາຍຄໍາສາບແຊ່ງ, ເຈົ້າຈະຕາຍວິທີດຽວທີ່ຈະທໍາລາຍຄໍາສາບແຊ່ງແມ່ນເພື່ອຈອງຊ່ອງທາງຂອງຂ້ອຍລົງທະບຽນດຽວນີ້

Direct Link

Video / Post Link

https://www.youtube.com/watch?v=cmCZ27U_-wg

(Optional) Additional Info / Context

I know the idea of translating each comment to make sure it's not spam has been brought up several times, and it really isn't feasible for a VAST number of reasons.

However, it would be feasible to check if a comment is, or is one of, English, Spanish, <insert language(s) of choice>

This would be a less strict, but more advanced, filter than ASCII-Only. It would allow channel creators to ensure the comments match the languages they want for the channel. The most appropriate language for the channel or video can be pulled from the YouTube Data API under <object>.snippet.defaultLanguage, <object>.snippet.defaultAudioLanguage (videos only), and <object>.localizations.
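As a rough illustration of pulling candidate languages out of an API response, here is a minimal sketch. It assumes the resource was already fetched with the snippet and localizations parts and is available as a plain dict. The function name candidate_languages and the sample video dict are made up for this example and are not part of YT-Spammer-Purge.

```python
from typing import Any, Mapping, Set


def candidate_languages(resource: Mapping[str, Any]) -> Set[str]:
    """Collect base language codes (e.g. 'en') advertised by a video or channel resource."""
    snippet = resource.get("snippet", {})
    tags = [
        snippet.get("defaultLanguage"),
        snippet.get("defaultAudioLanguage"),  # only present on video resources
        *resource.get("localizations", {}).keys(),
    ]
    # Reduce tags such as "en-US" or "pt-BR" to the base language ("en", "pt").
    return {tag.split("-")[0].lower() for tag in tags if tag}


# Hypothetical, hand-written resource shaped like an API response.
video = {
    "snippet": {"defaultLanguage": "en", "defaultAudioLanguage": "en-US"},
    "localizations": {"es": {"title": "..."}, "pt-BR": {"title": "..."}},
}
print(sorted(candidate_languages(video)))  # ['en', 'es', 'pt']
```

Many uploads leave all of these fields empty, so a real filter would probably still need a user-specified fallback list. Collapsing to base codes is also a simplification: some detectors report regional codes such as zh-cn and zh-tw, which would need their own mapping.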

Additionally, half of our work is done for us thanks to Mimino666/langdetect
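A minimal sketch of what such a check could look like, assuming langdetect is installed. The helper names, the min_chars threshold and the choice to return None for undecidable comments are all assumptions made for illustration, not anything the project implements. The detector is passed in as a parameter so the logic can be tried with any function that maps text to a language code.

```python
from typing import Callable, Iterable, Optional


def load_langdetect() -> Callable[[str], str]:
    """Return langdetect's detect(), seeded so repeated runs give the same answer."""
    from langdetect import DetectorFactory, detect  # pip install langdetect

    DetectorFactory.seed = 0
    return detect


def is_allowed_language(
    text: str,
    allowed: Iterable[str],
    detect: Callable[[str], str],
    min_chars: int = 20,  # hypothetical threshold, would need tuning
) -> Optional[bool]:
    """True/False if the detected language is/is not allowed, None if undecidable."""
    letters = [ch for ch in text if ch.isalpha()]
    if len(letters) < min_chars:
        return None  # too short (or emoji-only) to classify reliably
    try:
        code = detect(text)
    except Exception:  # langdetect raises when the text has no usable features
        return None
    allowed_base = {a.split("-")[0].lower() for a in allowed}
    return code.split("-")[0].lower() in allowed_base


# Usage (requires langdetect):
# detect = load_langdetect()
# is_allowed_language("Thanks for the great tutorial, it helped a lot!", {"en", "es"}, detect)
```

Returning None rather than False for short comments is deliberate in this sketch: statistical detection is unreliable on a few words, so those comments are better left to the other filter modes than flagged on language alone.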

Firecul commented 2 years ago

However, it would be feasible to check if a comment is, or is one of, English, Spanish, <insert language(s) of choice>

How would you be able to tell what the language is without sending it for translation?

rcmaehl commented 2 years ago

How would you be able to tell what the language is without sending it for translation?

Code wise: https://github.com/Mimino666/langdetect

Specifics: https://en.wikipedia.org/wiki/Wikipedia:Language_recognition_chart
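To illustrate that no translation service is needed, here is a much cruder, standard-library-only sketch than what langdetect does. It only looks at which writing system a comment's letters belong to, using Unicode character names. This is a swapped-in simplification for illustration: it can separate Lao or Cyrillic from Latin text, but it cannot tell English from Spanish, which is where the characteristic letters in the recognition chart or a statistical detector come in. The function names are made up for this example.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_counts(text: str) -> Counter:
    """Count letters by the first word of their Unicode name (LATIN, LAO, CYRILLIC, ...)."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def dominant_script(text: str) -> Optional[str]:
    """Return the most common script label, or None if the text has no letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("Hello there"))        # LATIN
print(dominant_script("Привет, как дела?"))  # CYRILLIC
print(dominant_script("ສະບາຍດີ"))              # LAO
print(dominant_script("1234 !!!"))           # None
```

A comment that mixes scripts, such as an English prefix followed by a long foreign-script body, would need a ratio threshold on script_counts rather than just the single dominant label.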

Firecul commented 2 years ago

How would you be able to tell what the language is without sending it for translation?

Code wise: https://github.com/Mimino666/langdetect

Specifics: https://en.wikipedia.org/wiki/Wikipedia:Language_recognition_chart

Hmm... interesting

ThioJoe commented 1 year ago

Probably not going to implement this. Especially with YouTube pushing for multi-language audio for videos, it is possible there will be more international comments.