amitbl / blocktube

YouTube™ content blocker
GNU General Public License v3.0
944 stars 67 forks source link

[Feature request] Block all languages except: #433

Open SnowySchoko opened 3 months ago

SnowySchoko commented 3 months ago

The title says it all. I don't know if I'm the only one, but I keep getting video suggestions in Spanish, Russian, Polish, etc., even though I've never seen one of them and only watch English or German videos. And the way YouTube works lately, if you accidentally click on one of them, it will only suggest videos in that language for weeks. It feels like a minefield, because it gets worse with every one you accidentally click on. :D

GithubAnon0000 commented 3 months ago

You can more or less do that with regex (at least on mobile. The desktop version doesn't work as reported in #397). Here are a few I know (but keep in mind that they might block other languages too, if they happen to use the same character set):

//regex to block arabian characters
/(?=[\u0621-\u064A\u0660-\u0669])/i

//regex to block chinese characters
/(?=[\u4E00-\u9FFF])/i

//regex to match japanese characters (e.g. ウェザーニュース)
/(?=[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|\u203B)/i

//regex to match punjabi / panjabi characters, e.g. "ਭਗਤ ਕਬੀਰ ਜੀ ਦੇ ਸਲੋਕ"
/(?=[\u0A00-\u0A7F])/i

//regex to match hindi (devanagari) characters, e.g. "कबीर साहेब"
/(?=[\u0900-\u097F])/i

//regex to match thai characters, e.g. "พายุถล่มหน่วยหลังคาร่วงเจ็บ"
/(?=[\u0E00-\u0E7F])/i

//regex to match tai viet characters, e.g. "ꪃꪄꪅꪆꪇꪈꪉꪊꪋꪌꪍꪎꪏꪐꪑꪒꪓ"
/(?=[\uAA80-\uAADF])/i

//regex to match vietnamese characters, e.g. "Vụ cô gái đâm 3 người thương vong tại đám cưới: Bênh anh họ bị đánh hội đồng", in this example specifically the letters "ụôáđâườươạớêọịđáộồ". Edit: No longer matches ä ö ü Ä Ö Ü ß.
/(?=[\u00C0-\u00C3\u00C5-\u00D5\u00D9-\u00DB\u00DD-\u00DE\u00E0-\u00E3\u00E5-\u00F5\u00F9-\u00FB\u00FD-\u1EF9])/i

//regex to match korean (hangul) characters
/(?=[\uac00-\ud7af]|[\u1100-\u11ff]|[\u3130-\u318f]|[\ua960-\ua97f]|[\ud7b0-\ud7ff])/i

I also don't know what the example strings translate to. I just copied them from some random comments while "searching" for foreign characters / strings.