StevenBlack / hosts

🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
MIT License
26.94k stars 2.23k forks

Chinese Information Gathering #493

Closed ScriptTiger closed 6 years ago

ScriptTiger commented 6 years ago

Does anyone know any good sources for a list of Chinese information-gathering domains? I have been doing a bit of research on some IoT devices, and apparently any Chinese software that slips in talks back to Baidu, Tencent, etc. Affected devices include TVs, tablets, and anything else they can get into your home, so you might want to think twice before thinking you got a deal on some cheap Chinese stuff online. I have also been doing some testing with some rather popular Android apps that are loaded with such Chinese software, playing around with an old junker phone whose IMEI I can afford to have stolen and which I can hardware reset a few times. For anyone who's not familiar: if you are turned off by things like Google Analytics, you might just be shocked by some of the things the Chinese are doing in their apps, to say the least.

Atavic commented 6 years ago

Maybe this is useful? EasyList Lite China: https://github.com/cjx82630/cjxlist/

ScriptTiger commented 6 years ago

! Title: CJX's Annoyance List
! Supplement for "EasyList China+EasyList" & "EasyPrivacy"
! Removed Annoyances, Self-promotion & Privacy Protection on Chinese Websites
! Last modified: 2018/02/12 14:21 +0800
! Expires: 4 days (update frequency)
! License: http://creativecommons.org/licenses/by/3.0/
! Email: cjxlist@gmail.com
! Homepage: http://abpchina.org/forum/forum.php?mod=viewthread&tid=29667

It looks like they have some good data, but the formatting is a bit off for this repo and they have a lot of Chinese characters which might throw off the script like Cyrillic has in the past. I'll submit an issue with them and see if they would be willing to curate their data in hosts format, as well. Steven will have to do his review for it to join this repo, but I think it would at least benefit their followers in any case.
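To illustrate the format gap mentioned above, here is a minimal Python sketch of turning Adblock-style rules into hosts entries. It is not the repo's actual script: the function name `adblock_to_hosts`, the `0.0.0.0` sink address, and the example domains are assumptions made for illustration. It only keeps plain `||domain^` rules with ASCII domains and skips everything else (comments, exceptions, cosmetic rules, rules with options), since those have no direct hosts equivalent.

```python
import re

# Only plain domain-anchored rules such as "||example.com^" are converted.
# Rules with options ("$third-party"), cosmetic rules ("##"), and
# non-ASCII domains do not match and are skipped.
DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$", re.IGNORECASE)


def adblock_to_hosts(lines, sink_ip="0.0.0.0"):
    """Return hosts-format lines for the simple domain rules in `lines`."""
    hosts = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        # Skip blanks, "!" comments, and "@@" exception (whitelist) rules.
        if not line or line.startswith("!") or line.startswith("@@"):
            continue
        match = DOMAIN_RULE.match(line)
        if not match:
            continue
        domain = match.group(1).lower()
        if domain not in seen:  # drop duplicates, keep first-seen order
            seen.add(domain)
            hosts.append(f"{sink_ip} {domain}")
    return hosts


if __name__ == "__main__":
    sample = [
        "! Title: example list",
        "||tracker.example.com^",
        "example.org##.ad-banner",
        "@@||ok.example.com^",
        "||ads.example.net^$third-party",
    ]
    print("\n".join(adblock_to_hosts(sample)))
```

Skipping rules with options is a deliberately conservative choice: a hosts file blocks a domain everywhere, so converting a rule that was meant to apply only in some contexts could over-block.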

Atavic commented 6 years ago

I stumbled upon some IP lists in the past, maybe related to shadowsocks, XX-net or GoAgent...

Now I found that many use gfwlist2pac, which translates a base64 list into a usable PAC file, e.g. this outdated output.
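For anyone who just wants to read such a base64-encoded list, a rough Python sketch of the decoding step follows. It does not generate a PAC file and is not the tool's own code; the function name `decode_list` is hypothetical, and treating lines that start with `!` or `[` as comments or headers is an assumption about the list's Adblock-style layout.

```python
import base64


def decode_list(encoded_text):
    """Decode a base64-encoded rule list and return its non-comment lines."""
    # With the default validate=False, b64decode ignores characters outside
    # the base64 alphabet, so line breaks in the encoded text are harmless.
    decoded = base64.b64decode(encoded_text).decode("utf-8")
    rules = []
    for raw in decoded.splitlines():
        line = raw.strip()
        # Assumption: "!" marks a comment and "[" marks a header line.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        rules.append(line)
    return rules


if __name__ == "__main__":
    plain = b"[AutoProxy 0.2.9]\n! comment\n||example.com\n.example.org\n"
    encoded = base64.b64encode(plain).decode("ascii")
    print(decode_list(encoded))
```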

ScriptTiger commented 6 years ago

Those blacklists (as referenced here: https://github.com/gfwlist/gfwlist) are lists of domains that are blacklisted by the Chinese Government. Doing a quick search, you'll notice every domain of Facebook is on it, but no mention of Tencent or QQ or Baidu.com.

That's a good reference though for people interested in what's on the Chinese firewall. If you are interested in that, you might find the Indonesian firewall interesting, as well: https://trustpositif.kominfo.go.id/

Click the "Unduh Database" button to download it. Maybe we could start a completely unrelated repository of collections of national firewalls, I'm sure there could be some applications to it.

Atavic commented 6 years ago

So the PAC File has whitelisted sites and the proxy is used for blacklisting any other IP. Unduh Database is mainly focused on nudes.

ScriptTiger commented 6 years ago

Oh, okay, I get what you are saying, taking the whitelist approach rather than the blacklist approach. Yeah, I am sure it's massively helpful for people who live in mainland China, but outside of mainland China is a whole different context for both whitelists and blacklists. But perhaps comparing the whitelist with a list of domains under Chinese authority might help to make a good blacklist out of all the domains not listed on the whitelist. That's actually a super interesting idea to generate a good Chinese blacklist super quickly, but I don't know how well curated it can be unless we have somebody with experience from China on board for contextual support.
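The comparison idea could be sketched roughly like this in Python, assuming both lists are already available as plain domain names. The function name `build_blacklist` and the example domains are made up for illustration, and treating subdomains of a whitelisted domain as allowed is a design assumption, not something agreed on in this thread.

```python
def build_blacklist(candidate_domains, whitelist):
    """Return sorted candidates that are not covered by the whitelist."""

    def normalize(domain):
        # Lowercase and drop surrounding whitespace and any trailing dot.
        return domain.strip().lower().rstrip(".")

    allowed = {normalize(d) for d in whitelist if d.strip()}

    def is_allowed(domain):
        # A domain counts as allowed if it, or any parent domain,
        # appears on the whitelist (assumption: subdomains inherit).
        parts = domain.split(".")
        return any(".".join(parts[i:]) in allowed for i in range(len(parts)))

    candidates = {normalize(d) for d in candidate_domains if d.strip()}
    return sorted(d for d in candidates if not is_allowed(d))


if __name__ == "__main__":
    candidates = ["Tracker.Example.cn", "cdn.good.cn", "good.cn", "stats.vendor.cn."]
    print(build_blacklist(candidates, ["good.cn"]))
```

The output is only a starting point: as noted above, the result still needs review by someone who knows the context of the domains.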

The Indonesian list is a socially maintained list, similar to OpenDNS or PhishTank, where anybody can submit suggestions and it's not very well curated as far as outdated entries. "Unduh" just means "download" in Indonesian.

Atavic commented 6 years ago

The issue definitely needs someone from behind the wall. Also, some related repositories have been deleted in the past.

ScriptTiger commented 6 years ago

Well, it's illegal to circumvent restrictions put in place by a federal authority, so projects that use such lists to do so understandably violate the terms of use that GitHub has put in place for legal liability reasons. However, I will state right now, for the benefit of anyone reading this for review/audit purposes, that I have no intention of circumventing Chinese regulations and simply aim to produce a viable blacklist to block unwanted domains originating from China, which we are fully within our rights to do.

pascalwhoop commented 6 years ago

I'd also be curious about this. When one gets a phone from Huawei or Xiaomi, it may not hurt to block all their hosts if all you want is the hardware without the calling home features... Blocking all the rest meanwhile ... sure

anudeepND commented 6 years ago

@pascalwhoop Regarding Xiaomi products, I have already included telemetry domains through many PRs; you can check the data/StevenBlack/hosts file to see the domains.

Edit: Since I use many Xiaomi products, I have monitored them closely to include telemetry and ad domains without disturbing other services (like security updates for the built-in antivirus, etc.). I have also examined Vivo and other Chinese brands. I will be adding any bad domains as soon as I find them.

pascalwhoop commented 6 years ago

@anudeepND nice. What's your pipeline to find new domains? I have yet to find a good application that lets me block hosts/domains per app on Android with whitelist support and logging, so that whenever my phone connects to some new endpoint, I can check that in the log and decide if I want to allow that connection. Of course the browser would use a blacklist, as whitelisting the browser would be crazy tedious.

So short question: How do you know if they spin up a new subdomain?

anudeepND commented 6 years ago

@pascalwhoop I use this Android app to monitor specific apps, and I also run Pi-hole on my network so that any queries made by the system can be analyzed. I usually check everything after every system upgrade, so if they spin up a new domain, I can add it to the blacklist immediately. Regarding testing, I usually check blacklisted domains for a week before creating a PR to this repo, so there will be no blocking of legitimate domains.
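A workflow like the one described could be partly automated along these lines. This Python sketch is not anudeepND's actual pipeline: it assumes dnsmasq-style query lines of the kind Pi-hole writes to its query log (`query[A] domain from client`), and the function name `new_domains`, the sample log lines, and the addresses are hypothetical.

```python
import re

# Assumed line shape (dnsmasq style):
#   "Feb 12 14:21:00 dnsmasq[123]: query[A] api.example.com from 192.168.1.20"
QUERY_LINE = re.compile(r"query\[[A-Z0-9]+\]\s+(\S+)\s+from\s+(\S+)")


def new_domains(log_lines, known_domains):
    """Map each queried domain not in `known_domains` to the clients that asked."""
    known = {d.lower() for d in known_domains}
    found = {}
    for line in log_lines:
        match = QUERY_LINE.search(line)
        if not match:
            continue  # forwarded/reply/cached lines are ignored
        domain, client = match.group(1).lower(), match.group(2)
        if domain not in known:
            found.setdefault(domain, set()).add(client)
    return found


if __name__ == "__main__":
    log = [
        "Feb 12 14:21:00 dnsmasq[123]: query[A] api.example.com from 192.168.1.20",
        "Feb 12 14:21:02 dnsmasq[123]: query[AAAA] new.tracker.example from 192.168.1.20",
    ]
    for domain, clients in sorted(new_domains(log, ["api.example.com"]).items()):
        print(domain, sorted(clients))
```

Running it after each system upgrade against the set of domains already reviewed would surface only the endpoints that have not been seen before, which can then be watched for a while before being proposed for the blacklist.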