Ultimate-Hosts-Blacklist / whitelist

The whitelist of the Ultimate Hosts Blacklist project, infrastructure and beyond.
MIT License

presumably wrong domains #154

Closed ghost closed 3 years ago

ghost commented 4 years ago

@rusty-snake commented on Nov 14, 2020, 5:40 PM UTC:

ATM there are three domains which do not look right.

This issue was moved by dnmTX from Ultimate-Hosts-Blacklist/Ultimate.Hosts.Blacklist#593.

Somebodyisnobody commented 4 years ago

Haha, got creepy results: [screenshot: Home, The Ultimate Hosts Blacklist, 2020-11-14]

://www.bit.ly/38tqwJT https://github.com/Ultimate-Hosts-Blacklist/xorcan_tuerk_ad_list/blob/1e7a12765973e756a83ded0cb59de714eb11931a/clean.list#L3378

blog.goo.ne.jp/orochima/ and freeiv.pixnet.net/blog are located here: https://github.com/Ultimate-Hosts-Blacklist/Bad_JAV_Sites/blob/9de482fee933a7f70d00b3568005dd0e44a059f9/domains.list#L118 See also:

Somebodyisnobody commented 4 years ago

Seems like we have structural problems with input validation. What's about implementing a regex filter? ^([\w0-9-]+\.)*[\w]{2,10}$ (the repeated group ([\w0-9-]+\.)* covers the 2nd-lvl-domain and any subdomain labels, the trailing [\w]{2,10} covers the tld)
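A minimal sketch of how such a filter could behave, using the pattern exactly as proposed above. The function name and the sample list are hypothetical; only the three reported entries come from this thread. Note that `\w` also accepts underscores and (in Python) non-ASCII letters, so the pattern is looser than real hostname rules.

```python
import re

# Pattern copied unchanged from the comment above.
PROPOSED = re.compile(r"^([\w0-9-]+\.)*[\w]{2,10}$")


def looks_like_domain(entry: str) -> bool:
    """Return True if the whole entry matches the proposed pattern."""
    # fullmatch is used because Python's '$' alone would also match
    # just before a trailing newline.
    return PROPOSED.fullmatch(entry) is not None


# Hypothetical input: one plain domain plus the three reported entries.
entries = [
    "example.com",
    "://www.bit.ly/38tqwJT",
    "blog.goo.ne.jp/orochima/",
    "freeiv.pixnet.net/blog",
]

for entry in entries:
    status = "ok" if looks_like_domain(entry) else "rejected"
    print(f"{status}: {entry}")
```

Under this pattern the three reported entries are rejected because they contain `:` or `/`, which neither part of the expression allows.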

Somebodyisnobody commented 4 years ago

@dnmTX it's also an upstream issue (when it's not affected by the caching-bug). Can you tag it?

dnmTX commented 4 years ago

Thanks for the report @rusty-snake. First, there is a bug/problem with the filtering/update, and even if there are any changes in upstream, they will not make it here until it's fixed. Second: blog.goo.ne.jp/orochima/ and freeiv.pixnet.net/blog are present in Bad_JAV_Sites, and the origin of that list is unknown (info.json is empty). @funilrys, we will need more info from you on this one, and advice on what's next. Possible removal maybe?

://www.bit.ly/38tqwJT is present in xorcan_tuerk_ad_list, and checking upstream it's not as bad, but it clearly needs to be looked at. PING @xorcan for further action/fix in upstream 👍 @xorcan just FYI, you CAN'T USE ! for comments in a hosts file, only # is permitted. Please fix this as well!
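As a rough illustration of the comment-marker fix requested above, the sketch below rewrites lines that start with `!` (the comment marker of Adblock-style filter lists) into `#` comments. The function name and the sample lines are hypothetical and are not taken from the upstream list.

```python
def normalize_comment(line: str) -> str:
    """Turn a '!' comment line into a '#' comment; leave other lines untouched."""
    stripped = line.lstrip()
    if stripped.startswith("!"):
        return "#" + stripped[1:]
    return line


# Hypothetical sample lines.
sample = [
    "! Title: my list",
    "0.0.0.0 ads.example.com",
    "# already a valid hosts comment",
]

for line in sample:
    print(normalize_comment(line))
```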

Guys, thanks for the reports, but until @funilrys fixes the filtering/update issue nothing else can be done at this moment. Be patient, once it's fixed we will start digging into all those issues to make everyone's experience better. Thank you!

dnmTX commented 4 years ago

Ok. Considering the fact that the filtering and upstream updating is completely broken at this moment, I went ahead and removed all the mentioned domains from their respective lists/repos. It's a temporary patch for now, let's hope it will hold. Will keep this issue open for the time being.

rusty-snake commented 4 years ago

I discovered this while developing uhb2dnsmasq, where I use ^[0-9A-Za-z._-]{1,500}$ to ensure that no option can be injected into the dnsmasq configuration (e.g. some/\nlisten-address 192.168.0.23) and print mismatches to see if this regex is too strict.
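The approach described here can be sketched roughly as follows. This is not the actual uhb2dnsmasq code: the function name and the `address=/…/0.0.0.0` output format are assumptions made for illustration. Only the character class and the injection example come from the comment above.

```python
import re

# Character class from the comment above. fullmatch replaces '^...$'
# because Python's '$' would also accept a single trailing newline.
SAFE_HOST = re.compile(r"[0-9A-Za-z._-]{1,500}")


def to_dnsmasq_lines(entries):
    """Split entries into dnsmasq config lines and rejected entries."""
    lines, rejected = [], []
    for entry in entries:
        if SAFE_HOST.fullmatch(entry):
            # Assumed output format: dnsmasq's address=/<domain>/<ip> option.
            lines.append(f"address=/{entry}/0.0.0.0")
        else:
            rejected.append(entry)
    return lines, rejected


lines, rejected = to_dnsmasq_lines([
    "doubleclick.net",
    "some/\nlisten-address 192.168.0.23",  # injection attempt from the comment
])
print(lines)
print("mismatches:", rejected)
```

Printing the rejected entries mirrors the "print mismatches" step, which is how malformed entries like the ones in this issue would surface.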

Seems like we have structural problems with input validation. What's about implementing a regex filter?

Is there any input validation? Or would it be possible to inject something like doubleclick.net\n1.2.3.4 google.com?

dnmTX commented 4 years ago

Seems like we have structural problems with input validation. What's about implementing a regex filter?

@rusty-snake @Somebodyisnobody such issues should be addressed in dev-center, but let's take one problem at a time and make the filtering problem a high priority, since none of us here has had any of their lists updated properly for almost a year. Every other issue could/should wait.

ghost commented 3 years ago

I fixed all the wrong entries. Thank you all for reporting them to me. Please take another look: https://github.com/xorcan/hosts/blob/master/README-EN.md

@dnmTX @Somebodyisnobody

dnmTX commented 3 years ago

@funilrys Bad_JAV_Sites has no upstream link. Are we keeping it and maintaining it ourselves, or does it need to go?

funilrys commented 3 years ago

@dnmTX let's ping @mitchellkrogza, those without upstream link exist because of him.

dnmTX commented 3 years ago

those without upstream link

How many are we talking about here?

funilrys commented 3 years ago

@dnmTX

Will tell you once back at my computer but a simple GitHub search of "raw_link": null or "raw_link": "" in our whole organisation should tell us that...
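For anyone who prefers to check local clones rather than use the GitHub search, the same lookup could be sketched like this. It assumes each repository is cloned into its own sub-directory of one root folder and keeps its metadata in a top-level info.json with a raw_link key. The directory layout and the function name are hypothetical.

```python
import json
from pathlib import Path


def repos_without_upstream(root):
    """Return names of sub-directories whose info.json has no usable raw_link."""
    missing = []
    for info_path in sorted(Path(root).glob("*/info.json")):
        try:
            # An empty info.json is treated like an empty JSON object.
            info = json.loads(info_path.read_text(encoding="utf-8") or "{}")
        except json.JSONDecodeError:
            info = {}
        # Covers a missing key, "raw_link": null and "raw_link": "".
        if not isinstance(info, dict) or not info.get("raw_link"):
            missing.append(info_path.parent.name)
    return missing


if __name__ == "__main__":
    # "." is a placeholder for the folder holding the cloned repositories.
    for name in repos_without_upstream("."):
        print(name)
```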

