Charcoal-SE / SmokeDetector

Headless chatbot that detects spam and posts links to it in chatrooms for quick deletion.
https://metasmoke.erwaysoftware.com
Apache License 2.0

NS watch/blacklist sometimes fails #6822

Open tripleee opened 2 years ago

tripleee commented 2 years ago

What problem has occurred? What issues has it caused?

Domains with a subdomain bypass NS checks (originally, I thought it was anything with www. before the server name, but it actually seems to be more complex).

Recent example: www.eduauraa.com should trigger watched NS but doesn't. https://metasmoke.erwaysoftware.com/post/352164

What would you like to happen/not happen?

NS watches and blacklists should trigger predictably.

tripleee commented 2 years ago

So far, unable to repro. This is a pattern I have observed multiple times in the past but the basic logic is already working correctly.

tripleee commented 2 years ago

@teward Do you see DNS errors around these posts? Another one just now https://metasmoke.erwaysoftware.com/post/352301

tripleee commented 2 years ago

Another: https://metasmoke.erwaysoftware.com/post/352527

tripleee commented 2 years ago

Another: https://metasmoke.erwaysoftware.com/post/353627

stale[bot] commented 2 years ago

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

tripleee commented 2 years ago

Yet another: https://metasmoke.erwaysoftware.com/post/361856

tripleee commented 2 years ago

Still more: https://m.erwaysoftware.com/posts/uid/stackoverflow/72200261

tripleee commented 2 years ago

Yet still another: https://metasmoke.erwaysoftware.com/post/368235

tripleee commented 2 years ago

Another, I guess? https://metasmoke.erwaysoftware.com/post/368913

tripleee commented 2 years ago

IDNA trouble: https://metasmoke.erwaysoftware.com/post/369464 should have triggered on watched NS mihanwebhost.com
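
The comment does not say what exactly went wrong with IDNA here. One possible cause, stated purely as an assumption, is a hostname reaching the NS lookup without being converted to its ASCII (punycode) form first. A minimal sketch of that normalization using Python's built-in `idna` codec follows; `to_ascii_hostname` is a hypothetical helper, not a function from SmokeDetector.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Lower-case a hostname, drop a trailing dot, and convert it to ASCII.

    Hypothetical helper for illustration only. The built-in "idna" codec
    implements IDNA 2003 and raises UnicodeError on empty or overlong labels.
    """
    host = hostname.strip().lower().rstrip(".")
    return host.encode("idna").decode("ascii")
```

Names that are already ASCII pass through unchanged apart from lower-casing, so applying the helper to every hostname before a lookup should be harmless.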

tripleee commented 2 years ago

Yet one more: https://m.erwaysoftware.com/posts/uid/stackoverflow/72486333

tripleee commented 2 years ago

Another: https://metasmoke.erwaysoftware.com/post/372088 (vaguely at the same time as Metasmoke went down briefly, but I don't think it's related to that; should have matched on watched IP, too).

tripleee commented 2 years ago

Yet still another: https://metasmoke.erwaysoftware.com/post/373871

tripleee commented 2 years ago

Also https://metasmoke.erwaysoftware.com/post/380111

tripleee commented 2 years ago

https://m.erwaysoftware.com/posts/uid/stackoverflow/73235810

tripleee commented 2 years ago

https://metasmoke.erwaysoftware.com/post/381157 unrelated reasons?

tripleee commented 2 years ago

https://metasmoke.erwaysoftware.com/post/382495

tripleee commented 2 years ago

Something really weird is going on with outlookindia.com: the site www.outlookindia.com has a separate set of NSes, but I can't match on that either. https://metasmoke.erwaysoftware.com/post/382637

tripleee commented 2 years ago

Ditto for caramellaapp.com in e.g. https://metasmoke.erwaysoftware.com/post/383062

tripleee commented 2 years ago

https://metasmoke.erwaysoftware.com/post/391301

teward commented 2 years ago

> @teward Do you see DNS errors around these posts? Another one just now https://metasmoke.erwaysoftware.com/post/352301

I have never seen DNS errors in the system on this. However, the key question for handling subdomains, and for mapping a subdomain back to its base domain, is "what is the base TLD?" I mention this because suffixes like .co.uk are actually second-level domains even though they are treated as TLDs.

If you can suggest a proper way to extract the base domain and then use it for subdomain queries, then it's a simple call to the resolver libraries we're already using, made for the base domain. That's not something that I'm going to write though; I don't have the spare cycles for it.
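
To make the ".co.uk" point concrete, here is a rough sketch of base-domain extraction and of the list of names one might query for NS records. It is not SmokeDetector's code: `MULTI_LABEL_SUFFIXES`, `base_domain`, and `ns_lookup_candidates` are hypothetical names, and the hard-coded suffix set is a toy stand-in for a real public suffix list.

```python
from typing import List

# Hypothetical, deliberately tiny suffix set; a real implementation would
# consult the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.in"}


def base_domain(hostname: str) -> str:
    """Return the registrable domain of hostname under the toy suffix set."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # A known two-label suffix such as "co.uk" means the base has three labels.
    if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def ns_lookup_candidates(hostname: str) -> List[str]:
    """List the host itself, then each parent name down to the base domain."""
    labels = hostname.lower().rstrip(".").split(".")
    base = base_domain(hostname)
    candidates = []
    while True:
        name = ".".join(labels)
        candidates.append(name)
        if name == base or len(labels) <= 1:
            break
        labels = labels[1:]  # strip the leftmost label and try the parent
    return candidates
```

Under these assumptions, `ns_lookup_candidates("www.eduauraa.com")` yields the subdomain first and then `eduauraa.com`, so an NS query that returns nothing for the `www.` name can be retried on the base domain.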

teward commented 2 years ago

> https://metasmoke.erwaysoftware.com/post/391301

Are you sure that's an instance? The specified domain's NS records are Cloudflare's; are we flagging Cloudflare as suspicious now?

tripleee commented 2 years ago

https://metasmoke.erwaysoftware.com/post/392103

tripleee commented 2 years ago

@teward Cloudflare specifies a particular NS pair for each individual client; the NS watches and blacklists we have in place target a large number of these particular pairs (and in fact the collection of Cloudflare pairs dominates both of these files). This domain has the NS pair `chance.ns.cloudflare.com. ullis.ns.cloudflare.com`, which has been in `watched_nses.yml` for a while.
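
As an illustration of matching on a specific NS pair rather than on Cloudflare as a whole, here is a small sketch. The thread does not show the actual format of the watch file or how SmokeDetector evaluates it, so `WATCHED_NS_PAIRS` is a hypothetical in-memory stand-in and `normalize_ns` and `matches_watched_pair` are made-up helper names.

```python
from typing import Iterable

# Hypothetical stand-in for watched entries; the real file format is not
# shown in this thread.
WATCHED_NS_PAIRS = {
    frozenset({"chance.ns.cloudflare.com", "ullis.ns.cloudflare.com"}),
}


def normalize_ns(name: str) -> str:
    """Lower-case a name server name and drop the trailing root dot."""
    return name.strip().lower().rstrip(".")


def matches_watched_pair(resolved_ns: Iterable[str]) -> bool:
    """True if every name of some watched pair is among the resolved NS names."""
    resolved = frozenset(normalize_ns(ns) for ns in resolved_ns)
    return any(pair <= resolved for pair in WATCHED_NS_PAIRS)
```

With this shape, a domain served by `chance.ns.cloudflare.com.` and `ullis.ns.cloudflare.com.` matches regardless of order or case, while a domain on a different Cloudflare pair does not.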

tripleee commented 2 years ago

@teward We already have logic for extracting the base domain; it's a library called `tld`.

tripleee commented 2 years ago

https://metasmoke.erwaysoftware.com/post/392871

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/397210

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/397444

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/398112

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/398239

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/398251

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/399426

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/399641

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/400501

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/376202

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/401016 - weirdly the previous one https://metasmoke.erwaysoftware.com/post/401012 had "potentially bad NS"

tripleee commented 1 year ago

Tangentially, https://metasmoke.erwaysoftware.com/post/402479 should have matched both IP address and name server, but bypassed those checks apparently because of the link obfuscation.

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/402514

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/408176

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/411601 is more straightforward and should be easy to fix.

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/412226

tripleee commented 1 year ago

Weirdly, IP lookup failed on https://metasmoke.erwaysoftware.com/post/412865

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/412922

tripleee commented 1 year ago

https://m.erwaysoftware.com/posts/uid/stackoverflow/75579124

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/416435

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/417301 and https://metasmoke.erwaysoftware.com/post/417302 (same spam reported again; still no NS).

tripleee commented 1 year ago

Tangentially https://metasmoke.erwaysoftware.com/post/418986

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/420062

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/420270

tripleee commented 1 year ago

https://metasmoke.erwaysoftware.com/post/420552