Open StevenBlack opened 3 hours ago
Hi, and thanks for reaching out! The hosts file hadn't been updated in 3 years, so we removed it, not realizing there were still active consumers.
The easiest way to integrate with this repo at this time is to parse the `config.json` and read the list of hosts from the `blacklist` section. However, as you can see, this repository is now being synchronized with a separate data source hosted by SEAL-ISAC. If you'd rather establish a programmatic feed (or simply have the bot also submit merge requests to your hosts repo) we can explore that option as well.
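For downstream consumers, reading the list as described above might look like the following minimal Python sketch. It assumes `blacklist` is a top-level array of domain strings in `config.json`; the URL is the one quoted later in this thread, and the function names and the `0.0.0.0` sink address are illustrative choices, not part of this repository.

```python
import json
from urllib.request import urlopen

# URL quoted later in this thread; treat it as subject to change.
CONFIG_URL = (
    "https://raw.githubusercontent.com/MetaMask/eth-phishing-detect/"
    "refs/heads/main/src/config.json"
)


def blacklist_from_config(config_text: str) -> list[str]:
    """Return the entries of the "blacklist" array, lowercased and trimmed.

    Assumption: "blacklist" is a top-level array of domain strings.
    """
    config = json.loads(config_text)
    return [
        entry.strip().lower()
        for entry in config.get("blacklist", [])
        if isinstance(entry, str) and entry.strip()
    ]


def to_hosts_lines(domains: list[str], sink_ip: str = "0.0.0.0") -> list[str]:
    """Format domains as hosts-file lines (sink_ip is an illustrative default)."""
    return [f"{sink_ip} {domain}" for domain in domains]


def fetch_config(url: str = CONFIG_URL) -> str:
    """Download config.json as text. This performs a network request."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

A consumer would call `to_hosts_lines(blacklist_from_config(fetch_config()))` and write the result to a file. Pinning the branch in the URL is what makes a later branch rename visible as an error rather than as stale data.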
Thanks for the reply @samczsun.
I'll pull from `blacklist`. Where is that? I must be blind; I can't find `blacklist` here.
BTW I think branch `master` should be removed from remote.
When you switch from `master` to `main` that's a bit woke (and I'm ok with that), but leaving `master` up on remote kinda shows that nobody really thought about downstream implications very much.
Keeping `master` on remote means downstream will just silently keep pulling stale data, error-free, from `master`. In this case, two years of your diligent work didn't actually reach all that is downstream from me.
It's just weird to switch to `main` but leave `master` on remote.
@samczsun I found it, it's in https://raw.githubusercontent.com/MetaMask/eth-phishing-detect/refs/heads/main/src/config.json.
I won't comment on the motivations for the rename (partly because that was before I joined as a maintainer, partly because I don't care), but we did keep an action syncing from `main` -> `master` for a year before we turned that off 2 weeks ago.
The `blacklist` I'm referring to is in config.json.
Unfortunately, removing `master` is not something that I can do, but I can flag it with the repository admins.
Thanks for the clarification @samczsun.
I see that `blacklist` is over 200,000 domains, unsorted.
I always wonder how people can actually curate an unsorted 200,000+ item list.
Has the MetaMask `blacklist` become an add-only bucket, now? Because that's typically what happens with long, unsorted lists of domains.
We expire domains manually at the moment, most recently 3 weeks ago, with plans to have domains fall out of the list automatically in the future.
It's unsorted because the diff to sort it would be immense and impossible to review, and while the bot automates 99% of contributions, people still occasionally open PRs into this repo.
@samczsun here's some info for you.
This is output from a little utility I'm developing.
The very last line, `Intersection: 386 domains`, means that of your 204,000 domains, the intersection with my amalgamated list of 114,700 domains is just 386 domains. Which feels extremely fishy.
The long lists here are the top 100 TLDs and the top 100 root domains in the MetaMask `blacklist`. The `blacklist` presently holds 12,911 `gitbook.io` subdomains, which seems wildly improbable to me. Maybe all this is helpful to your list maintainers?
Name:
Location: text input
Domains: 204,101
Duplicate domains: 0
Invalid domains: 4
TLD:
com: 56,303
io: 19,067
dev: 15,724
xyz: 14,490
app: 13,815
net: 12,381
org: 8,229
top: 4,858
pro: 3,786
network: 3,643
online: 3,457
site: 3,160
cc: 2,911
info: 2,747
live: 2,038
co: 1,910
finance: 1,791
tech: 1,436
events: 1,344
claims: 1,135
trade: 1,037
space: 1,002
vip: 929
icu: 820
trading: 810
cloud: 795
fun: 761
shop: 742
click: 717
me: 664
website: 655
club: 652
fi: 650
lol: 650
store: 636
support: 620
world: 620
one: 593
link: 546
in: 508
us: 477
life: 411
cfd: 403
biz: 394
digital: 370
foundation: 369
pw: 368
eu: 340
gift: 337
buzz: 317
exchange: 314
sbs: 304
art: 298
ru: 254
homes: 253
pics: 242
land: 220
ink: 217
cash: 216
br: 194
ltd: 189
wtf: 171
su: 156
quest: 151
run: 146
cyou: 143
gifts: 139
mom: 139
uk: 137
games: 124
blog: 119
de: 111
lat: 109
build: 108
zone: 106
codes: 100
work: 100
win: 98
cn: 97
id: 96
news: 96
community: 95
to: 95
today: 93
financial: 92
pm: 90
capital: 89
mx: 88
fund: 85
global: 82
money: 78
bond: 75
bio: 71
ws: 71
guru: 70
re: 70
cx: 69
cl: 66
it: 66
pl: 66
Root domains:
pages.dev: 15,247
gitbook.io: 12,911
vercel.app: 3,935
web.app: 2,115
webflow.io: 1,474
netlify.app: 652
azurewebsites.net: 590
github.io: 228
com.br: 174
drop-premint.com: 165
glitch.me: 159
nft-premints.xyz: 151
drop-premint.xyz: 146
dweb.link: 145
on-fleek.app: 134
whitelist-web3.com: 134
free-limited.com: 126
onrender.com: 116
nft-whitelist.com: 103
cf-ipfs.com: 100
42web.io: 98
mypinata.cloud: 92
airdrop-whitelist.com: 88
r2.dev: 75
zeeve.online: 73
firebaseapp.com: 67
co.uk: 66
blogspot.com: 59
limited-drops.com: 59
cprapid.com: 54
fleek.co: 54
com.co: 45
pantheonsite.io: 45
zeeve.net: 44
co.za: 40
workers.dev: 37
co.in: 36
us.com: 35
weebly.com: 33
web3-whitelist.com: 32
b12sites.com: 31
surge.sh: 31
us.to: 30
wordpress.com: 30
duia.us: 29
com.ng: 28
typeform.com: 27
4everland.app: 26
co.ke: 26
com.au: 26
mooo.com: 26
com.tr: 25
netlify.com: 25
bitballoon.com: 24
com.mx: 24
csb.app: 24
launchpadex.com: 24
metamask.cafe: 24
zendesk.com: 24
com.ar: 21
godaddysites.com: 21
line.pm: 21
amazonaws.com: 20
bsquarefli.co: 20
com.de: 20
hyperlockflnance.com: 20
plesk.page: 20
000webhostapp.com: 19
bsquaredfii.net: 19
fanasytops.net: 19
fanlasytop.net: 19
my.id: 19
talkonet.com: 19
astarnetworks.co: 18
fantasytops.org: 18
hyperiockfinance.com: 18
pryzn.net: 18
work.gd: 18
bsquarenetwork.org: 17
com.se: 17
fanasytop.com: 17
web3-l2.cfd: 17
duckdns.org: 16
pacmoonfl.net: 16
artblucks.io: 15
bsquaredfii.co: 15
bsquaredfii.com: 15
co.il: 15
com.pl: 15
crypto-list.info: 15
in.net: 15
pdma.live: 15
registers-welikethefox.com: 15
aerodromefinance.events: 14
airdrop-tokens.com: 14
bg-parite-received.fun: 14
canva.site: 14
com.pk: 14
us.org: 14
wuaze.com: 14
Intersection: 386 domains
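For anyone who wants to reproduce counts of this kind, here is a rough Python sketch; it is not the utility that produced the output above. It assumes a "root domain" is simply the last two labels of a name, which would explain why `com.br` and `co.uk` appear in the list; a stricter tool would consult the Public Suffix List instead. All function names are illustrative.

```python
from collections import Counter


def tld(domain: str) -> str:
    """Return the final label of a domain, e.g. 'io' for 'x.gitbook.io'."""
    return domain.rsplit(".", 1)[-1]


def last_two_labels(domain: str) -> str:
    """Naive 'root domain': the last two labels (assumption, not PSL-aware)."""
    return ".".join(domain.split(".")[-2:])


def summarize(domains, other, top=100):
    """Count TLDs and naive root domains, and intersect with another list."""
    cleaned = [d.strip().lower().rstrip(".") for d in domains if d.strip()]
    unique = set(cleaned)
    other_set = {d.strip().lower().rstrip(".") for d in other if d.strip()}
    return {
        "domains": len(unique),
        "duplicates": len(cleaned) - len(unique),
        "tld": Counter(tld(d) for d in unique).most_common(top),
        "root": Counter(last_two_labels(d) for d in unique).most_common(top),
        "intersection": len(unique & other_set),
    }
```

Here the intersection is an exact match on full hostnames, so a list of subdomains and a list of registered domains will overlap very little even when they cover the same sites.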
Yes, this is a list of domains intended to be consumed by applications targeting cryptocurrency users. The Gitbook entries come from a brand protection partner and are a little different from the typical entries you might find for drainers and other outright scams, but still fall under the remit of this repository. Ideally in the future we would better label the type of scam that the domain represents, but at the moment no one has time to implement such a breaking change.
Thanks for everything @samczsun. I'm going to drop `MetaMask/eth-phishing-detect` from distribution, principally because it's much too large now. Additionally, bot-propelled add-only buckets aren't what we do, as a matter of principle.
But please, ping me when the `MetaMask/eth-phishing-detect` `blacklist` comes under active management in the future. Good?
Sure. You'll be pleased to know that it is currently under active management, but it will likely remain add-only with manual expiry for the medium term. Unfortunately, given the volume of scams targeting cryptocurrency users, it's possible that the list will continue to remain too large for your use case, even after we implement automated expiration. For example, pruning anything older than a year leaves us with 175k entries, and we are likely unwilling to go any lower than that to start due to the possibility of a threat actor re-activating the domain once it's removed from the list.
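To illustrate the age-based pruning mentioned above, here is a small Python sketch. This thread does not say how entry dates are tracked, so the `added_on` mapping (domain to date added) is purely hypothetical, as are the function name and the 365-day default.

```python
from datetime import date, timedelta


def prune_older_than(
    added_on: dict[str, date], today: date, max_age_days: int = 365
) -> dict[str, date]:
    """Keep only entries added within the last max_age_days.

    added_on is a hypothetical mapping of domain -> date it was listed.
    """
    cutoff = today - timedelta(days=max_age_days)
    return {
        domain: added for domain, added in added_on.items() if added >= cutoff
    }
```

A longer window trades list size against the risk, noted above, of a threat actor re-activating a domain after it drops off the list.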
Hello! 👋🏻
I'm confused about something.
My hosts repo uses the `MetaMask/eth-phishing-detect` list as a source. The `update.json` file presently looks like this:
Note that https://raw.githubusercontent.com/MetaMask/eth-phishing-detect/master/src/hosts.txt returns a hosts file. But this repo presently doesn't contain a hosts file.
I also notice that, at some point, a branch name change from `master` to `main` happened which, of course, silently breaks absolutely everything downstream.
I understand that my `hosts` repo is a derivative work, but downstream from me is gigantic, maybe gargantuan, even.
I'd just like some clarification about where the `hosts.txt` file has gone, and whether `MetaMask/eth-phishing-detect` should still be distributed to the world downstream via my `hosts` project.