ZcashFoundation / zebra

Zcash - Financial Privacy in Rust 🦓
https://zfnd.org/zebra/
Apache License 2.0

Investigate why there are more than 2000 network peer addresses #7787

Closed upbqdn closed 10 months ago

upbqdn commented 11 months ago

Important - User Privacy

Do not post node IP addresses on this ticket, because they can be used to locate Zcash users.

Motivation

zebrad on my local machine recently had 2244 "known peers".

There can't be that many Zcash nodes. We should confirm if these are peers from other networks, peers on rapidly changing addresses (credit to Teor for pointing this possibility out), or some other sort of peers.

teor2345 commented 11 months ago

This ticket is tricky to analyse because we don't want to post user IP addresses publicly. They can be used to find Zcash users, which is a privacy issue.

Instead, I would suggest looking for any obvious patterns in failed peer IP addresses:

Also, consider logging the VersionMessage when a peer fails.

For example:

It might also help to:

Ultimately I don't think there's much to worry about here, but it is helpful to remove useless addresses where we can.

teor2345 commented 11 months ago

JSON getpeerinfo in peer IP address order:

zcash-cli -rpcport=28232 getpeerinfo | jq 'map(.addr)|sort'

Count ports:

zcash-cli -rpcport=28232 getpeerinfo | jq 'map(.addr)|sort' | rg ':' | cut -d: -f2 | cut -d '"' -f1 | sort -n | uniq -c

teor2345 commented 11 months ago

Class B IPv4 address blocks:

zcash-cli -rpcport=28232 getpeerinfo | jq 'map(.addr)|sort' | rg ':' | cut -d: -f1 | cut -d '"' -f2 | cut -d. -f1-2 | sort -n | uniq -c

To group by class A or class C blocks instead, change cut -d. -f1-2 to cut -d. -f1 or cut -d. -f1-3 respectively.
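The variants differ only in how many leading octets are kept, so they can be folded into one helper. This is a minimal sketch, assuming the input is one IPv4 addr:port entry per line. The count_prefixes name is made up for illustration, and the sample addresses come from the reserved documentation ranges, not from real peers.

```shell
# count_prefixes N: read "a.b.c.d:port" lines on stdin and print
# "<count> <prefix>" for the first N octets, most common prefix first.
# Hypothetical helper; IPv6 entries are not handled.
count_prefixes() {
    cut -d: -f1 | cut -d. -f1-"$1" | sort | uniq -c |
        awk '{ print $1, $2 }' | sort -k1,1nr -k2,2
}

# Example with documentation-range addresses (not real peers):
printf '%s\n' 192.0.2.1:8233 192.0.2.7:8233 198.51.100.4:8233 |
    count_prefixes 2
```

With the sample input this prints "2 192.0" followed by "1 198.51". Against a live node, something like jq -r '.[].addr' should produce suitable addr:port lines to pipe in.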

teor2345 commented 11 months ago

I made the PR today, but I didn't get time for any investigation. @upbqdn let me know what you find out!

upbqdn commented 11 months ago

I looked at the ports and addresses, and I didn't notice anything useful:

teor2345 commented 11 months ago

About 96% of all peers use the standard Mainnet port 8233.

...

Here are the counts of peers in the top 5 class A ranges:

  • 247
  • 103
  • 67
  • 64
  • 64

Ah, so it's not just one port or network adding these addresses. Maybe there's nothing we can do about this, but I'll double-check.

upbqdn commented 11 months ago

I did a similar analysis with 2478 peers:

Again, there is nothing interesting besides the fact that the peers seem well-distributed.

upbqdn commented 11 months ago

Actually, I think the class C range with 124 peers is interesting because essentially half the range is used for Zcash-like nodes. There might be even more nodes in this range that my node hasn't discovered yet.

teor2345 commented 10 months ago

Hmm, I wonder what all those peers are. I'll check for private IP addresses, and also see if I can get their VersionMessages to print out in the RPC in the diagnostics PR.

upbqdn commented 10 months ago

> I'll check for private IP addresses

I didn't find any.
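For anyone repeating the check, here is one way it could be done. This is only a sketch, not necessarily how it was done above: it assumes one IPv4 addr:port entry per line and treats the RFC 1918 ranges, loopback, and link-local as "private". The private_ipv4 name is hypothetical.

```shell
# private_ipv4: read "a.b.c.d:port" lines on stdin and print only those in
# the RFC 1918 private ranges, loopback (127/8), or link-local (169.254/16).
# Hypothetical helper; IPv6 entries are ignored.
private_ipv4() {
    grep -E '^(10\.|127\.|169\.254\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)'
}

# Example: only the 10.x and 172.20.x entries are printed.
printf '%s\n' 10.1.2.3:8233 172.20.0.5:8233 172.32.0.1:8233 192.0.2.1:8233 |
    private_ipv4
```

An empty result (and a non-zero exit status from grep) means no private addresses were found.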

teor2345 commented 10 months ago

Ok, I'll try to see what those IPs are running

upbqdn commented 10 months ago

~I just pinged all IPv4 addresses, and 1800 of them responded. I used a 1-second timeout.~

Edit: this is likely incorrect.

teor2345 commented 10 months ago

It seems like there are some addresses that have been gossiped for a long time:

  {
    "addr": "IPv4:8233",
    "state": "NeverAttemptedGossiped",
    "remote_last_seen": "DateTime32 { timestamp: 1696455178, calendar: 2023-10-04T21:32:58Z }"
  },

If those addresses are rotating IPs frequently, that could be the cause of the address book entries. Zebra stops gossiping unreachable peers after 3 hours, but maybe zcashd doesn't, or maybe there's enough churn that some always look fresh?

It seems like zcashd's limit might be exactly 1 month, because there aren't any beyond 4 October.

This is probably fine.
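To put the 3-hour and 1-month figures side by side, the age of an entry can be computed from its remote_last_seen timestamp. A rough sketch, assuming the address and Unix timestamp have already been extracted into "<addr> <timestamp>" lines; the age_report name and the clock value used as "now" are assumptions for illustration.

```shell
# age_report NOW MAX_AGE: read "<addr> <last_seen_unix_time>" lines on stdin
# and print each address with its age in whole hours and whether it exceeds
# MAX_AGE (in seconds). Hypothetical helper for illustration only.
age_report() {
    awk -v now="$1" -v max="$2" '{
        age = now - $2
        printf "%s %d %s\n", $1, int(age / 3600), (age > max ? "stale" : "fresh")
    }'
}

# The timestamp is the one from the entry above; NOW is an assumed clock
# value exactly 30 days later, and 10800 s is the 3-hour limit.
echo "IPv4:8233 1696455178" | age_report 1699047178 10800
```

This prints "IPv4:8233 720 stale": an entry last seen 30 days ago is 720 hours old, far past a 3-hour cutoff.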

teor2345 commented 10 months ago

Most nodes I can see have recent Zebra or zcashd versions. Some might be rotating addresses. (For example, CI nodes.)

teor2345 commented 10 months ago

I'll check again on Monday, when my node has had more time to try all these peers.

upbqdn commented 10 months ago

I wrote a simple script that checks if the ports of IPv4 peers are actually open for inbound TCP connections: https://gist.github.com/upbqdn/b3ffab3dbc99b8cbc98dfcc944d80fef. The results are:

So only 136 of the 2280 peers accepted inbound TCP connections on their advertised port. The script also generates a list of these peers, and I gave that list to Zebra in place of the mainnet.peers file. I increased the limit for outbound connections (and tweaked some other constants :D), and Zebra instantly connected to 99 of the 136 peers. The number of outbound connections grew to 126 over a few hours.

It looks like it's safe to assume that most of the addresses with open ports are legit Zcash nodes, and the rest are addresses of nodes that don't run anymore but are still advertised.

I think we can close this ticket with the conclusion that the thousands of peers are nodes that showed up on the network and disappeared but keep being advertised.

Another fact that supports this conclusion is that 96% of the advertised nodes ran on port 8233, and both IPv6 and IPv4 addresses seem well distributed.
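The general idea of such a port check can be sketched as follows. This is not the linked gist, just a minimal illustration that assumes an nc that supports -z (connect without sending data) and -w (timeout), and one IPv4 addr:port entry per line. The count_open name and the peers.txt file are hypothetical.

```shell
# count_open: read "a.b.c.d:port" lines on stdin, try one TCP connection to
# each with a 1-second timeout, print the reachable entries, then a summary.
# Assumes an `nc` that supports -z and -w; IPv6 entries are not handled.
count_open() {
    total=0
    open=0
    while IFS=: read -r host port; do
        total=$((total + 1))
        # </dev/null keeps nc from consuming the remaining input lines.
        if nc -z -w 1 "$host" "$port" </dev/null >/dev/null 2>&1; then
            open=$((open + 1))
            echo "$host:$port"
        fi
    done
    echo "open: $open / $total"
}

# Usage (peers.txt is a hypothetical file with one addr:port per line):
#   count_open < peers.txt
```

The check is sequential, so a few thousand mostly unreachable peers at a 1-second timeout each would take the better part of an hour.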

teor2345 commented 10 months ago

> I think we can close this ticket with the conclusion that the thousands of peers are nodes that showed up on the network and disappeared but keep being advertised.

I wonder if they are IP addresses from Zebra's CI?

Here is a list of the Google Cloud IP address ranges: https://www.gstatic.com/ipranges/cloud.json

teor2345 commented 10 months ago

Fewer than 100 of the 2700 peers are potentially in these ranges. So our CI doesn't seem to be a significant contributor.
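One way to do this kind of range matching is sketched below. It is not necessarily the method used for the count above. It assumes the CIDR blocks have already been extracted into a non-empty file with one IPv4 block per line; the in_ranges name is hypothetical.

```shell
# in_ranges CIDR_FILE: read IPv4 addresses (one per line) on stdin and print
# those that fall inside any CIDR block listed in CIDR_FILE.
# Hypothetical helper; IPv4 only, CIDR_FILE must not be empty.
in_ranges() {
    awk '
        function ip2int(ip,    o) {
            split(ip, o, ".")
            return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        }
        # First file: remember each block as its size and base quotient.
        NR == FNR {
            split($1, p, "/")
            size[NR] = 2 ^ (32 - p[2])
            base[NR] = int(ip2int(p[1]) / size[NR])
            n = NR
            next
        }
        # Standard input: an address is inside a block when dividing it by
        # the block size gives the same quotient as the block base.
        {
            ip = ip2int($1)
            for (i = 1; i <= n; i++)
                if (int(ip / size[i]) == base[i]) { print $1; break }
        }
    ' "$1" -
}

# Usage (both file names are hypothetical):
#   cut -d: -f1 peers.txt | in_ranges gcp-ranges.txt | wc -l
```

Assuming cloud.json keeps its blocks under a prefixes array with ipv4Prefix fields, a command like jq -r '.prefixes[] | .ipv4Prefix // empty' cloud.json should produce a suitable range file. That layout is an assumption and worth checking against the downloaded file.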

teor2345 commented 10 months ago

Looking at the types of nodes, there doesn't seem to be anything interesting. Most are up to date, a few are very old.

$ zcash-cli -rpcport=28232 getpeerinfo | jq 'map(.last_version_message)' | rg --fixed-strings -v -e null -e [ -e ] | cut -d "g" -f3- | cut -d '/' -f2 | sort | uniq -c
      1 MagicBean:1.0.11-rc1
      1 MagicBean:2.1.0
      2 MagicBean:2.1.1-8
      1 MagicBean:5.4.2
      3 MagicBean:5.5.0
    111 MagicBean:5.7.0
      4 MagicBean:6.2.0
      1 Zebra:1.0.0-rc.4
      5 Zebra:1.3.0

teor2345 commented 10 months ago

One final investigation: which domains are these IP addresses from?

The first command can take up to half an hour, because it does a reverse DNS lookup for every IP address. If there are multiple DNS entries for an IP address, it will get counted multiple times.

$ zcash-cli -rpcport=28232 getpeerinfo | jq 'map(.addr)' | rg ':' | xargs -n1 dig -x | tee rnds.txt | rg SOA
$ cat rnds.txt | rg SOA | cut -dS -f2 | cut -f2 | sort | uniq -c | sort -n
# deleted any entries below 30 to preserve anonymity
     36 dns-admin.cloudsingularity.net.
     36 support.cloudns.net.
     38 noc.emeraldonion.org.
     59 dns-ops.arin.net.
     76 cloud-dns-hostmaster.google.com.
    128 nstld.iana.org.
    131 domain.in-berlin.de.
    134 tech.ovh.net.
    158 awsdns-hostmaster.amazon.com.
    167 dns.hetzner.com.
    168 postmaster.your-server.de.
    184 dns.ripe.net.
    325 dnsadmin.netcup.net.

These are mostly cloud providers, IP addresses without a registered reverse DNS (iana.org, ripe.net, arin.net), and some privacy networks.

teor2345 commented 10 months ago

> It looks like it's safe to assume that most of the addresses with open ports are legit Zcash nodes, and the rest are addresses of nodes that don't run anymore but are still advertised.
>
> I think we can close this ticket with the conclusion that the thousands of peers are nodes that showed up on the network and disappeared but keep being advertised.

Yep, it appears most of these IP addresses are from cloud providers, where users are running short-lived nodes, or nodes with changing external IP addresses.

mpguerra commented 10 months ago

Is there some way we can ensure that Zebra only advertises nodes that are still active on the network, or is that too much work for not much payoff? Or do we already do this and it's zcashd advertising "old" nodes?

upbqdn commented 10 months ago

> Is there some way we can ensure that Zebra only advertises nodes that are still active [...] Or do we already do this and it's zcashd advertising "old" nodes?

As Teor wrote:

> Zebra stops gossiping unreachable peers after 3 hours, but maybe zcashd doesn't, or maybe there's enough churn that some always look fresh?
>
> It seems like zcashd's limit might be exactly 1 month, because there aren't any beyond 4 October.

I think the current status is fine, though. The only issue I see is that having almost 3000 "known peers" will be confusing for users (as it was for us since we didn't know why).