Grinnode-live / 2020-grin-bug-bash-challenge

Finding bugs in Grin-Wallet & Grin-nodes for a bounty prior to Grin fork v5.

[GRIN-NODE] Make sure v5.x nodes can still communicate with v4.x nodes #22

Closed DavidBurkett closed 3 years ago

DavidBurkett commented 3 years ago

Description: The v5.x release switched to protocol version 1000, which made some slight changes to the p2p message formats. When connecting to v4.x nodes (protocol version 2 or 3), the v5.x node should use the old protocol version when communicating with the v4.x peer.

Prerequisites: Start a v5.x node and watch the peers list for nodes with protocol version 2 and protocol version 3. Use the node's get peers API request to export the list of peers.

Expected result: Peers with protocol version 2 and peers with version 3 should occasionally connect and stay connected for a while. If no version 2 or version 3 peers connect, or if they're dropped right away, that would indicate a problem. Provide a count of peers over time (from Node API outputs) to show how the number of peers evolves.

marekyggdrasil commented 3 years ago

@DavidBurkett thanks for defining this test case, it looks good but I am concerned about one thing - we should make it a bit more formal. In the current form it looks difficult to document the findings and it might be hard for us to review the test results. Perhaps there is a better way of documenting connected peers so that they are being logged? Maybe using API call?

goyle commented 3 years ago

Description

The Grin node v4.1.0 release switched to protocol version 1000, which made some slight changes to the p2p message formats. When connecting to v4.0.x nodes (protocol version 2 or 3), the v5.0.x node should use the old protocol version when communicating with the v4.0.x peer. To prevent splitting the network during the hard fork, the v5.x nodes will not switch to version 2000 until the v5.1.0 release.

See here for more info on the protocol phasing process: https://github.com/mimblewimble/docs/wiki/P2P-Protocol#phasing-out-old-peers
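To make the expected behaviour concrete, here is a minimal sketch of version negotiation. It assumes that each side advertises its own protocol version and that the connection then uses the lower of the two. The function name and constant are illustrative only and are not taken from the node's source.

```python
# Illustrative sketch only: assumes both peers settle on the lower of the
# two advertised protocol versions. All names here are hypothetical.
LOCAL_PROTOCOL_VERSION = 1000  # protocol version of the node under test, per the description


def negotiate_protocol_version(local: int, remote: int) -> int:
    """Return the protocol version that both peers are able to speak."""
    return min(local, remote)


if __name__ == "__main__":
    for remote in (2, 3, 1000):
        agreed = negotiate_protocol_version(LOCAL_PROTOCOL_VERSION, remote)
        print(f"remote={remote} -> communicate using protocol version {agreed}")
```

Under this assumption, a peer reporting version 2 or 3 would be served with its own version, which is the behaviour this test case tries to observe from the outside.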

Prerequisites

Start a v5.x node and watch the peers list for nodes with protocol version 2 and protocol version 3. Use the node's get_peers API request to export the list of peers.

Expected Result

Peers with protocol version 2 or version 3 should occasionally connect and stay connected for a while. If no version 2 or version 3 peers connect, or if they're dropped right away, that would indicate a problem. Provide a count of peers over time from the Node API outputs to show how the number of peers evolves.
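A quick way to check a single API response against this expectation is to count peers per protocol version. The sketch below assumes the response is JSON with the peer list under `result` then `Ok`, and that each peer has a `version` field. The sample response and its addresses are made up for illustration.

```python
import json
from collections import Counter


def count_by_version(response_text: str) -> Counter:
    """Count connected peers per protocol version in one API response."""
    peers = json.loads(response_text)["result"]["Ok"]
    return Counter(peer["version"] for peer in peers)


# Hypothetical, shortened response (documentation-range addresses, not real peers).
sample = json.dumps({
    "id": 1,
    "jsonrpc": "2.0",
    "result": {"Ok": [
        {"addr": "192.0.2.1:3414", "version": 1000},
        {"addr": "192.0.2.2:3414", "version": 2},
        {"addr": "192.0.2.3:3414", "version": 2},
    ]},
})

counts = count_by_version(sample)
old_protocol_peers = counts[2] + counts[3]
print(f"protocol 2 or 3 peers: {old_protocol_peers} of {sum(counts.values())}")
```

If `old_protocol_peers` stays at zero across many samples, that would point to the problem described above.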

Environment

OS: Debian 10\
Grin Node: v5.0.0-rc.1\
System Info: Linux debian2 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux

Steps

1: Building the Node

See here for the full steps for building GRIN-Node v5.0.0-rc.1.

1. Download GRIN-Node v5.0.0-rc.1.
    ```shell
    $ wget https://github.com/mimblewimble/grin/archive/v5.0.0-rc.1.tar.gz
    ```
1. Extract `v5.0.0-rc.1.tar.gz`.
    ```shell
    $ tar -xvf v5.0.0-rc.1.tar.gz
    ```
    * Output should be as follows.
    ```
    grin-5.0.0-rc.1/
    grin-5.0.0-rc.1/.cargo/
    grin-5.0.0-rc.1/.cargo/config
    grin-5.0.0-rc.1/.ci/
    grin-5.0.0-rc.1/.ci/general-jobs
    grin-5.0.0-rc.1/.ci/release.yml
    grin-5.0.0-rc.1/.ci/test.yml
    grin-5.0.0-rc.1/.ci/windows-release.yml
    grin-5.0.0-rc.1/.editorconfig
    grin-5.0.0-rc.1/.github/
    ...
    ```
1. Install Rust.
    ```shell
    $ curl https://sh.rustup.rs -sSf | sh; source $HOME/.cargo/env
    ```
    * Proceed with installation with default profile.
    ```
    default host triple: x86_64-unknown-linux-gnu
    default toolchain: stable (default)
    profile: default
    modify PATH variable: yes
    ```
    * Output should be as follows.
    ```
    stable-x86_64-unknown-linux-gnu installed - rustc 1.48.0 (7eac88abb 2020-11-16)
    ```
1. Install the dependencies, including `libncursesw5`.
    ```shell
    # apt install build-essential git tor cmake libgit2-dev clang libncursesw5 libncurses5-dev libncursesw5-dev zlib1g-dev pkg-config libssl-dev llvm
    ```
1. Build GRIN-Node v5.0.0-rc.1.
    ```shell
    $ cd grin-5.0.0-rc.1/
    $ cargo build --release
    ```
1. Configure the node to save its logs and chain data in the current directory. This is optional and I did this for convenience and testing purposes.
    ```
    $ cd target/release/
    $ ./grin server config
    ```
    * The output will be as follows.
    ```
    grin-server.toml file configured and created in current directory
    ```
1. If the previous step is done, enable DEBUG mode in `grin-server.toml`.
    ```
    #log level for file: Error, Warning, Info, Debug, Trace
    file_log_level = "Debug"
    ```
1. Start node.
    ```
    $ ./grin
    ```
1. Wait until Grin has fully synced.
1. Success!

2: Logging Peer Data

*Open to improvements.

Create a folder called peer_data and create a bash script inside called log_peers.sh. This bash script will save peer data into a new timestamped file every minute until you cancel the script.

log_peers.sh

#!/bin/bash

# Read the API secret used to authenticate against the node's owner API
secret=$(cat ~/.grin/main/.api_secret)

# Create a directory for peer data logs
mkdir -p logs

num=0
while true; do
    num=$((num + 1))
    timestamp=$(date "+%Y.%m.%d-%H.%M.%S")
    echo "Logging peer data $num: $timestamp"

    # Call the owner API's "get_connected_peers" method and save the response
    # in its own timestamped file (-s hides curl's progress meter)
    curl -s -u "grin:$secret" localhost:3413/v2/owner -d '{"jsonrpc": "2.0", "method": "get_connected_peers", "params": [], "id": 1}' > "logs/$timestamp.txt"

    sleep 60
done

Run the bash script in the peer_data directory while the GRIN-Node is running. The test will be done using a fully synced node with default grin-server.toml settings (except for file_log_level which is set to "Debug").

$ bash log_peers.sh

It should now be logging data from peer nodes and saving them in a new logs directory.

See here for an example output file.

`2020.12.23-09.16.00.txt`
```json
{
  "id": 1,
  "jsonrpc": "2.0",
  "result": {
    "Ok": [
      {
        "addr": "85.10.201.143:3414",
        "capabilities": { "bits": 31 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 5.0.0-rc.1",
        "version": 1000
      },
      {
        "addr": "45.66.11.31:3414",
        "capabilities": { "bits": 15 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 4.0.0-beta.1",
        "version": 2
      },
      {
        "addr": "62.171.155.14:3414",
        "capabilities": { "bits": 15 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 4.1.1",
        "version": 1000
      },
      {
        "addr": "176.9.86.219:3414",
        "capabilities": { "bits": 31 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 5.0.0-rc.1",
        "version": 1000
      },
      {
        "addr": "213.239.217.14:3414",
        "capabilities": { "bits": 31 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 5.0.0-rc.1",
        "version": 1000
      },
      {
        "addr": "35.181.69.6:3414",
        "capabilities": { "bits": 15 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 4.0.0",
        "version": 2
      },
      {
        "addr": "134.209.15.186:3414",
        "capabilities": { "bits": 15 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 4.0.1",
        "version": 2
      },
      {
        "addr": "129.226.51.103:3414",
        "capabilities": { "bits": 15 },
        "direction": "Outbound",
        "height": 1014580,
        "total_difficulty": 1735968047666383,
        "user_agent": "MW/Grin 4.1.1",
        "version": 1000
      }
    ]
  }
}
```

For this test, the logging period was 24 hours, which at one log per minute should generate about 1440 log files.
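Because each log file is named after the time it was written, the real length of a logging run can be checked from the file names alone. This is a small sketch that assumes the `%Y.%m.%d-%H.%M.%S` naming used by log_peers.sh. The helper names and the second file name are made up for illustration.

```python
from datetime import datetime
from pathlib import Path

# Same timestamp layout that log_peers.sh passes to `date`
TIMESTAMP_FORMAT = "%Y.%m.%d-%H.%M.%S"


def log_time(path) -> datetime:
    """Parse the timestamp encoded in a log file name such as 2020.12.23-09.16.00.txt."""
    return datetime.strptime(Path(path).stem, TIMESTAMP_FORMAT)


def elapsed_minutes(paths) -> float:
    """Minutes between the earliest and the latest log file."""
    times = sorted(log_time(p) for p in paths)
    return (times[-1] - times[0]).total_seconds() / 60


# Hypothetical file names, one hour and thirty seconds apart
names = ["logs/2020.12.23-09.16.00.txt", "logs/2020.12.23-10.16.30.txt"]
print(elapsed_minutes(names))  # 60.5
```

Comparing this value with the number of files shows whether any samples were skipped, since each loop iteration takes slightly longer than 60 seconds.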

3: Plotting Number of Peers

After the logging period is over, we can start visualizing our data. We will need to provide a count of peers over time using the GRIN-Node API outputs.

Create a python file called plot_peers.py in the peer_data directory.

plot_peers.py

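from pathlib import Path
import matplotlib.pyplot as plt

# Number of connected peers found in each log file
peer_list = []

# The timestamped file names sort chronologically
paths = sorted(Path(__file__).parent.glob('logs/*.txt'))
for path in paths:
    with open(path, 'r') as f:
        # Each peer entry has exactly one "addr" key, so counting
        # occurrences gives the number of connected peers
        peer_list.append(f.read().count('addr'))

# One log per minute, so the log index stands in for elapsed minutes
x = list(range(len(peer_list)))
y = peer_list
plt.xlabel("Time (Minutes)")
plt.ylabel("Number of Peers")
plt.title("Number of Peers Over a 24-hour Period")
plt.plot(x, y)
plt.show()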

Run the python script to show a line graph of how the total number of peers evolves over time.

$ python plot_peers.py
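Besides the plot, it can help to list the minutes where the peer count departs from its usual value. The following is a minimal sketch, assuming the per-minute counts are already in a list like `peer_list` above. The function name and the sample counts are hypothetical.

```python
from collections import Counter


def deviations(peer_counts):
    """Return (minute_index, count) pairs where the count differs from the most common value."""
    if not peer_counts:
        return []
    usual, _ = Counter(peer_counts).most_common(1)[0]
    return [(i, n) for i, n in enumerate(peer_counts) if n != usual]


# Hypothetical counts, one per minute
print(deviations([8, 8, 4, 8, 7, 9, 8]))  # [(2, 4), (4, 7), (5, 9)]
```

Using the most common value as the baseline avoids hard-coding a peer count, which may differ between node configurations.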

4: Plotting Protocol Version 2 or 3 Peers

*Open to improvements.

We can search through all the peers to find the ones we are interested in and plot the length of time of their connections to our GRIN-Node.

Create a python file called peer_protocol_2_3.py in the peer_data directory.

peer_protocol_2_3.py

import json
import matplotlib.pyplot as plt
from itertools import groupby
from pathlib import Path

# Find all peers with protocol version 2 or 3
peer_set = set()
paths = sorted(Path(__file__).parent.glob('logs/*.txt'))
for path in paths:
    with open(path, 'r') as f:
        data_dict = json.loads(f.read())
        for peer in data_dict["result"]["Ok"]:
            version = peer["version"]
            if version == 2 or version == 3:
                peer_set.add(peer["addr"])

# Check when proto 2 or 3 peers are connected for each minute
peer_minutes_list = [[peer, []] for peer in peer_set]
for path in paths:
    with open(path, 'r') as f:
        data = f.read()
        for p in peer_minutes_list:
            # Match the quoted address so that one address cannot
            # match inside a longer one
            if '"' + p[0] + '"' in data:
                p[1].append(1)
            else:
                p[1].append(0)

# Get the time index when proto 2 or 3 peers are connected
for peer in peer_minutes_list:
    peer.append([])
    for index,value in enumerate(peer[1]):
        if value == 1:
            peer[2].append(index)

# Create ranges for the plot
for peer in peer_minutes_list:
    peer.append([])
    for a,b in groupby(enumerate(peer[2]), lambda pair: pair[1] - pair[0]):
        b = list(b)
        peer[3].append((b[0][1], b[-1][1] - b[0][1] + 1))

# Create horizontal bar plot
fig, ax = plt.subplots()

num = 0
for peer in peer_minutes_list:
    # Draw every connection range for this peer, not only the first one
    ax.broken_barh(peer[3], (num, 1.75), facecolors='tab:green')
    num += 2

# One log file per minute, so the x-axis spans the whole logging period
ax.set_xlim(0, len(paths))
ax.set_xlabel('Time (Minutes)')
ax.set_yticks([i*2+0.9 for i in range(len(peer_minutes_list))])
ax.set_yticklabels([peer[0] for peer in peer_minutes_list])
ax.grid(True)

plt.title("Uptime of Protocol Version 2 or 3 Peers Over a 24-hour Period")
plt.ylabel("Peers (IP Addresses)")
plt.show()

Run the python script to show a horizontal bar plot of the connection times of protocol version 2 or 3 peers.

$ python peer_protocol_2_3.py
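The step that turns per-minute presence into bar ranges can also be written by grouping the 0/1 values directly. This is an alternative sketch, not the grouping used in the script above. The function name and sample data are made up for illustration.

```python
from itertools import groupby


def connection_runs(presence):
    """Turn a per-minute 0/1 presence list into (start_index, length) runs."""
    runs = []
    index = 0
    for connected, group in groupby(presence):
        length = len(list(group))
        if connected:
            runs.append((index, length))
        index += length
    return runs


presence = [0, 1, 1, 1, 0, 0, 1, 1]  # hypothetical samples, one per minute
runs = connection_runs(presence)
total_minutes = sum(length for _, length in runs)
print(runs)           # [(1, 3), (6, 2)]
print(total_minutes)  # 5
```

The `(start, length)` pairs have the shape that `broken_barh` expects for its x-ranges, and their summed lengths give a peer's total connected minutes.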

Final Results

Testing Data

Total Logs: 1451 (one per minute)

| Timestamps | |
|---|---|
| Start | 2020-12-23 09:16:00 UTC |
| End | 2020-12-24 09:27:06 UTC |

Count of Peers Over Time

Figure_1

The deviations occurred at the minutes listed below. The rest of the time, the total number of peers was always 8.

| Minute Index | Total Peers |
|---|---|
| 777 | 4 |
| 780 | 7 |
| 781 | 9 |
| 794 | 7 |
| 1072 | 7 |
| 1073 | 7 |

Connected Peers with Protocol Version 2 or 3

| Peers | User Agent | Proto. Version | Total Minutes |
|---|---|---|---|
| 134.209.15.186:3414 | MW/Grin 4.0.1 | 2 | 777 |
| 39.104.171.210:3414 | MW/Grin 4.0.2 | 2 | 674 |
| 146.0.83.242:3414 | Grin++ 1.1.4 | 2 | 377 |
| 47.90.208.51:3414 | MW/Grin 4.0.1 | 2 | 656 |
| 39.106.195.31:3414 | MW/Grin 3.1.0 | 2 | 2 |
| 35.181.76.219:3414 | MW/Grin 4.0.0 | 2 | 673 |
| 116.62.217.224:3414 | MW/Grin 4.0.0 | 2 | 294 |
| 95.217.132.23:3414 | MW/Grin 4.0.0-alpha.1 | 2 | 1 |
| 45.66.11.31:3414 | MW/Grin 4.0.0-beta.1 | 2 | 777 |
| 35.181.69.6:3414 | MW/Grin 4.0.0 | 2 | 777 |
| 157.230.142.11:3414 | MW/Grin 4.0.2 | 2 | 657 |

Figure_2


Seven of the eleven protocol version 2 peers stayed connected for more than 650 minutes in total, and only two were seen for 2 minutes or less. No peers with protocol version 3 were found during this testing period. The number of total connected peers was stable at 8 with only a few deviations over a 24-hour period.

goyle commented 3 years ago

All peer data logs used in testing: peer_data_logs.zip

marekyggdrasil commented 3 years ago

Excellent work @goyle ! Thank you for this !