bisq-network / roles

@bisq-network contributor roles
https://bisq.wiki/Roles

Seednode Operator #15

Open cbeams opened 6 years ago

cbeams commented 6 years ago

This role is responsible for operating one or more Bisq seednodes.

See: btc_mainnet.seednodes


Docs: none, other than the above
Team: @bisq-network/seednode-operators

ManfredKarrer commented 6 years ago

2018.04 report

Running 6 Bitcoin and 2 LTC instances. DigitalOcean frequently updates their servers with security patches, which causes restarts that kill the seed node (there is no cron job for autostart). I am following the old email notifications, so I was alerted quickly and could restart the seed node in such cases.
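The missing autostart can be covered by a cron @reboot entry. A minimal sketch, assuming a cron implementation that supports @reboot (Vixie cron and cronie do) and a start script like the loop.sh shown later in this thread; reboot_entry and install_reboot_entry are hypothetical helper names and the paths are placeholders:

```shell
#!/bin/bash
# Hypothetical helpers: restart the seed node automatically after a host reboot
# (e.g. one triggered by the provider's security patching).

# Print an @reboot crontab line that cds into WORKDIR and starts SCRIPT in the background.
reboot_entry() {
  local workdir="$1" script="$2"
  printf '@reboot cd %s && nohup sh %s >/dev/null 2>&1 &\n' "$workdir" "$script"
}

# Append the entry to the current user's crontab unless an identical line already exists.
install_reboot_entry() {
  local entry
  entry="$(reboot_entry "$1" "$2")"
  if ! crontab -l 2>/dev/null | grep -qxF -- "$entry"; then
    { crontab -l 2>/dev/null; printf '%s\n' "$entry"; } | crontab -
  fi
}

# Usage (placeholder path; run as the user that runs the seed node):
#   install_reboot_entry /root loop.sh
```

The duplicate check keeps the helper safe to run more than once on the same host.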

ManfredKarrer commented 6 years ago

2018.05 report

Running 6 Bitcoin and 2 LTC instances.

Emzy commented 6 years ago

2018.05 report

Running 1 Bitcoin instance.

bisq-network/compensation#76

mrosseel commented 6 years ago

2018.05 report

Running 1 Bitcoin instance hosting: Linode in docker container

bisq-network/compensation#80

mrosseel commented 6 years ago

2018.06 report

Running 1 seednode instance hosting: Linode in docker container

bisq-network/compensation#83

Emzy commented 6 years ago

2018.06 report

Running 1 Bitcoin instance

bisq-network/compensation#88

ManfredKarrer commented 6 years ago

2018.06 report

Running 6 seednode instances. Updated to the new version.

bisq-network/compensation#92

ManfredKarrer commented 6 years ago

@Emzy @mrosseel You mixed that role up with the bitcoin operator role...

cbeams commented 6 years ago

I've updated the description of this role issue and updated the @bisq-network/seednode-operators team to reflect current status.

ManfredKarrer commented 5 years ago

2018.07 report

Running 6 seednode instances.

/cc bisq-network/compensation#93

Emzy commented 5 years ago

2018.07 report

Running 1 Bitcoin seednode instance

/cc bisq-network/compensation#100

mrosseel commented 5 years ago

2018.07 report

Running 1 seednode instance hosting: Linode in docker container

After last month's Docker fixes, no further issues were detected. Nothing to report.

bisq-network/compensation#105

Emzy commented 5 years ago

2018.08 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#111

ManfredKarrer commented 5 years ago

2018.08 report

Running 6 seednode instances.

/cc bisq-network/compensation#112

mrosseel commented 5 years ago

2018.08 report

Running 1 seednode instance hosting: Linode in docker container

Nothing to report

bisq-network/compensation#116

ManfredKarrer commented 5 years ago

2018.09 report

Running 6 seednode instances.

/cc bisq-network/compensation#125

Emzy commented 5 years ago

2018.09 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#136

mrosseel commented 5 years ago

2018.09 report

Running 1 seednode instance hosting: Linode in docker container

Nothing to report

bisq-network/compensation#141

ManfredKarrer commented 5 years ago

2018.10 report

Running 6 seednode instances.

/cc bisq-network/compensation#155

mrosseel commented 5 years ago

2018.10 report

Running 1 seednode instance hosting: Linode in docker container

Nothing to report

bisq-network/compensation#157

Emzy commented 5 years ago

2018.10 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#163

Emzy commented 5 years ago

2018.11 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#175

ManfredKarrer commented 5 years ago

2018.11 report

Running 6 seednode instances. Just started 2 new ones for testnet (DAO).

/cc bisq-network/compensation#180

mrosseel commented 5 years ago

2018.11 report

Running 1 seednode instance hosting: Linode in docker container

Nothing to report

bisq-network/compensation#181

ManfredKarrer commented 5 years ago

2018.11 report

Running 6 mainnet nodes and 2 testnet nodes (DAO).

/cc bisq-network/compensation#189

Emzy commented 5 years ago

2018.12 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#191

ManfredKarrer commented 5 years ago

We had a severe incident yesterday with all seed nodes.

The reason was that I changed the --maxMemory program argument from 512 to 1024 MB. My servers have 4 GB RAM and run 2 nodes each, so I thought that should be OK. But it was not: it caused out-of-memory errors and the nodes became stuck (they required kill -9 to stop them).

I increased the maxMemory setting because I saw that the nodes restarted every 2-3 hours (earlier it was about once a day). The seed nodes check the memory they consume, and if it hits maxMemory they restart automatically. That is a workaround for a potential memory leak which seems to occur only on Linux (and/or only on seed nodes). At least on OSX with the normal Bisq app I could never reproduce it; I could even run the app with about 100 connections, which never worked on my Linux boxes. So I assume some OS setting is causing it. We researched it a bit in the past but never found the real reason (we never dedicated enough effort; we should prioritize that old issue in the near future).

The situation was discovered late at night when a user posted a GitHub issue saying he had no arbitrators. Checking the monitor page alerted me, as basically all nodes were without data and most were not responsive. From the stats at my hosting provider I saw that the situation started somewhere in the last 12-24 hours.

The 2 nodes from Mike and Stephan were responsive (as they did not change anything) but were also missing data: they restart every few hours as well and therefore connect to other seeds to gather the data, and as the other seeds lost data over time, theirs became corrupted too.

It was a lesson that it is not a good idea to change too much, or to change all seeds at the same time! The good thing is that the network recovered quite quickly in the end, and it is quite resilient even when all seeds fail (as was more or less the case here).

To recover, I started one seed locally and removed all other seed addresses (in the code), so that after a while it connected to any persisted peer (normal Bisq apps). From those it got the data present in the network, and then I used that seed as the dedicated seed (using --seedNodes) for the other seeds to start up again. So all my seeds became filled with data again. Mike's and Stephan's seeds needed a few hours to get up to date again once they restarted (so the too-fast restart interval was a benefit here).
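Pointing the remaining seeds at one freshly recovered seed can be scripted. A minimal sketch, assuming --seedNodes accepts a comma-separated list of host:port entries (check the seed node's --help output for the exact format); seed_nodes_arg is a hypothetical helper and the onion addresses are placeholders:

```shell
#!/bin/bash
# Hypothetical helper: join host:port entries into a single --seedNodes argument.
# Assumes a comma-separated list format; verify against the seed node's --help.
seed_nodes_arg() {
  local IFS=','
  printf -- '--seedNodes=%s\n' "$*"
}

# Example: start a seed so it only bootstraps from the recovered seed (placeholder
# address; all other program arguments stay as in the normal start script):
#   java -jar seednode-all.jar ... "$(seed_nodes_arg recovered.onion:8000)"
```

Once all seeds are filled with data again, they can be restarted without the override so they use the normal seed list.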

I upgraded my servers to 8 GB (4 GB per node) and will now test more carefully how far I can go with the --maxConnections and --maxMemory settings. Currently I run 4 nodes with --maxConnections=30 --maxMemory=1024 and 2 with --maxConnections=25 --maxMemory=750. Stephan told me he already had 4 GB and --maxConnections=30 --maxMemory=1024, which seems a safe setting. Mike has not responded so far, but I assume he has lower settings, as his node recovered quite fast (it restarted more often).

What we should do:

ghost commented 5 years ago

I reread the issue https://github.com/bisq-network/bisq/issues/599, where a user also reported abnormal memory consumption under Ubuntu, and where I myself reported low memory consumption under Debian Stretch. @Emzy says he uses Debian Stretch for his seednode (and has never reported a memory issue, afaik).

So I wonder whether this memory leak could be specific to Ubuntu (and could perhaps be solved simply by running Debian)?

ManfredKarrer commented 5 years ago

2019.01 report

We had issues with heap memory (see above), but they are resolved now: we added more JVM arguments and increased the maxMemory program argument.

java -XX:+UseG1GC -Xms512m -Xmx4000m -jar /root/bisq/seednode/build/libs/seednode-all.jar --maxConnections=30 --maxMemory=3000 ...

The -XX:+UseG1GC argument tells the JVM to use a different garbage collector (G1), which behaves better according to @freimair.

The heap size set with -Xmx must be about 20-30% larger than the --maxMemory value.
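The rule of thumb above is simple arithmetic. A minimal sketch that reports the headroom of -Xmx over --maxMemory (both in MB) and treats anything below 20% as too little; heap_headroom and heap_ok are hypothetical helper names, and integer division rounds down:

```shell
#!/bin/bash
# Hypothetical helper: percentage by which the JVM heap (-Xmx, MB) exceeds
# the seed node's --maxMemory (MB). Integer arithmetic, rounded down.
heap_headroom() {
  local xmx_mb="$1" max_memory_mb="$2"
  echo $(( (xmx_mb - max_memory_mb) * 100 / max_memory_mb ))
}

# Succeeds if the headroom is at least 20%, the lower end of the rule of thumb.
heap_ok() {
  [ "$(heap_headroom "$1" "$2")" -ge 20 ]
}

# -Xmx4000m with --maxMemory=3000, as in the command line above:
heap_headroom 4000 3000   # 33
```

The -Xmx2000m / --maxMemory=1200 combination recommended later in this thread gives 66% with the same check.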

Also started 2 more seed nodes for the DAO (4 in total).

/cc bisq-network/compensation#205

Emzy commented 5 years ago

2019.01 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#212

Emzy commented 5 years ago

2019.02 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#225

ManfredKarrer commented 5 years ago

2019.02 report

Running 6 mainnet nodes and 4 DAO testnet nodes. Started to hand over 2 nodes to @freimair.

/cc bisq-network/compensation#227

mrosseel commented 5 years ago

2018.12 - 2019.02 report

Running 1 seednode instance hosting: Linode in docker container

Did some investigation after the monitoring failures seen by Manfred. Updated parameters so that it's running better now. In Grafana it looked like it might still be restarting too much, but that was another node with a similar color; restarts were 2-3 times a day, which is 'normal'. Fixing the recently discovered memory leak in the seednodes might eliminate the restarts altogether. TODO: after the seednode refactoring is done, I'll make a new Docker image for the seednodes and upgrade to 0.9.3.

bisq-network/compensation#220

Emzy commented 5 years ago

2019.03 report

Running 1 Bitcoin seednode instance hosting: Hetzner VM on my dedicated server

/cc bisq-network/compensation#246

ManfredKarrer commented 5 years ago

2019.03 report

/cc bisq-network/compensation#252

freimair commented 5 years ago

2019.03 report

ManfredKarrer commented 5 years ago

We all should get the DAO setup ready now. Here is a summary of the instructions:

Check out https://github.com/ManfredKarrer/bisq/tree/rc_v1.0.0 and build from that. There is a new mainnet genesis tx, so it can be used for a test run as a DAO full node. Do not try to run as a DAO full node with the master branch, as the genesis tx there is very old and syncing will take a long time.

Here are my conf files for btc core: bitcoin.conf:

datadir=.....
maxconnections=800
timeout=30000
listen=0
server=1
txindex=1
rpcallowip=127.0.0.1
rpcuser=....
rpcpassword=....
blocknotify=bash /root/.bitcoin/blocknotify %s

datadir, rpcuser, rpcpassword and blocknotify need to be edited by yourself. I took maxconnections and timeout from my BTC nodes. We do not run it as a listening node, to save resources. We might change that later.

blocknotify file:

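#!/bin/bash
# Forward the new block hash (%s in bitcoin.conf, $1 here) to the seed node's --rpcBlockNotificationPort.
echo "$1" | nc -w 1 127.0.0.1 5110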

I use a small start script for the seed node, launched with: nohup sh loop.sh &

loop.sh:

#!/bin/bash

java -XX:+UseG1GC \
-Xms512m \
-Xmx2000m \
-jar bisq/seednode/build/libs/seednode-all.jar \
--maxMemory=1200 \
--maxConnections=30 \
--baseCurrencyNetwork=BTC_MAINNET \
--appName=seed \
--nodePort=8000 \
--daoActivated=true \
--fullDaoNode=true \
--rpcPort=8332 \
--rpcUser=... \
--rpcPassword=... \
--rpcBlockNotificationPort=5110 \
>/dev/null 2>error.log

Be sure the rpcBlockNotificationPort is matching the entry in the blocknotify file. Please stick with the memory settings as above as I tested a lot and those seem to work well.
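Because a mismatch between these two ports silently stops block notifications, it can be worth checking them automatically. A minimal sketch, assuming the blocknotify and loop.sh layouts shown above (the port is the last word of the nc line, and the start script passes --rpcBlockNotificationPort=N); the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical consistency check between the blocknotify script and loop.sh.

# Port from the blocknotify script: last word of the (non-comment) line calling nc.
notify_port_from_script() {
  awk '!/^[ \t]*#/ && / nc / { port = $NF } END { print port }' "$1"
}

# Port from the start script: value of --rpcBlockNotificationPort=N.
notify_port_from_start() {
  sed -n 's/.*--rpcBlockNotificationPort=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Succeeds only if both ports were found and are equal.
ports_match() {
  local script_port start_port
  script_port="$(notify_port_from_script "$1")"
  start_port="$(notify_port_from_start "$2")"
  [ -n "$script_port" ] && [ "$script_port" = "$start_port" ]
}

# Usage (placeholder paths):
#   ports_match /root/.bitcoin/blocknotify loop.sh || echo "notification ports differ"
```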

Be sure to have 4 GB RAM; it's needed. 300 GB of disk space is needed as well: with txindex the blockchain is currently about 260 GB, and in 4-6 months it will reach 300 GB.
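Before setting up a new DAO full-node seed, the requirements above can be checked on the host. A minimal sketch for Linux (reads /proc/meminfo and POSIX df output); the function names are hypothetical, and the RAM threshold slightly under 4 GiB is an assumption, chosen because MemTotal is reported below the nominal RAM size:

```shell
#!/bin/bash
# Hypothetical preflight check for a DAO full-node seed host.
# Thresholds follow the requirements above: 4 GB RAM, 300 GB disk.
# MemTotal is reported slightly below the nominal RAM size because the kernel
# reserves some memory, so the RAM threshold (KiB) is set a little under 4 GiB.
MIN_MEM_KB=3800000
MIN_DISK_KB=$((300 * 1024 * 1024))

# Succeeds if total RAM and free disk space (both in KiB) meet the thresholds.
meets_requirements() {
  local mem_kb="$1" free_disk_kb="$2"
  [ "$mem_kb" -ge "$MIN_MEM_KB" ] && [ "$free_disk_kb" -ge "$MIN_DISK_KB" ]
}

# Read the live values: MemTotal from /proc/meminfo and the free space
# (POSIX df, 1 KiB blocks) of the filesystem holding DATADIR. Meant to run before the initial sync.
check_host() {
  local datadir="$1" mem_kb free_kb
  mem_kb="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
  free_kb="$(df -Pk "$datadir" | awk 'NR == 2 { print $4 }')"
  if meets_requirements "$mem_kb" "$free_kb"; then
    echo "ok: ${mem_kb} KiB RAM, ${free_kb} KiB free in ${datadir}"
  else
    echo "insufficient: ${mem_kb} KiB RAM, ${free_kb} KiB free in ${datadir}" >&2
    return 1
  fi
}

# Usage (placeholder path): check_host /root/.bitcoin
```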

If your server setup is not completely trivial, please add a small readme file, so that if I need to access it while you are not available, I can easily find out how to stop or restart the node and where to edit config files and program arguments.

ManfredKarrer commented 5 years ago

Ah, one note: if you run it as a service, be sure the log settings are correct so you don't run out of disk space! Add to the readme where the log file is and how to access it if it's not in the standard data directory...
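A simple periodic check can catch a growing log before the disk fills up. A minimal sketch using wc and POSIX df; the function names and the limits in the usage line are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical disk-space checks for a seed node host.

# Size of a log file in whole MiB (rounded down).
log_size_mb() {
  local bytes
  bytes="$(wc -c < "$1")"
  echo $(( bytes / 1024 / 1024 ))
}

# Used space (percent) of the filesystem holding PATH, from POSIX df output.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Warn and fail if LOG exceeds MAX_MB or its filesystem is more than MAX_PCT used.
check_log_space() {
  local log="$1" max_mb="$2" max_pct="$3" size pct status=0
  size="$(log_size_mb "$log")"
  pct="$(disk_used_percent "$log")"
  if [ "$size" -gt "$max_mb" ]; then
    echo "warning: $log is ${size} MiB (limit ${max_mb} MiB)" >&2
    status=1
  fi
  if [ "$pct" -gt "$max_pct" ]; then
    echo "warning: filesystem holding $log is ${pct}% full" >&2
    status=1
  fi
  return "$status"
}

# Usage (placeholder path and limits, e.g. from a daily cron job):
#   check_log_space /root/error.log 500 90
```

Note that the 2>error.log redirection in loop.sh truncates that file each time the script starts, so it only grows within a single run.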

Emzy commented 5 years ago

Here are my conf files for btc core: bitcoin.conf:

datadir=.....
maxconnections=800
timeout=30000
listen=0
server=1
txindex=1
#rpcallowip=127.0.0.1
rpcuser=....
rpcpassword=....
blocknotify=bash /root/.bitcoin/blocknotify %s

Please don't use "rpcallowip=127.0.0.1"; it will open the RPC port to the world:

# netstat -lpntu

With "rpcallowip=127.0.0.1":
tcp6 0 0 :::8332 :::* LISTEN 902/bitcoind

Without it, the port will only be opened on localhost ("::1"):
tcp6 0 0 ::1:8332 :::* LISTEN 4788/bitcoind
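This netstat comparison is worth repeating after any bitcoin.conf change. A minimal sketch that filters the listening sockets for the RPC port and prints any that are not bound to loopback, assuming the local address is the fourth column, as it is in both netstat -lnt and ss -ltn output; public_listeners is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -lnt` or `ss -ltn` output on stdin and print every
# listening address for PORT that is NOT a loopback address (127.0.0.1 or ::1).
# Assumes the local address is the 4th whitespace-separated column, as in both tools.
public_listeners() {
  awk -v port="$1" '
    {
      addr = $4
      # Locate the ":PORT" suffix; lines without one (e.g. headers) are skipped.
      if (!match(addr, /:[0-9]+$/)) next
      if (substr(addr, RSTART + 1) != port) next
      host = substr(addr, 1, RSTART - 1)
      gsub(/^\[|\]$/, "", host)   # ss prints IPv6 hosts in brackets, e.g. [::1]
      if (host == "127.0.0.1" || host == "::1") next
      print addr
    }'
}

# Usage: no output means the RPC port is only reachable from localhost.
#   netstat -lnt | public_listeners 8332
```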

devinbileck commented 5 years ago

2019.04 report

I have set up and transferred ownership of the 3f3cu2yw7u457ztq.onion seednode from @ManfredKarrer (https://github.com/bisq-network/bisq/pull/2803).

Minimal expenses were incurred this month, as I did not set up the node until the very end of the month.

/cc bisq-network/compensation#270

Emzy commented 5 years ago

Cycle 1 report

Running 2 Bitcoin seednode instances hosting: Hetzner VM on my dedicated server and a second dedicated server

Moved one seednode because it now needs a bitcoind (blockchain), so more resources are needed. Set up and tested a second seednode that I took over from @ManfredKarrer.

/cc bisq-network/compensation#279

mrosseel commented 5 years ago

2019.03 & 2019.04 (Cycle 1) report

Running 1 seednode instance in March and 2 seednodes + full nodes in April. Due to a switch of provider, costs per seednode remained the same, even with the extra storage.

bisq-network/compensation#281

devinbileck commented 5 years ago

2019.05 report

I have set up and transferred ownership of the fl3mmribyxgrv63c.onion seednode from @ManfredKarrer (https://github.com/bisq-network/bisq/commit/a8ed773e8427fa6ea5658ec801bf9c0959fe3f66#diff-04f4a7f86dda2770277614e21ae570a3).

I am now running 2 seednodes on separate hosting providers, costing $50 USD / month each to satisfy the necessary requirements (2 CPU, 4GB RAM, 300GB storage).

No issues to report this month.

Both were updated to the latest master on May 27 with this commit.

/cc bisq-network/compensation#295

Emzy commented 5 years ago

Cycle 2 report

Running 2 Bitcoin seednode instances hosting: Hetzner VM on my dedicated server and a second dedicated server

/cc bisq-network/compensation#298

devinbileck commented 5 years ago

Cycle 3 report

Summary

Running 2 seed nodes on mainnet.

Running 1 seed node on testnet.

This month I deployed a seed node on testnet since the old testnet seed nodes are no longer maintained and testnet is still a useful testing environment. https://github.com/bisq-network/bisq/pull/2920

Issues Encountered

Issue 1: On July 4th, fl3mmribyxgrv63c was delivering outdated blocks as head blocks. See the monitor for more details. The issue appeared to be caused by a failure to receive notifications from bitcoind (see the log snippet below). No idea why at this point. I ended up restarting the seed node, which resolved the issue.

bisq.core.dao.node.full.RpcException: NotificationHandlerException(super=com.neemre.btcdcli4j.daemon.NotificationHandlerException: Error #1004004: The operation failed due to an unknown IO exception., error=Errors(code=1004004, message=The operation failed due to an unknown IO exception.), code=1004004)
    at bisq.core.dao.node.full.RpcService.lambda$setup$0(RpcService.java:138)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: NotificationHandlerException(super=com.neemre.btcdcli4j.daemon.NotificationHandlerException: Error #1004004: The operation failed due to an unknown IO exception., error=Errors(code=1004004, message=The operation failed due to an unknown IO exception.), code=1004004)
    at com.neemre.btcdcli4j.daemon.notification.worker.NotificationWorker.call(NotificationWorker.java:64)
    at com.neemre.btcdcli4j.daemon.notification.worker.NotificationWorker.call(NotificationWorker.java:22)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    ... 3 more
Caused by: java.net.SocketException: Connection reset
    at java.base/java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.base/java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
    at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
    at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
    at com.neemre.btcdcli4j.daemon.notification.worker.NotificationWorker.call(NotificationWorker.java:46)
    ... 7 more

Issue 2: @alexej996 was encountering issues with his seed node. While looking at the logs, it seemed to be hitting the memory limit and restarting:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We are over our memory limit (1200) and trigger a restart. usedMemory: 1275 MB. freeMemory: 295 MB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

And frequently going over 80%:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We are over 80% of our memory limit (960) and call the GC. usedMemory: 1156 MB. freeMemory: 414 MB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

There is a PR that was merged recently that may or may not help with memory usage: https://github.com/bisq-network/bisq/pull/2501

If it doesn't help, we may require further improvements or potentially increase the memory limit for now.
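The two log messages above show the seed node's memory guard: at 80% of --maxMemory (960 of 1200 MB) it calls the GC, and above the limit it triggers a restart. A shell sketch of that decision, useful for reading such numbers; memory_action is a hypothetical name, this is not the seed node's actual Java code, and whether the comparisons are strict is an assumption:

```shell
#!/bin/bash
# Hypothetical model of the seed node's memory guard, based only on the log lines:
#   used > maxMemory        -> restart
#   used > 80% of maxMemory -> call the GC
# Strict ">" comparisons are an assumption.
memory_action() {
  local used_mb="$1" max_memory_mb="$2"
  if (( used_mb > max_memory_mb )); then
    echo restart
  elif (( used_mb * 100 > max_memory_mb * 80 )); then
    echo gc
  else
    echo ok
  fi
}

# 80% of --maxMemory=1200, the GC threshold printed in the log:
echo $(( 1200 * 80 / 100 ))   # 960
memory_action 1275 1200       # restart (first log message)
memory_action 1156 1200       # gc (second log message)
```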

Maintenance Performed

No maintenance performed this month to the mainnet seed nodes.

Expenses Incurred

Expenses incurred for the month (USD):

Total: $120

/cc bisq-network/compensation#309

Emzy commented 5 years ago

Cycle 3 report

Running 2 Bitcoin seednode instances hosting: Hetzner VM on my dedicated server and a second dedicated server

/cc bisq-network/compensation#310

mrosseel commented 5 years ago

Cycle 2&3 report

Both seednodes are now running stable. I had some issues, which I investigated, with these results: when both the Bitcoin full node (btcd) and the seednode start, there is a period in which btcd is still verifying the blockchain, i.e. it's not ready. If the seednode process talking to btcd (I'll call it btc_caller) makes any requests in this period, it receives the 'RPC_IN_WARMUP' error (see https://bitcoin.stackexchange.com/questions/46662/bitcoind-error-28). This makes the btc_caller crash while the seed node continues operating. As a result, the seednode sees no BSQ blocks but otherwise operates normally. One fix would be to ignore these errors in the seednode and not crash the btc_caller.
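Until the seednode handles that error itself, one workaround is to start the seed node only after bitcoind answers RPC calls. A minimal sketch; wait_until_ready is a hypothetical helper, and the retry count and interval in the usage lines are placeholders:

```shell
#!/bin/bash
# Hypothetical helper: run a probe command until it succeeds, trying at most
# TRIES times and sleeping INTERVAL seconds between failed attempts.
wait_until_ready() {
  local tries="$1" interval="$2" i
  shift 2
  for (( i = 1; i <= tries; i++ )); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if (( i < tries )); then
      sleep "$interval"
    fi
  done
  return 1
}

# Usage: bitcoin-cli calls fail with error -28 (RPC_IN_WARMUP) while bitcoind is
# still loading/verifying, so only start the seed node once a call succeeds.
# Assumes bitcoin-cli finds the same RPC credentials as bitcoind (e.g. via bitcoin.conf).
#   if wait_until_ready 180 10 bitcoin-cli getblockchaininfo; then
#     nohup sh loop.sh &
#   fi
```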

Very curious why other operators have not noticed this behavior. I have seen similar errors reported by @alexej996. One way to check this is to see whether something is listening on port 5120; do this also when it's running correctly, so you can compare. If nothing is listening on 5120, btcd can no longer notify the seednode when there's a new block.
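This port check can be scripted, e.g. for a periodic cron job. A minimal sketch, assuming the local address is in the fourth column as in ss -ltn and netstat -lnt output; is_listening is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: succeed if `ss -ltn` / `netstat -lnt` output on stdin
# contains a listening socket on PORT (local address assumed in the 4th column).
is_listening() {
  awk -v port="$1" '
    match($4, /:[0-9]+$/) && substr($4, RSTART + 1) == port { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage (5120 is the port mentioned above; the loop.sh example earlier uses 5110):
#   ss -ltn | is_listening 5120 || echo "nothing listening on 5120: block notifications will not arrive"
```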

bisq-network/compensation#312

Emzy commented 4 years ago

Cycle 4 report

Running 3 seednode instances hosting: Hetzner VM on my dedicated server and two dedicated servers

Set up a third seednode.

/cc bisq-network/compensation#324

devinbileck commented 4 years ago

Cycle 4 report

Summary

Running 3 seed nodes on mainnet.

Running 1 seed node on testnet.

This month I took ownership of seed node jhgcy2won7xnslrb. https://github.com/bisq-network/bisq/pull/3002

Issues Encountered

Issue 1: On July 22nd, fl3mmribyxgrv63c was behind on the DAO state head. See the monitor for more details. The issue appeared to be caused by not having swap enabled on that machine and the memory usage was maxed out. To resolve it, I added swap space to the machine and restarted the seed.

Issue 2: On Aug 6, 3f3cu2yw7u457ztq was behind on the DAO state head. See the monitor for more details. The issue appeared to be a swap space issue again - it was maxed out at 512 MB. As a result, I increased swap space on all my seeds to 4 GB to ensure plenty of space.
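For reference, here is a sketch of the usual swap-file steps on Linux. It assumes a filesystem such as ext4, where swap files created with fallocate work; on other filesystems, dd may be needed instead. swap_commands is a hypothetical helper that only prints the commands, so they can be reviewed before running them as root:

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the commands that create and enable a
# swap file of SIZE (e.g. 4G) at PATH and make it persistent across reboots.
swap_commands() {
  local size="$1" path="$2"
  cat <<EOF
fallocate -l $size $path
chmod 600 $path
mkswap $path
swapon $path
echo '$path none swap sw 0 0' >> /etc/fstab
EOF
}

# Usage: review the output, run it as root, then verify with "swapon --show":
#   swap_commands 4G /swapfile | sudo sh
```

The fstab line is appended unconditionally, so run the commands only once per host.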

Maintenance Performed

On Aug 6 I updated all my mainnet seed nodes to Manfred's branch to apply a hotfix.

I plan to update my seed nodes to follow the updated document from Florian to ensure a consistent setup. Once that is done, I will organize a backup operator in case I am unavailable for maintenance.

Expenses Incurred

Expenses incurred for the month (USD):

Total: $170

/cc bisq-network/compensation#326

mrosseel commented 4 years ago

Cycle 4 report

Both seednodes running stable. Updated to Manfred's latest fix branch as discussed in seednode channel.

bisq-network/compensation#331