Closed jcarovestigia closed 2 years ago
Could you please provide the exact content of the `.env` file (for docker-compose) in your local configuration, as well as the full `geth` command, with all its options, as executed in the container? If the `docker-compose.yml` file has any substantial modifications, please provide its content too.
Thank you. This is the content of one of our `.env` files:
```
#
# NODE_TYPE=general|bootnode|validator
# NODE_NAME=REG_Example_T_2_8_00
# Should be:
# REG_<partner>_T_<number_cores>_<memory>_<#node>
# NODE_BRANCH=main
NODE_TYPE=general
NODE_NAME=REG_Vestigia_T_2_8_01
NODE_BRANCH=main
```
docker-compose.yml:
```yaml
# Compose file for an Alastria-T node
version: "3.7"
services:
  alastria-node:
    build: ./alastria-node
    restart: unless-stopped
    container_name: ${NODE_NAME}
    volumes:
      - "./alastria-node-data:/root/alastria"
    ports:
      - "21000:21000/tcp"
      - "21000:21000/udp"
      - "6060:6060/tcp"
      # Enable connection for dApps. Only for Regular/General nodes
      #
      # To be used from RPC/JSON:
      - "22000:22000/tcp"
      #
      # To be used from WebSockets:
      # - "22001:22001/tcp"
      #
    environment:
      - NODE_TYPE=${NODE_TYPE}
      - NODE_NAME=${NODE_NAME}
      - NODE_BRANCH=${NODE_BRANCH}
```
geth command:

```shell
geth --datadir /root/alastria/data --networkid 83584648538 \
  --identity REG_Vestigia_T_2_8_01 --permissioned --cache 4196 --port 21000 \
  --istanbul.requesttimeout 10000 --verbosity 3 --emitcheckpoints \
  --syncmode fast --gcmode full --vmodule consensus/istanbul/core/core.go=5 \
  --nousb --metrics --metrics.expensive --pprof --pprofaddr=0.0.0.0 \
  --rpc --rpcaddr 0.0.0.0 --rpcport 22000 --rpccorsdomain=* --rpcvhosts=* \
  --rpcapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul \
  --ws --wsaddr 0.0.0.0 --wsport 22001 --wsorigins=* \
  --wsapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul
```
Some partners have suggested removing the miner methods from the RPC and WS APIs, as that may be causing the node to try to mine (there is only one other documented case in which this has been occurring). We have updated the geth.node.general.sh file, from which you can update your local config.
After restarting the node, you can also execute the following command to ensure that the node has not been ordered to mine:
docker exec -it REG_Vestigia_T_2_8_01 geth --exec "miner.stop()" attach /root/alastria/data/geth.ipc
If they remove the miner option from the RPC and WS APIs first, they won't be able to run the "miner.stop()" command.
They need to start the node, run the miner.stop() command, and only then remove "miner" from the API lists in the geth.node.general.sh file and restart the Docker container.
We solved the problem this way.
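The ordering above matters because miner.stop() is only reachable while the miner API is still enabled. A minimal sketch of the sequence as a shell function; the container name and script path are the ones used elsewhere in this thread, and the sed pattern is a rough assumption about how "miner" appears in the API lists, so adjust all of them to your setup:

```shell
# Sketch of the sequence described above (hypothetical helper, adapt paths).
stop_mining_then_remove_api() {
  container="REG_Vestigia_T_2_8_01"

  # 1. With the miner API still enabled, tell the node to stop mining.
  docker exec -it "$container" \
    geth --exec "miner.stop()" attach /root/alastria/data/geth.ipc

  # 2. Remove "miner" from the rpcapi/wsapi lists in geth.node.general.sh
  #    (crude sed shown as an assumption; editing by hand also works).
  sed -i 's/miner,//g' alastria-node/geth.node.general.sh

  # 3. Restart the container so geth picks up the new API lists.
  docker restart "$container"
}
```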
Another thing we have changed with respect to GitHub: after the sync we bumped the version in the Dockerfile from VER="v21.1.0" to VER="v21.10.2", and it makes the RAM consumption more stable.
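Assuming the Dockerfile pins the release with a line containing VER="v21.1.0" as quoted above, the bump can be scripted with a small (hypothetical) helper:

```shell
# Hypothetical helper: bump the pinned GoQuorum release in a Dockerfile.
# Assumes the file contains VER="v21.1.0" literally, as quoted above;
# pass the Dockerfile path as the first argument.
bump_goquorum_version() {
  sed -i 's/VER="v21\.1\.0"/VER="v21.10.2"/' "$1"
}
```

Usage would be `bump_goquorum_version alastria-node/Dockerfile`, then rebuild the image (e.g. `docker-compose build`) and restart the container.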
Thank you very much. I've made the changes you suggested, and both nodes fail with memory errors at the very moment they start to synchronise:
REG_Vestigia_T_2_8_03 | INFO [09-28|18:51:20.208] Block synchronisation started
REG_Vestigia_T_2_8_03 | INFO [09-28|18:51:20.524] New local node record seq=15651 id=3c6cff92ea496302 ip=35.180.131.71 udp=21000 tcp=21000
REG_Vestigia_T_2_8_03 | panic: runtime error: invalid memory address or nil pointer dereference
REG_Vestigia_T_2_8_03 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x1c0 pc=0xcb00b3]
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 823 [running]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).findAncestor(0xc02b9f96c0, 0xc02cbe83c0, 0xc02db186c0, 0x0, 0x0, 0xa)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:891 +0x14d3
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer(0xc02b9f96c0, 0xc02cbe83c0, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:457 +0x3df
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).synchronise(0xc02b9f96c0, 0xc02cbc2610, 0x10, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x0, ...)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:425 +0x418
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).Synchronise(0xc02b9f96c0, 0xc02cbc2610, 0x10, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x6697d6, ...)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:329 +0x8e
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth.(*ProtocolManager).synchronise(0xc02ba645a0, 0xc000126780)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/sync.go:200 +0x2ee
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/eth.(*ProtocolManager).handleMsg
I'll try to reset the chain (removedb) and wait a few days for both to resynchronise.
Most node synchronization problems are related to the high memory and CPU needs of the process. The right way to sync a node from scratch, or to recover when the LevelDB database becomes corrupted, is to have at least 8 GB of physical memory available, although 16 GB is recommended. In that case it is only a question of time before the synchronization completes correctly, and a failed synchronization from scratch would be very rare.
It is also recommended to sync with version 21.1.0 of GoQuorum, and then to move to version 21.10.2 once the process is finished. After the upgrade, 4 GB of memory is enough to keep the node stable.
Still the same with the two nodes:
REG_Vestigia_T_2_8_03 | INFO [10-02|11:39:48.728] Imported new chain segment blocks=1 txs=1 mgas=0.525 elapsed=4.927ms mgasps=106.589 number=102700192 hash=ea4f84…a4ea68 dirty=140.46MiB
REG_Vestigia_T_2_8_03 | INFO [10-02|11:39:50.357] QUORUM-CHECKPOINT name=TX-COMPLETED tx=0x72e8b2a79a9e8ee53dbfa77ceba5d8aee06710bb778ef851f1fd4dabf4416e31 time=953.33µs
REG_Vestigia_T_2_8_03 | fatal error: runtime: out of memory
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | runtime stack:
REG_Vestigia_T_2_8_03 | runtime.throw(0x15c4607, 0x16)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/panic.go:1116 +0x72
REG_Vestigia_T_2_8_03 | runtime.sysMap(0xc1e0000000, 0xc0000000, 0x292f7b8)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mem_linux.go:169 +0xc6
REG_Vestigia_T_2_8_03 | runtime.(*mheap).sysAlloc(0x29134a0, 0xbe000000, 0x441e37, 0x29134a8)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:727 +0x1e5
REG_Vestigia_T_2_8_03 | runtime.(*mheap).grow(0x29134a0, 0x5eed5, 0x0)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:1344 +0x85
REG_Vestigia_T_2_8_03 | runtime.(*mheap).allocSpan(0x29134a0, 0x5eed5, 0x521e2c7010100, 0x292f7c8, 0x200000002)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:1160 +0x6b6
REG_Vestigia_T_2_8_03 | runtime.(*mheap).alloc.func1()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:907 +0x65
REG_Vestigia_T_2_8_03 | runtime.(*mheap).alloc(0x29134a0, 0x5eed5, 0xc000500101, 0xc112533080)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:901 +0x85
REG_Vestigia_T_2_8_03 | runtime.largeAlloc(0xbdda8400, 0x101, 0x0)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1177 +0x92
REG_Vestigia_T_2_8_03 | runtime.mallocgc.func1()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1071 +0x46
REG_Vestigia_T_2_8_03 | runtime.systemstack(0x7f4d38000020)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:370 +0x66
REG_Vestigia_T_2_8_03 | runtime.mstart()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/proc.go:1116
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 65 [running]:
REG_Vestigia_T_2_8_03 | runtime.systemstack_switch()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:330 fp=0xc000da18c0 sp=0xc000da18b8 pc=0x484680
REG_Vestigia_T_2_8_03 | runtime.mallocgc(0xbdda8400, 0x1501360, 0xc026557e01, 0x521e2c720709d)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1070 +0x938 fp=0xc000da1960 sp=0xc000da18c0 pc=0x423bd8
REG_Vestigia_T_2_8_03 | runtime.makeslice(0x1501360, 0x0, 0x5eed420, 0x4cb547dd5c91e2ad)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/slice.go:98 +0x6c fp=0xc000da1990 sp=0xc000da1960 pc=0x46510c
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/core/rawdb.(*freezer).freeze(0xc000120740, 0x186b300, 0xc00009a800)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/core/rawdb/freezer.go:296 +0x4ce fp=0xc000da1fc8 sp=0xc000da1990 pc=0x91894e
REG_Vestigia_T_2_8_03 | runtime.goexit()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000da1fd0 sp=0xc000da1fc8 pc=0x486461
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/core/rawdb.NewDatabaseWithFreezer
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/core/rawdb/database.go:176 +0x1d2
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 1 [chan receive, 4 minutes]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/node.(*Node).Wait(0xc00069cc80)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/node/node.go:554 +0x7c
REG_Vestigia_T_2_8_03 | main.geth(0xc000170000, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:343 +0x135
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.HandleAction(0x136c360, 0x16aed88, 0xc000170000, 0xc000327a40, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:490 +0x82
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.(*App).Run(0xc0005401a0, 0xc000132000, 0x2d, 0x30, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:264 +0x5f5
REG_Vestigia_T_2_8_03 | main.main()
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:280 +0x55
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 19 [chan receive]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/metrics.(*meterArbiter).tick(0x28da860)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/metrics/meter.go:289 +0x7d
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/metrics.NewMeter
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/metrics/meter.go:55 +0x11a
REG_Vestigia_T_2_8_03 |
etc...
I'll make a try with 16 GB (the nodes have 8 GB). How should I change from GoQuorum 21.1.0 to 21.10.2? Is there a procedure?
To change the GoQuorum version you must follow these steps: https://github.com/alastria/alastria-node-quorum/tree/upgrade-branch/GoQuorum-21.10.2#upgrading-node-to-goquorum-version-21102
If you cannot recover the LevelDB (... and the truth is that the errors you have show it will be hard), you will have to perform a new DLT synchronization.
You can use these commands, adapting them to your own directory structure:

```shell
# make a backup of your private key
$ cp nodekey /root/nodekey_backup
$ ./geth --datadir _dir_ removedb
$ ./geth --datadir _dir_ init ./genesis.json
# restore the private key
$ cp /root/nodekey_backup keystore/nodekey
# start geth, or restart the container...
```
And start over the sync process.
PS: Another way: it seems that you have other running nodes! You can take a copy of the datadir directory from one of them (except the nodekey file) and use it. Keep in mind: you can use rsync for the copy.
PS: No... geth import and geth export seem not to work, at least on my nodes.
For the record, we're having a similar issue here with REG_Tribalyte_T_6_5_00, which has 5 GB RAM + 9 GB swap. As described in this issue, geth crashes from time to time due to out-of-memory errors when doing a full sync.
Also, sometimes it doesn't crash but the log is full of messages like:
VM returned with error err="evm: execution reverted"
and geth attach alastria/data/geth.ipc fails with:
Fatal: Failed to start the JavaScript console: api modules: context deadline exceeded
Indeed, we upgraded to 21.10.2 after synchronizing, and now the node seems to be stable (2 days up with no issues) and the memory consumption is much smaller.
I can confirm the solution is:
1. Install the node as described in https://github.com/alastria/alastria-node-quorum#1-installation
2. Synchronise it
3. Upgrade GoQuorum following https://github.com/alastria/alastria-node-quorum/tree/upgrade-branch/GoQuorum-21.10.2#upgrading-node-to-goquorum-version-21102
Thank you all for your help.
After upgrading to GoQuorum 21.10.2, the node stops trying to mine.
The only issue that remains is that the node isn't listed in Grafana.
To appear in Grafana, make sure that port 6060 is mapped in the docker-compose YAML and that it is forwarded in the router to the Docker hosting machine.
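A quick way to check the metrics side from outside the container is to probe port 6060. This assumes geth was started with --metrics --pprof --pprofaddr=0.0.0.0 as in the command above, which serves a /debug/metrics endpoint on that port; the helper name is hypothetical:

```shell
# Check that geth's metrics endpoint is reachable on the forwarded port.
# Assumes --metrics and --pprof are enabled, as in the geth command above.
check_metrics_port() {
  host="${1:-localhost}"
  if curl -sf "http://$host:6060/debug/metrics" > /dev/null; then
    echo "metrics reachable on $host:6060"
  else
    echo "cannot reach $host:6060 - check the port mapping and router forwarding"
  fi
}
```

Running it with the host's public IP from another machine also verifies the router forwarding, not just the container port mapping.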
We have problems with two nodes: one was working normally when we stopped it to apply the procedure from this repository so that it could be monitored by Grafana. Since we could not solve the problem, we created a new virtual machine from scratch and followed the procedure again, reaching the same error.
They progressively consume all the available memory (6-odd GB) down to the last 150 KB, stay like that for a while, and then "restart" themselves (the consumed memory is released and drops to 600 K, the node does not respond properly to RPC requests, it sometimes desynchronises, sometimes not, and the cycle starts again). All of this happens without my intervention, and the cycle repeats indefinitely. In practice, we observe periods of very slow responses to remote RPC requests executing (pre-signed) contract methods, or those requests failing with an error. These conditions prevent us from working with the nodes.
There must be more nodes with this failure. It is not very noticeable unless you monitor geth's memory consumption.
At the moment of the "restart" it logs this:
Note the mining attempts. I don't know why it does that; it has no order to mine.