Closed jcarovestigia closed 2 years ago
Could you please provide the exact content of the `.env` file (for docker-compose) in your local configuration, as well as the full `geth` command, with all its options, as executed in the container? If the `docker-compose.yml` file has any substantial modifications, please provide its content too.
Thank you. This is the content of one of our `.env` files:
```
#
# NODE_TYPE=general|bootnode|validator
# NODE_NAME=REG_Example_T_2_8_00
# Should be:
# REG_<partner>_T_<number_cores>_<memory>_<#node>
# NODE_BRANCH=main
NODE_TYPE=general
NODE_NAME=REG_Vestigia_T_2_8_01
NODE_BRANCH=main
```
docker-compose.yml:
```yaml
# Compose file for an Alastria-T node
version: "3.7"
services:
  alastria-node:
    build: ./alastria-node
    restart: unless-stopped
    container_name: ${NODE_NAME}
    volumes:
      - "./alastria-node-data:/root/alastria"
    ports:
      - "21000:21000/tcp"
      - "21000:21000/udp"
      - "6060:6060/tcp"
      # Enable connection for dApps. Only for Regular/General nodes
      #
      # To be used from RPC/JSON:
      - "22000:22000/tcp"
      #
      # To be used from WebSockets:
      # - "22001:22001/tcp"
      #
    environment:
      - NODE_TYPE=${NODE_TYPE}
      - NODE_NAME=${NODE_NAME}
      - NODE_BRANCH=${NODE_BRANCH}
```
geth command:

```shell
geth --datadir /root/alastria/data --networkid 83584648538 \
  --identity REG_Vestigia_T_2_8_01 --permissioned --cache 4196 --port 21000 \
  --istanbul.requesttimeout 10000 --verbosity 3 --emitcheckpoints \
  --syncmode fast --gcmode full --vmodule consensus/istanbul/core/core.go=5 \
  --nousb --metrics --metrics.expensive --pprof --pprofaddr=0.0.0.0 \
  --rpc --rpcaddr 0.0.0.0 --rpcport 22000 --rpccorsdomain=* --rpcvhosts=* \
  --rpcapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul \
  --ws --wsaddr 0.0.0.0 --wsport 22001 --wsorigins=* \
  --wsapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul
```
Some partners have suggested removing the miner methods from the RPC and WS APIs, as that may be causing the node to try to mine (there is only one other documented case in which this has been occurring). We have updated the geth.node.general.sh file, from which you can update your local config.
After restarting the node, you can also execute the following command to ensure that the node has not been ordered to mine:
docker exec -it REG_Vestigia_T_2_8_01 geth --exec "miner.stop()" attach /root/alastria/data/geth.ipc
If they remove the miner option from the RPC and WS APIs first, they won't be able to run the "miner.stop()" command.
They need to start the node, run the miner.stop() command, and only then remove "miner" from the API lists in the geth.node.general.sh file and restart the Docker container.
We solved the problem this way.
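The ordering above matters because miner.stop() is only reachable while the miner API is still enabled. A minimal sketch of the sequence as a shell function; the container name and script path are the ones used elsewhere in this thread, and the sed pattern is a rough assumption about how "miner" appears in the API lists, so adjust all of them to your setup:

```shell
# Sketch of the sequence described above (hypothetical helper, adapt paths).
stop_mining_then_remove_api() {
  container="REG_Vestigia_T_2_8_01"

  # 1. With the miner API still enabled, tell the node to stop mining.
  docker exec -it "$container" \
    geth --exec "miner.stop()" attach /root/alastria/data/geth.ipc

  # 2. Remove "miner" from the rpcapi/wsapi lists in geth.node.general.sh
  #    (crude sed shown as an assumption; editing by hand also works).
  sed -i 's/miner,//g' alastria-node/geth.node.general.sh

  # 3. Restart the container so geth picks up the new API lists.
  docker restart "$container"
}
```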
Another thing we have changed with respect to GitHub: after the sync we bumped the version in the Dockerfile from VER="v21.1.0" to VER="v21.10.2", and it makes the RAM consumption more stable.
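Assuming the Dockerfile pins the release with a line containing VER="v21.1.0" as quoted above, the bump can be scripted with a small (hypothetical) helper:

```shell
# Hypothetical helper: bump the pinned GoQuorum release in a Dockerfile.
# Assumes the file contains VER="v21.1.0" literally, as quoted above;
# pass the Dockerfile path as the first argument.
bump_goquorum_version() {
  sed -i 's/VER="v21\.1\.0"/VER="v21.10.2"/' "$1"
}
```

Usage would be `bump_goquorum_version alastria-node/Dockerfile`, then rebuild the image (e.g. `docker-compose build`) and restart the container.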
Thank you very much. I've made the changes you suggested, and both nodes fail with memory errors at the very moment they start to synchronise:
REG_Vestigia_T_2_8_03 | INFO [09-28|18:51:20.208] Block synchronisation started
REG_Vestigia_T_2_8_03 | INFO [09-28|18:51:20.524] New local node record seq=15651 id=3c6cff92ea496302 ip=35.180.131.71 udp=21000 tcp=21000
REG_Vestigia_T_2_8_03 | panic: runtime error: invalid memory address or nil pointer dereference
REG_Vestigia_T_2_8_03 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x1c0 pc=0xcb00b3]
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 823 [running]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).findAncestor(0xc02b9f96c0, 0xc02cbe83c0, 0xc02db186c0, 0x0, 0x0, 0xa)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:891 +0x14d3
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).syncWithPeer(0xc02b9f96c0, 0xc02cbe83c0, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:457 +0x3df
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).synchronise(0xc02b9f96c0, 0xc02cbc2610, 0x10, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x0, ...)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:425 +0x418
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth/downloader.(*Downloader).Synchronise(0xc02b9f96c0, 0xc02cbc2610, 0x10, 0x289386617983b2d5, 0x5587869af5860718, 0x67514978236b5b82, 0x24cda1c07ef4b2b1, 0xc02e10bda0, 0x0, 0x6697d6, ...)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/downloader/downloader.go:329 +0x8e
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/eth.(*ProtocolManager).synchronise(0xc02ba645a0, 0xc000126780)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/eth/sync.go:200 +0x2ee
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/eth.(*ProtocolManager).handleMsg
I'll try to reset the chain (removedb) and wait a few days for both to resynchronise.
Most node synchronization problems are related to the high memory and CPU needs of the process. The right way to sync a node from scratch, or to recover when the LevelDB database becomes corrupted, is to have at least 8 GB of physical memory available, although 16 GB is recommended. In that case it is only a question of time before the synchronization completes correctly, and a failed synchronization from scratch would be very rare.
It is also recommended to sync with version 21.1.0 of GoQuorum, and then to move to version 21.10.2 once the process is finished. After the upgrade, 4 GB of memory is enough to keep the node stable.
Still the same with the two nodes:
REG_Vestigia_T_2_8_03 | INFO [10-02|11:39:48.728] Imported new chain segment blocks=1 txs=1 mgas=0.525 elapsed=4.927ms mgasps=106.589 number=102700192 hash=ea4f84…a4ea68 dirty=140.46MiB
REG_Vestigia_T_2_8_03 | INFO [10-02|11:39:50.357] QUORUM-CHECKPOINT name=TX-COMPLETED tx=0x72e8b2a79a9e8ee53dbfa77ceba5d8aee06710bb778ef851f1fd4dabf4416e31 time=953.33µs
REG_Vestigia_T_2_8_03 | fatal error: runtime: out of memory
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | runtime stack:
REG_Vestigia_T_2_8_03 | runtime.throw(0x15c4607, 0x16)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/panic.go:1116 +0x72
REG_Vestigia_T_2_8_03 | runtime.sysMap(0xc1e0000000, 0xc0000000, 0x292f7b8)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mem_linux.go:169 +0xc6
REG_Vestigia_T_2_8_03 | runtime.(*mheap).sysAlloc(0x29134a0, 0xbe000000, 0x441e37, 0x29134a8)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:727 +0x1e5
REG_Vestigia_T_2_8_03 | runtime.(*mheap).grow(0x29134a0, 0x5eed5, 0x0)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:1344 +0x85
REG_Vestigia_T_2_8_03 | runtime.(*mheap).allocSpan(0x29134a0, 0x5eed5, 0x521e2c7010100, 0x292f7c8, 0x200000002)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:1160 +0x6b6
REG_Vestigia_T_2_8_03 | runtime.(*mheap).alloc.func1()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:907 +0x65
REG_Vestigia_T_2_8_03 | runtime.(*mheap).alloc(0x29134a0, 0x5eed5, 0xc000500101, 0xc112533080)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/mheap.go:901 +0x85
REG_Vestigia_T_2_8_03 | runtime.largeAlloc(0xbdda8400, 0x101, 0x0)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1177 +0x92
REG_Vestigia_T_2_8_03 | runtime.mallocgc.func1()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1071 +0x46
REG_Vestigia_T_2_8_03 | runtime.systemstack(0x7f4d38000020)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:370 +0x66
REG_Vestigia_T_2_8_03 | runtime.mstart()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/proc.go:1116
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 65 [running]:
REG_Vestigia_T_2_8_03 | runtime.systemstack_switch()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:330 fp=0xc000da18c0 sp=0xc000da18b8 pc=0x484680
REG_Vestigia_T_2_8_03 | runtime.mallocgc(0xbdda8400, 0x1501360, 0xc026557e01, 0x521e2c720709d)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/malloc.go:1070 +0x938 fp=0xc000da1960 sp=0xc000da18c0 pc=0x423bd8
REG_Vestigia_T_2_8_03 | runtime.makeslice(0x1501360, 0x0, 0x5eed420, 0x4cb547dd5c91e2ad)
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/slice.go:98 +0x6c fp=0xc000da1990 sp=0xc000da1960 pc=0x46510c
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/core/rawdb.(*freezer).freeze(0xc000120740, 0x186b300, 0xc00009a800)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/core/rawdb/freezer.go:296 +0x4ce fp=0xc000da1fc8 sp=0xc000da1990 pc=0x91894e
REG_Vestigia_T_2_8_03 | runtime.goexit()
REG_Vestigia_T_2_8_03 | /opt/hostedtoolcache/go/1.15.5/x64/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000da1fd0 sp=0xc000da1fc8 pc=0x486461
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/core/rawdb.NewDatabaseWithFreezer
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/core/rawdb/database.go:176 +0x1d2
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 1 [chan receive, 4 minutes]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/node.(*Node).Wait(0xc00069cc80)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/node/node.go:554 +0x7c
REG_Vestigia_T_2_8_03 | main.geth(0xc000170000, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:343 +0x135
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.HandleAction(0x136c360, 0x16aed88, 0xc000170000, 0xc000327a40, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:490 +0x82
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli%2ev1.(*App).Run(0xc0005401a0, 0xc000132000, 0x2d, 0x30, 0x0, 0x0)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/vendor/gopkg.in/urfave/cli.v1/app.go:264 +0x5f5
REG_Vestigia_T_2_8_03 | main.main()
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/cmd/geth/main.go:280 +0x55
REG_Vestigia_T_2_8_03 |
REG_Vestigia_T_2_8_03 | goroutine 19 [chan receive]:
REG_Vestigia_T_2_8_03 | github.com/ethereum/go-ethereum/metrics.(*meterArbiter).tick(0x28da860)
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/metrics/meter.go:289 +0x7d
REG_Vestigia_T_2_8_03 | created by github.com/ethereum/go-ethereum/metrics.NewMeter
REG_Vestigia_T_2_8_03 | /home/runner/work/quorum/quorum/go/src/github.com/ethereum/go-ethereum/build/_workspace/src/github.com/ethereum/go-ethereum/metrics/meter.go:55 +0x11a
REG_Vestigia_T_2_8_03 |
etc...
I'll make a try with 16 GB (the nodes have 8 GB). How should I change from GoQuorum 21.1.0 to 21.10.2? Is there a procedure?
To change the GoQuorum version you must follow these steps: https://github.com/alastria/alastria-node-quorum/tree/upgrade-branch/GoQuorum-21.10.2#upgrading-node-to-goquorum-version-21102
If you cannot recover the LevelDB (... and the truth is that the errors you have show it will be hard), you will have to perform a new DLT synchronization.
You can use these commands, adapting them to your own directory structure:

```shell
# make a backup of your private key
$ cp nodekey /root/nodekey_backup
$ ./geth --datadir _dir_ removedb
$ ./geth --datadir _dir_ init ./genesis.json
# restore the private key
$ cp /root/nodekey_backup keystore/nodekey
# start geth, or restart the container...
```
And start over the sync process.
PS: Another way: it seems that you have other running nodes! You can take a copy of the datadir directory from one of them (except the nodekey file) and use it. Keep in mind: you can use rsync for the copy.
PS: No... geth import and geth export seem not to work, at least on my nodes.
For the record, we're having a similar issue here with REG_Tribalyte_T_6_5_00, which has 5 GB RAM + 9 GB swap. As described in this issue, geth crashes from time to time due to out-of-memory errors when doing a full sync.
Also, sometimes it doesn't crash but the log is full of messages like:
VM returned with error err="evm: execution reverted"
and geth attach alastria/data/geth.ipc fails with:
Fatal: Failed to start the JavaScript console: api modules: context deadline exceeded
Indeed, we upgraded to 21.10.2 after synchronizing, and now the node seems to be stable (2 days up with no issues) and the memory consumption is much smaller.
I can confirm the solution is:
1. Install the node as described in https://github.com/alastria/alastria-node-quorum#1-installation
2. Synchronise it
3. Upgrade GoQuorum following https://github.com/alastria/alastria-node-quorum/tree/upgrade-branch/GoQuorum-21.10.2#upgrading-node-to-goquorum-version-21102
Thank you all for your help.
After upgrading to GoQuorum 21.10.2, the node stops trying to mine.
The only issue that remains is that the node isn't listed in Grafana.
To appear in Grafana, make sure that port 6060 is mapped in the docker-compose YAML and that it is forwarded in the router to the Docker hosting machine.
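A quick way to check the metrics side from outside the container is to probe port 6060. This assumes geth was started with --metrics --pprof --pprofaddr=0.0.0.0 as in the command above, which serves a /debug/metrics endpoint on that port; the helper name is hypothetical:

```shell
# Check that geth's metrics endpoint is reachable on the forwarded port.
# Assumes --metrics and --pprof are enabled, as in the geth command above.
check_metrics_port() {
  host="${1:-localhost}"
  if curl -sf "http://$host:6060/debug/metrics" > /dev/null; then
    echo "metrics reachable on $host:6060"
  else
    echo "cannot reach $host:6060 - check the port mapping and router forwarding"
  fi
}
```

Running it with the host's public IP from another machine also verifies the router forwarding, not just the container port mapping.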
We have problems with two nodes: one was working normally when we stopped it to apply the procedure from this repository so that it could be monitored by Grafana. Since we could not solve the problem, we created a new virtual machine from scratch and followed the procedure again, reaching the same error.
They progressively consume all the available memory (6-odd GB) down to the last 150 KB, stay like that for a while, and then "restart" themselves (the consumed memory is released and drops to 600 K, the node does not respond properly to RPC requests, it sometimes desynchronises, sometimes not, and the cycle starts again). All of this happens without my intervention, and the cycle repeats indefinitely. In practice, we observe periods of very slow responses to remote RPC requests executing (pre-signed) contract methods, or those requests failing with an error. These conditions prevent us from working with the nodes.
There must be more nodes with this failure. It is not very noticeable unless you monitor geth's memory consumption.
At the moment of the "restart" it logs this:
Note the mining attempts. I don't know why it does that; it has no order to mine.