alastria / alastria-node-quorum

How to install a node in Alastria Red-T (GoQuorum Technology) and tips to deploy and use it
Apache License 2.0

Blocks generation stopped (11/12/2022) #38

Closed irzinfante closed 1 year ago

irzinfante commented 1 year ago

The network has not generated any blocks since 11/12/2022.

The next steps are to create a private network between the validator nodes, make the network generate new blocks and then connect the rest of the nodes to the working network.

alejandroalffer commented 1 year ago

All consensus round validator nodes must:

To do it:

Using 16 GB of memory is highly recommended so that the synchronization process can finish correctly.

irzinfante commented 1 year ago

GoQuorum v21.10.2 preferred

alejandroalffer commented 1 year ago

GoQuorum v21.10.2 preferred

Updated previous comment

alejandroalffer commented 1 year ago

My nodes are not advancing in the chain to reach block 106983272:

Block freezes are very close to the 700M hard limit reported in the issue https://github.com/ConsenSys/quorum/issues/1434

With debug verbosity 4, the messages are similar. The node (identified by the first characters of its admin.nodeInfo.id) seems to be sending/receiving blocks to/from its peers:

BOT_DigitelTS_T_4_16_00

DEBUG[12-14|13:54:49.780] IsNodePermissioned                       connection=OUTGOING nodename=3a65981e1df04a40f9a283f24a2d3640 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:54:49.851] parsePermissionedNodes                   DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:54:49.851] IsNodePermissioned                       permissionedList="[intentionally supressed]"
DEBUG[12-14|13:54:49.851] IsNodePermissioned                       connection=INCOMING nodename=f6d4269127e231447c072684e7666b3e DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:54:49.879] parsePermissionedNodes                   DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:54:49.880] IsNodePermissioned                       permissionedList="[intentionally supressed]"
DEBUG[12-14|13:54:49.880] IsNodePermissioned                       connection=INCOMING nodename=93f016c90a9410b749a17a07af8b14d6 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
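To see at a glance which peers are being rejected, the DENIED-BY lines above can be grouped by connection direction. This is only a sketch: the helper names `parseDenied` and `summarizeDenied` are hypothetical, and the regular expression assumes the log layout shown in this comment (it may differ in other GoQuorum versions). The sample lines are copied from the log above with the column padding collapsed.

```javascript
// Parse one geth DEBUG line; returns null unless it is a DENIED-BY entry.
// Assumes the "connection=... nodename=... DENIED-BY=..." layout seen above.
function parseDenied(line) {
  var m = /connection=(\w+)\s+nodename=([0-9a-f]+)\s+DENIED-BY=([0-9a-f]+)/.exec(line);
  if (!m) return null;
  return { direction: m[1], peer: m[2], deniedBy: m[3] };
}

// Group the denied peer ids by connection direction, without repeats.
function summarizeDenied(lines) {
  var out = {};
  for (var i = 0; i < lines.length; i++) {
    var d = parseDenied(lines[i]);
    if (!d) continue;
    if (!out[d.direction]) out[d.direction] = [];
    if (out[d.direction].indexOf(d.peer) === -1) out[d.direction].push(d.peer);
  }
  return out;
}

var sample = [
  "DEBUG[12-14|13:54:49.780] IsNodePermissioned connection=OUTGOING nodename=3a65981e1df04a40f9a283f24a2d3640 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482",
  "DEBUG[12-14|13:54:49.851] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json",
  "DEBUG[12-14|13:54:49.851] IsNodePermissioned connection=INCOMING nodename=f6d4269127e231447c072684e7666b3e DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482",
  "DEBUG[12-14|13:54:49.880] IsNodePermissioned connection=INCOMING nodename=93f016c90a9410b749a17a07af8b14d6 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482"
];

console.log(JSON.stringify(summarizeDenied(sample), null, 2));
```

For these lines the summary lists one OUTGOING and two INCOMING denied peers, all rejected by the same node id.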

Looking for a workaround for this problem, using geth debug commands


PS. Both nodes have several peers running different versions, so this is not a firewall/network problem.

    name: "Geth/BOT_Izertis_Telsius_2_4_02/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/BOT_Planisys_Telsius_8_16_01/v1.9.7-stable-a21e1d44(quorum-v21.1.0)/linux-amd64/go1.15.5",
    name: "Geth/BOT_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
    name: "Geth/VAL_Alisys_Telsius_2_8_01/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/VAL_COUNCILBOX_Telsius_2_4_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6",
    name: "Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6",
    name: "Geth/VAL_IN2_PC_Telsius_2_8_00/v1.9.25-stable-1b7aa254(quorum-v21.10.2041)/linux-amd64/go1.19.3",
    name: "Geth/VAL_INDRA_T_2_4_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
    name: "Geth/VAL_Izertis_Telsius_2_4_00/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/VAL_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
    name: "Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
    name: "Geth/BOT_Izertis_Telsius_2_4_02/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/BOT_Planisys_Telsius_8_16_01/v1.9.7-stable-a21e1d44(quorum-v21.1.0)/linux-amd64/go1.15.5",
    name: "Geth/BOT_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
    name: "Geth/VAL_Alisys_Telsius_2_8_01/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/VAL_IN2_PC_Telsius_2_8_00/v1.9.25-stable-1b7aa254(quorum-v21.10.2041)/linux-amd64/go1.19.3",
    name: "Geth/VAL_INDRA_T_2_4_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
    name: "Geth/VAL_Izertis_Telsius_2_4_00/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
    name: "Geth/VAL_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
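The spread of versions in the peer lists above can be tallied with a few lines of JavaScript. This is a minimal sketch: `quorumVersion` and `countVersions` are hypothetical helper names, and the pattern assumes the `(quorum-v...)` tag appears in the peer name as it does here; names without the tag are counted as "untagged".

```javascript
// Extract the Quorum version tag from a peer name, e.g.
// ".../v1.9.25-stable-cd11c38e(quorum-v21.10.2)/..." yields "21.10.2".
function quorumVersion(name) {
  var m = /\(quorum-v([^)]+)\)/.exec(name);
  return m ? m[1] : "untagged";
}

// Count how many peer names report each version.
function countVersions(names) {
  var counts = {};
  for (var i = 0; i < names.length; i++) {
    var v = quorumVersion(names[i]);
    counts[v] = (counts[v] || 0) + 1;
  }
  return counts;
}

// A few names taken from the list above.
var names = [
  "Geth/VAL_INDRA_T_2_4_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
  "Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
  "Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6",
  "Geth/VAL_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5"
];

console.log(JSON.stringify(countVersions(names)));
```

In the geth console the same function could be fed from the live peer list, for example `countVersions(admin.peers.map(function (p) { return p.name; }))`.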

PS2. Both nodes, in fact, already have the complete DLT up to block 106983272:

> admin.nodeInfo.name
"Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6"
> eth.blockNumber
106919851
> eth.getBlockByNumber(106983272)
{
  difficulty: "0x1",
[...]
  gasLimit: "0x29b3f008",
  gasUsed: "0x1378d94",
[...]
}
> eth.getBlockByNumber(106983273)
null
> admin.nodeInfo.name
"Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6"
> eth.blockNumber
106971166
> eth.getBlockByNumber(106983272)
{
  difficulty: "0x1",
[...]
  gasLimit: "0x29b3f008",
  gasUsed: "0x1378d94",
[...]
}
> eth.getBlockByNumber(106983273)
null

alejandroalffer commented 1 year ago

Well...

My nodes are not advancing in the chain to reach block 106983272:

... after going back one block on the DLT, the synchronization has restarted (?)

> eth.blockNumber
106919851
> debug.setHead("0x65F77AA") 
null

$ docker restart

> eth.syncing
{
  currentBlock: 106934592,
  highestBlock: 106983272,
  knownStates: 12690992,
  pulledStates: 12690992,
  startingBlock: 106919850
}
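To put the eth.syncing output in perspective, the remaining distance to the target block can be computed from its fields. A small sketch, assuming the object shape shown above (`syncProgress` is a hypothetical helper, and eth.syncing returns false when the node is not syncing):

```javascript
// Summarize an eth.syncing result: blocks still missing and the
// fraction of this sync run already completed.
function syncProgress(s) {
  if (!s) return null; // eth.syncing is false when no sync is in progress
  var total = s.highestBlock - s.startingBlock;
  var done = s.currentBlock - s.startingBlock;
  return {
    remaining: s.highestBlock - s.currentBlock,
    fraction: total > 0 ? done / total : 1
  };
}

// Values copied from the output above.
var status = {
  currentBlock: 106934592,
  highestBlock: 106983272,
  startingBlock: 106919850
};

var p = syncProgress(status);
console.log(p.remaining + " blocks remaining, " + (p.fraction * 100).toFixed(1) + "% done");
// prints "48680 blocks remaining, 23.2% done"
```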

We will evaluate this solution once all the validators are at block 106983272 and we have unified the version to 21.10.2.

PS. The DENIED-BY errors still remain.

alejandroalffer commented 1 year ago

VERY IMPORTANT: It is highly recommended to run:

> debug.chaindbCompact()

before running the debug.setHead() command.
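debug.setHead() takes the block number as a hex string, so the "go back one block" argument has to be derived from the current head. A sketch of that conversion, where `previousBlockHex` is a hypothetical helper and not part of geth:

```javascript
// Return the hex string for the block just before blockNumber,
// in the form expected by debug.setHead().
function previousBlockHex(blockNumber) {
  return "0x" + (blockNumber - 1).toString(16);
}

// The head reported earlier in this thread was 106919851.
console.log(previousBlockHex(106919851)); // prints "0x65f77aa"

// Assumed order in the geth console, following the recommendation above:
//   debug.chaindbCompact()
//   debug.setHead(previousBlockHex(eth.blockNumber))
```

The result matches the value "0x65F77AA" used in the earlier comment (hex digits are case-insensitive).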

hesusruiz commented 1 year ago

I built a special version of Geth (Quorum 21.10.2) with logs to try to diagnose the problem with the Validators. I created a special repository: alaquorum.

The instructions for building the Geth executable are in the readme of the repository.

The repo also contains an already built binary. Those using the official Dockerfile from the Alastria Node repository (this repo) can easily switch to the patched Geth version by replacing these lines in the Dockerfile:

RUN wget -O geth_${VER}_linux_amd64.tar.gz https://artifacts.consensys.net/public/go-quorum/raw/versions/${VER}/geth_${VER}_linux_amd64.tar.gz
RUN tar zxvf geth_${VER}_linux_amd64.tar.gz -C /usr/local/bin

with these:

RUN wget -O /usr/local/bin/geth https://raw.githubusercontent.com/hesusruiz/alaquorum/main/build/bin/geth
RUN chmod +x /usr/local/bin/geth

alejandroalffer commented 1 year ago

Opened an issue with ConsenSys about controlling the block number, since most of the validator nodes cannot reach the desired one: https://github.com/ConsenSys/quorum/issues/1585

alejandroalffer commented 1 year ago

Added a fix in https://github.com/alastria/alastria-node-quorum/pull/39 to start the synchronization process.

alejandroalffer commented 1 year ago

The network is operational again, thanks to the corrective actions in:

Unifying the blocks and the (patched) versions from @hesusruiz is still pending, but transactions are being mined.