Closed: irzinfante closed this issue 1 year ago.
All consensus round validator nodes must:
- be at block 106983272
- run v1.9.25-stable-919800f0 (quorum-v21.10.2)

To do it, update permissioned-nodes.json and static-nodes.json for each case. Using 16 GB of memory is highly recommended so that the synchronization process can finish correctly.
GoQuorum v21.10.2 preferred
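For reference, permissioned-nodes.json and static-nodes.json share the same shape: a JSON array of enode URLs. A minimal sketch (the node IDs, IP addresses, and ports below are hypothetical placeholders, not real network entries):

```json
[
  "enode://<128-hex-character-node-id>@203.0.113.10:21000?discport=0",
  "enode://<128-hex-character-node-id>@203.0.113.11:21000?discport=0"
]
```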
Updated previous comment
My nodes are not advancing in the chain to reach block 106983272:
VAL_DigitelTS_T_4_16_00 (member of the IBFT consensus):
> admin.nodeInfo.name
"Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6"
> admin.nodeInfo.id
"ae385305ccad4d035e92efbfebc69e585b93f293e6552a0110afc196d90105dd"
> eth.blockNumber
106919851
> eth.syncing
false
> eth.getBlockByNumber(106919851)
{
difficulty: "0x1",
extraData: "intentionally suppressed",
gasLimit: "0x29b154ca",
gasUsed: "0x0",
[...]
BOT_DigitelTS_T_4_16_00:
> admin.nodeInfo.name
"Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6"
> admin.nodeInfo.id
"a2054ebfafb0f0f5d5aba7068e1f14829e69dace06ad91a4ce23012984de1f06"
> eth.blockNumber
106971166
> eth.syncing
false
> eth.getBlockByNumber(106971166)
{
difficulty: "0x1",
extraData: "intentionally suppressed",
gasLimit: "0x29b154ca",
[...]
The blocks where the nodes freeze are very close to the 700M hard limit reported in https://github.com/ConsenSys/quorum/issues/1434
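As a quick check, the gasLimit shown by eth.getBlockByNumber on the stuck blocks decodes to just under that 700M figure (plain JavaScript):

```javascript
// The gasLimit reported by eth.getBlockByNumber on the stuck nodes,
// decoded from hex to decimal:
const gasLimit = parseInt("0x29b154ca", 16);
console.log(gasLimit);                 // 699487434
console.log(gasLimit < 700000000);     // true -> just below the 700M hard limit
```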
Using debug verbosity level 4, the messages are similar. The node (identified by the first characters of admin.nodeInfo.id) seems to be sending/receiving blocks from its peers:
DEBUG[12-14|13:56:49.256] IsNodePermissioned connection=OUTGOING nodename=91b2db08a56a46423b296b078775b8a4 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:56:49.503] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:56:49.503] IsNodePermissioned permissionedList="[intentionally suppressed]"
DEBUG[12-14|13:56:49.503] IsNodePermissioned connection=INCOMING nodename=36b48f90c5fa46465ade5c615c74c7d4 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:56:49.546] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:56:49.547] IsNodePermissioned permissionedList="[intentionally suppressed]"
DEBUG[12-14|13:56:49.547] IsNodePermissioned connection=INCOMING nodename=78ee4846f569cc208bf3478556a9bf8b DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:56:49.809] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json
BOT_DigitelTS_T_4_16_00
DEBUG[12-14|13:54:49.780] IsNodePermissioned connection=OUTGOING nodename=3a65981e1df04a40f9a283f24a2d3640 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:54:49.851] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:54:49.851] IsNodePermissioned permissionedList="[intentionally suppressed]"
DEBUG[12-14|13:54:49.851] IsNodePermissioned connection=INCOMING nodename=f6d4269127e231447c072684e7666b3e DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
DEBUG[12-14|13:54:49.879] parsePermissionedNodes DataDir=/root/alastria/data file=permissioned-nodes.json
DEBUG[12-14|13:54:49.880] IsNodePermissioned permissionedList="[intentionally suppressed]"
DEBUG[12-14|13:54:49.880] IsNodePermissioned connection=INCOMING nodename=93f016c90a9410b749a17a07af8b14d6 DENIED-BY=a2054ebfafb0f0f5d5aba7068e1f1482
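What these lines mean, sketched in JavaScript: GoQuorum reloads permissioned-nodes.json and accepts a connection only if the peer's node ID appears in that list; otherwise it logs DENIED-BY followed by its own ID. The list below is a hypothetical placeholder (it reuses one node ID from this thread with an illustrative IP/port), and the matching is simplified compared to the real implementation:

```javascript
// Illustrative sketch of the permissioning check behind the DENIED-BY logs.
// The enode URL below is a hypothetical placeholder.
const permissionedList = [
  "enode://ae385305ccad4d035e92efbfebc69e585b93f293e6552a0110afc196d90105dd@203.0.113.10:21000",
];

// Extract the node ID (the hex string between "enode://" and "@").
function nodeIdOf(enodeUrl) {
  const m = enodeUrl.match(/^enode:\/\/([0-9a-f]+)@/);
  return m ? m[1] : null;
}

// A connection is permitted when the incoming/outgoing peer's node ID
// matches an entry of the list; otherwise the node denies it and the
// DEBUG log shows DENIED-BY=<this node's own ID prefix>.
function isPermissioned(nodeId) {
  return permissionedList.some((e) => nodeIdOf(e).startsWith(nodeId));
}

console.log(isPermissioned("ae385305ccad4d035e92efbfebc69e585b93f293e6552a0110afc196d90105dd")); // true
console.log(isPermissioned("91b2db08a56a46423b296b078775b8a4")); // false -> DENIED-BY
```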
Looking for a workaround for this problem using geth debug commands.
PS: Both nodes have several peers on different versions, so this is not a firewall/network problem.
name: "Geth/BOT_Izertis_Telsius_2_4_02/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/BOT_Planisys_Telsius_8_16_01/v1.9.7-stable-a21e1d44(quorum-v21.1.0)/linux-amd64/go1.15.5",
name: "Geth/BOT_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
name: "Geth/VAL_Alisys_Telsius_2_8_01/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/VAL_COUNCILBOX_Telsius_2_4_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6",
name: "Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6",
name: "Geth/VAL_IN2_PC_Telsius_2_8_00/v1.9.25-stable-1b7aa254(quorum-v21.10.2041)/linux-amd64/go1.19.3",
name: "Geth/VAL_INDRA_T_2_4_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
name: "Geth/VAL_Izertis_Telsius_2_4_00/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/VAL_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
name: "Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
name: "Geth/BOT_Izertis_Telsius_2_4_02/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/BOT_Planisys_Telsius_8_16_01/v1.9.7-stable-a21e1d44(quorum-v21.1.0)/linux-amd64/go1.15.5",
name: "Geth/BOT_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
name: "Geth/VAL_Alisys_Telsius_2_8_01/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/VAL_IN2_PC_Telsius_2_8_00/v1.9.25-stable-1b7aa254(quorum-v21.10.2041)/linux-amd64/go1.19.3",
name: "Geth/VAL_INDRA_T_2_4_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6",
name: "Geth/VAL_Izertis_Telsius_2_4_00/v1.8.18-stable(quorum-v2.2.3-0.Alastria_EthNetstats_IBFT)/linux-amd64/go1.9.5",
name: "Geth/VAL_SERES_T_2_8_01/v1.8.12-stable/linux-amd64/go1.9.5",
PS2: Both nodes, in fact, already have the complete DLT up to block 106983272:
> admin.nodeInfo.name
"Geth/VAL_DigitelTS_T_4_16_00/v1.9.25-stable-919800f0(quorum-v21.10.0)/linux-amd64/go1.15.6"
> eth.blockNumber
106919851
> eth.getBlockByNumber(106983272)
{
difficulty: "0x1",
[...]
gasLimit: "0x29b3f008",
gasUsed: "0x1378d94",
[...]
}
> eth.getBlockByNumber(106983273)
null
> admin.nodeInfo.name
"Geth/BOT_DigitelTS_T_4_16_00/v1.9.25-stable-cd11c38e(quorum-v21.10.2)/linux-amd64/go1.15.6"
> eth.blockNumber
106971166
> eth.getBlockByNumber(106983272)
{
difficulty: "0x1",
[...]
gasLimit: "0x29b3f008",
gasUsed: "0x1378d94",
[...]
}
> eth.getBlockByNumber(106983273)
null
Well...

"My nodes are not advancing in the chain to reach block 106983272":

...going back one block on the DLT, the synchronization has restarted!?
> eth.blockNumber
106919851
> debug.setHead("0x65F77AA")
null
$ docker restart
> eth.syncing
{
currentBlock: 106934592,
highestBlock: 106983272,
knownStates: 12690992,
pulledStates: 12690992,
startingBlock: 106919850
}
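For reference, the argument to debug.setHead is the target block number in hex, and 0x65F77AA is exactly the previous block, 106919850. A quick way to compute it (plain JavaScript, runnable outside the geth console too):

```javascript
// debug.setHead rewinds the chain to the given block (hex-encoded).
// Compute the hex for "current head minus one" from the session above:
const current = 106919851;                 // eth.blockNumber before the rewind
const target = "0x" + (current - 1).toString(16).toUpperCase();
console.log(target);                       // "0x65F77AA"
```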
We will evaluate this solution once all the validators are at block 106983272 and we have unified the version to 21.10.2.
PS: The DENIED-BY errors still remain.
VERY IMPORTANT: before running the debug.setHead() command, it is highly recommended to run:
> debug.chaindbCompact()
I built a special version of Geth (Quorum 21.10.2) with extra logging to try to diagnose the problem with the validators. I created a special repository: alaquorum.
The instructions for building the Geth executable are in the README of the repository.
The repo also contains an already built binary. Those using the official Dockerfile from the Alastria Node repository (this repo) can easily switch to the patched Geth version by replacing these lines in the Dockerfile:
RUN wget -O geth_${VER}_linux_amd64.tar.gz https://artifacts.consensys.net/public/go-quorum/raw/versions/${VER}/geth_${VER}_linux_amd64.tar.gz
RUN tar zxvf geth_${VER}_linux_amd64.tar.gz -C /usr/local/bin
with these:
RUN wget -O /usr/local/bin/geth https://raw.githubusercontent.com/hesusruiz/alaquorum/main/build/bin/geth
RUN chmod +x /usr/local/bin/geth
Opened an issue with ConsenSys to track the block-number problem (most of the validator nodes cannot reach the desired block): https://github.com/ConsenSys/quorum/issues/1585
Added a fix in https://github.com/alastria/alastria-node-quorum/pull/39 to start the synchronization process.
The network is operational again, thanks to the corrective actions above.
It is still pending to unify blocks and the (patched) versions from @hesusruiz, but transactions are being mined.
The network had stopped generating blocks since 11/12/2022.
The next steps are to create a private network between the validator nodes, make the network generate new blocks and then connect the rest of the nodes to the working network.