MerosCrypto / Meros

An instant and feeless cryptocurrency for the future, secured by the Merit Caching Consensus Mechanism.
https://meroscrypto.io
Other
82 stars 19 forks source link

"Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807" #263

Closed Vyryn closed 3 years ago

Vyryn commented 3 years ago

This PANIC error crashed my node after about seven hours of stable running, on hash 65AAEF7473C95F9E70EA64793C5BEC67A1992F788D4025B73FAF8C86C34D0100

Log:

FAT 2020-12-06 00:30:47.832+00:00 Panic                                      tid=2329 reason="Trying to handle a socket which isn't a socket: (107) Transport endpoint is not connected"
DBG 2020-12-06 00:30:47.832+00:00 Closing raw socket                         tid=2329 reason="Either couldn't connect or connected to ourself."
DBG 2020-12-06 00:30:47.832+00:00 Connecting                                 tid=2329 address=35.239.25.97 port=5132
DBG 2020-12-06 00:30:47.832+00:00 Connecting                                 tid=2329 address=172.249.109.69 port=5132
TRC 2020-12-06 00:30:57.327+00:00 New Exception                              tid=2329 msg="Block was already added."
INF 2020-12-06 00:30:57.924+00:00 New Block                                  tid=2329 hash=65AAEF7473C95F9E70EA64793C5BEC67A1992F788D4025B73FAF8C86C34D0100 currentHeight=65
FAT 2020-12-06 00:30:57.925+00:00 Panic                                      tid=2329 reason="Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
INF 2020-12-06 00:30:57.926+00:00 Invalid Block                              tid=2329 hash=65AAEF7473C95F9E70EA64793C5BEC67A1992F788D4025B73FAF8C86C34D0100 reason="/home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="RPC caused a panic: /home/vyryn/Meros/src/Interfaces/RPC/Modules/MeritModule.nim(368, 16) `false` addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="Couldn't handle the request (JSON; return res), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(147, 10) `false` RPC caused a panic: /home/vyryn/Meros/src/Interfaces/RPC/Modules/MeritModule.nim(368, 16) `false` addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="Couldn't handle the request (string), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(192, 12) `false` Couldn't handle the request (JSON; return res), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(147, 10) `false` RPC caused a panic: /home/vyryn/Meros/src/Interfaces/RPC/Modules/MeritModule.nim(368, 16) `false` addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
FAT 2020-12-06 00:30:57.926+00:00 Panic                                      tid=2329 reason="RPC's handle threw an Exception despite not naturally throwing anything: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(283, 10) `false` Couldn't handle the request (string), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(192, 12) `false` Couldn't handle the request (JSON; return res), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(147, 10) `false` RPC caused a panic: /home/vyryn/Meros/src/Interfaces/RPC/Modules/MeritModule.nim(368, 16) `false` addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807"
/home/vyryn/Meros/src/Meros.nim(121) Meros
/home/vyryn/Meros/src/Meros.nim(112) main
/home/vyryn/.nimble/pkgs/chronos-2.5.2/chronos/asyncloop.nim(981) runForever
/home/vyryn/.nimble/pkgs/chronos-2.5.2/chronos/asyncloop.nim(276) poll
/home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(407) anonymous
/home/vyryn/.choosenim/toolchains/nim-1.2.4/lib/system/assertions.nim(29) failedAssertImpl
/home/vyryn/.choosenim/toolchains/nim-1.2.4/lib/system/assertions.nim(22) raiseAssert/home/vyryn/.choosenim/toolchains/nim-1.2.4/lib/system/fatal.nim(49) sysFatal
[[reraised from:
/home/vyryn/Meros/src/Meros.nim(121) Meros
/home/vyryn/Meros/src/Meros.nim(112) main
/home/vyryn/.nimble/pkgs/chronos-2.5.2/chronos/asyncloop.nim(981) runForever
/home/vyryn/.nimble/pkgs/chronos-2.5.2/chronos/asyncloop.nim(276) poll
/home/vyryn/.choosenim/toolchains/nim-1.2.4/lib/system/excpt.nim(126) anonymous
]]
Error: unhandled exception: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(407, 14) `false` RPC's handle threw an Exception despite not naturally throwing anything: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(283, 10) `false` Couldn't handle the request (string), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(192, 12) `false` Couldn't handle the request (JSON; return res), despite catching all naturally thrown Exceptions: /home/vyryn/Meros/src/Interfaces/RPC/RPC.nim(147, 10) `false` RPC caused a panic: /home/vyryn/Meros/src/Interfaces/RPC/Modules/MeritModule.nim(368, 16) `false` addBlock threw a raw Exception, despite catching all Exception types it naturally raises: /home/vyryn/Meros/src/MainMerit.nim(413, 12) `false` addBlock threw an Exception despite catching all Exceptions: /home/vyryn/Meros/src/MainMerit.nim(192, 12) `false` Syncing a Block threw an error despite catching all exceptions: value out of range: -5 notin 0 .. 9223372036854775807 [AssertionError]
kayabaNerve commented 3 years ago

Appears to be a problem with resolving the sketch, as the lack of logging in between "New Block" and "Panic" says it failed at the start of syncing it, AKA sketch resolution. This is likely because the RPC uses a custom Sketch in order to publish Blocks. Unfortunately, the logs don't contain enough info. I plan to add a log statement in the place, noting this issue, so we can hopefully get it next time.

That said, this is extremely problematic. Marking critical. Thank you for noting this. I'll update this when I add the log statement and ask participants to update their node.

kayabaNerve commented 3 years ago

Happened on a seed node which wasn't mining at all.

kayabaNerve commented 3 years ago

At some point in the sync function, we're creating a seq with a negative size. This was found due to @PlasmaPower. Thanks for that :)

Which seq is still up in the air; I'll look into it later today.

kayabaNerve commented 3 years ago

Apparently I forgot to update this.

The issue isn't that we're creating a seq with a negative size. The issue is we're deleting a negative index from a seq. That happens in this block.

  #Iterate over the differences.
  var
    m: int = 0
    offset: int = 0
  while m < result.missing.len:
    #If we have one of the differences, remove it from both packets and missing.
    if sketch.hashes.hasKey(result.missing[m]):
      try:
        result.packets.delete(sketch.hashes[result.missing[m]] - offset)
      except KeyError as e:
        panic("Couldn't get the index a hash maps to despite checking with hasKey first: " & e.msg)
      result.missing.delete(m)
      inc(offset)
      continue
    inc(m)

This assumes missing is ordered the same as we entered packets; it isn't. Moving to a O(n) approach should fix this. That said, a test to replicate this will be needed first to absolutely confirm both the problem and the fix.

kayabaNerve commented 3 years ago

Repro'd with:

from typing import List

import ed25519

from e2e.Libs.BLS import PrivateKey

from e2e.Classes.Transactions.Data import Data
from e2e.Classes.Consensus.Verification import SignedVerification
from e2e.Classes.Consensus.VerificationPacket import VerificationPacket
from e2e.Classes.Consensus.SpamFilter import SpamFilter
from e2e.Classes.Merit.Merit import Merit, Block

from e2e.Meros.RPC import RPC

from e2e.Vectors.Generation.PrototypeChain import PrototypeBlock

def TwoHundredSixtyThreeTest(
  rpc: RPC
) -> None:
  privKey: ed25519.SigningKey = ed25519.SigningKey(b'\0' * 32)
  pubKey: ed25519.VerifyingKey = privKey.get_verifying_key()

  spamFilter: SpamFilter = SpamFilter(5)

  merit: Merit = Merit()

  rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
  rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)

  block: Block = PrototypeBlock(1200, [], minerID=PrivateKey(0)).finish(0, merit)
  merit.add(block)

  rpc.meros.liveBlockHeader(block.header)
  rpc.meros.sync.recv()
  rpc.meros.blockBody(block)
  rpc.meros.live.recv()

  data: Data = Data(bytes(32), pubKey.to_bytes())
  for _ in range(10):
    data.sign(privKey)
    data.beat(spamFilter)
    rpc.meros.liveTransaction(data)
    rpc.meros.live.recv()

    verif: SignedVerification = SignedVerification(data.hash)
    verif.sign(0, PrivateKey(0))
    rpc.meros.signedElement(verif)
    rpc.meros.live.recv()

    data = Data(data.hash, bytes(1))

  privKey = ed25519.SigningKey(b'\1' * 32)
  pubKey = privKey.get_verifying_key()

  data = Data(bytes(32), pubKey.to_bytes())
  datas: List[Data] = []
  for i in range(5):
    data.sign(privKey)
    data.beat(spamFilter)
    datas.append(data)
    rpc.meros.liveTransaction(data)
    rpc.meros.live.recv()

    verif: SignedVerification = SignedVerification(data.hash)
    verif.sign(0, PrivateKey(0))
    rpc.meros.signedElement(verif)
    rpc.meros.live.recv()

    data = Data(data.hash, bytes(1))

  block = PrototypeBlock(2400, [VerificationPacket(data.hash, [0]) for data in datas]).finish(0, merit)
  merit.add(block)

  rpc.meros.liveBlockHeader(block.header)
  rpc.meros.sync.recv()
  rpc.meros.blockBody(block, 15)
  rpc.meros.live.recv()

It uses a bogus capacity (it's technically valid; just larger than the data set) to ensure this is triggered. Since this is based on non-sorted data expected to be sorted, and Minisketch's internal ordering of the result, I didn't write a guaranteed-to-trigger MRE. Rather, I wrote the above, and tweaked numbers until I successfully hit this bug.

kayabaNerve commented 3 years ago

Triggered without unusual Sketch manipulation (to prove this could happen on a testnet, as it did):

from typing import List
from time import sleep

import ed25519

from e2e.Libs.BLS import PrivateKey
from e2e.Libs.Minisketch import Sketch

from e2e.Classes.Transactions.Data import Data
from e2e.Classes.Consensus.Verification import SignedVerification
from e2e.Classes.Consensus.VerificationPacket import VerificationPacket
from e2e.Classes.Consensus.SpamFilter import SpamFilter
from e2e.Classes.Merit.Merit import Merit, Block

from e2e.Meros.RPC import RPC

from e2e.Vectors.Generation.PrototypeChain import PrototypeBlock

def TwoHundredSixtyThreeTest(
  rpc: RPC
) -> None:
  privKey: ed25519.SigningKey = ed25519.SigningKey(b'\0' * 32)
  pubKey: ed25519.VerifyingKey = privKey.get_verifying_key()

  spamFilter: SpamFilter = SpamFilter(5)

  merit: Merit = Merit()

  rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
  rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)

  block: Block = PrototypeBlock(1200, [], minerID=PrivateKey(0)).finish(0, merit)
  merit.add(block)

  rpc.meros.liveBlockHeader(block.header)
  rpc.meros.sync.recv()
  rpc.meros.blockBody(block)
  rpc.meros.live.recv()

  data: Data = Data(bytes(32), pubKey.to_bytes())
  for _ in range(2):
    data.sign(privKey)
    data.beat(spamFilter)
    rpc.meros.liveTransaction(data)
    rpc.meros.live.recv()

    verif: SignedVerification = SignedVerification(data.hash)
    verif.sign(0, PrivateKey(0))
    rpc.meros.signedElement(verif)
    rpc.meros.live.recv()

    data = Data(data.hash, bytes(2))

  privKey = ed25519.SigningKey(b'\1' * 32)
  pubKey = privKey.get_verifying_key()

  data = Data(bytes(32), pubKey.to_bytes())
  datas: List[Data] = []
  for i in range(5):
    data.sign(privKey)
    data.beat(spamFilter)
    datas.append(data)
    rpc.meros.liveTransaction(data)
    rpc.meros.live.recv()

    verif: SignedVerification = SignedVerification(data.hash)
    verif.sign(0, PrivateKey(0))
    rpc.meros.signedElement(verif)
    rpc.meros.live.recv()

    if i < 2:
      data = Data(data.hash, bytes(4))
    else:
      data = Data(data.hash, bytes(8))

  block = PrototypeBlock(2400, [VerificationPacket(data.hash, [0]) for data in datas]).finish(0, merit)
  merit.add(block)

  sleep(35)
  rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
  rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)

  rpc.meros.liveBlockHeader(block.header)
  rpc.meros.sync.recv()
  rpc.meros.blockBody(block)
  rpc.meros.live.recv()
kayabaNerve commented 3 years ago

Fixing this should thankfully increase sketch resolution frequency. This only crashed because it deleted the wrong element in such a wrong way it broke a boundary. Deleting any element improperly is enough to cause resolution to fail. Now that they're being handled correctly...

Also adding a couple basic Sketch resolution tests based on this new test, used as basic checks during my work.