Closed Vyryn closed 3 years ago
Appears to be a problem with resolving the sketch, as the lack of logging in between "New Block" and "Panic" says it failed at the start of syncing it, AKA sketch resolution. This is likely because the RPC uses a custom Sketch in order to publish Blocks. Unfortunately, the logs don't contain enough info. I plan to add a log statement in the place, noting this issue, so we can hopefully get it next time.
That said, this is extremely problematic. Marking critical. Thank you for noting this. I'll update this when I add the log statement and ask participants to update their node.
Happened on a seed node which wasn't mining at all.
At some point in the sync function, we're creating a seq with a negative size. This was found due to @PlasmaPower. Thanks for that :)
Which seq is still up in the air; I'll look into it later today.
Apparently I forgot to update this.
The issue isn't that we're creating a seq with a negative size. The issue is we're deleting a negative index from a seq. That happens in this block.
#Iterate over the differences.
var
m: int = 0
offset: int = 0
while m < result.missing.len:
#If we have one of the differences, remove it from both packets and missing.
if sketch.hashes.hasKey(result.missing[m]):
try:
result.packets.delete(sketch.hashes[result.missing[m]] - offset)
except KeyError as e:
panic("Couldn't get the index a hash maps to despite checking with hasKey first: " & e.msg)
result.missing.delete(m)
inc(offset)
continue
inc(m)
This assumes missing is ordered the same as we entered packets; it isn't. Moving to a O(n) approach should fix this. That said, a test to replicate this will be needed first to absolutely confirm both the problem and the fix.
Repro'd with:
from typing import List
import ed25519
from e2e.Libs.BLS import PrivateKey
from e2e.Classes.Transactions.Data import Data
from e2e.Classes.Consensus.Verification import SignedVerification
from e2e.Classes.Consensus.VerificationPacket import VerificationPacket
from e2e.Classes.Consensus.SpamFilter import SpamFilter
from e2e.Classes.Merit.Merit import Merit, Block
from e2e.Meros.RPC import RPC
from e2e.Vectors.Generation.PrototypeChain import PrototypeBlock
def TwoHundredSixtyThreeTest(
rpc: RPC
) -> None:
privKey: ed25519.SigningKey = ed25519.SigningKey(b'\0' * 32)
pubKey: ed25519.VerifyingKey = privKey.get_verifying_key()
spamFilter: SpamFilter = SpamFilter(5)
merit: Merit = Merit()
rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)
block: Block = PrototypeBlock(1200, [], minerID=PrivateKey(0)).finish(0, merit)
merit.add(block)
rpc.meros.liveBlockHeader(block.header)
rpc.meros.sync.recv()
rpc.meros.blockBody(block)
rpc.meros.live.recv()
data: Data = Data(bytes(32), pubKey.to_bytes())
for _ in range(10):
data.sign(privKey)
data.beat(spamFilter)
rpc.meros.liveTransaction(data)
rpc.meros.live.recv()
verif: SignedVerification = SignedVerification(data.hash)
verif.sign(0, PrivateKey(0))
rpc.meros.signedElement(verif)
rpc.meros.live.recv()
data = Data(data.hash, bytes(1))
privKey = ed25519.SigningKey(b'\1' * 32)
pubKey = privKey.get_verifying_key()
data = Data(bytes(32), pubKey.to_bytes())
datas: List[Data] = []
for i in range(5):
data.sign(privKey)
data.beat(spamFilter)
datas.append(data)
rpc.meros.liveTransaction(data)
rpc.meros.live.recv()
verif: SignedVerification = SignedVerification(data.hash)
verif.sign(0, PrivateKey(0))
rpc.meros.signedElement(verif)
rpc.meros.live.recv()
data = Data(data.hash, bytes(1))
block = PrototypeBlock(2400, [VerificationPacket(data.hash, [0]) for data in datas]).finish(0, merit)
merit.add(block)
rpc.meros.liveBlockHeader(block.header)
rpc.meros.sync.recv()
rpc.meros.blockBody(block, 15)
rpc.meros.live.recv()
It uses a bogus capacity (it's technically valid; just larger than the data set) to ensure this is triggered. Since this is based on non-sorted data expected to be sorted, and Minisketch's internal ordering of the result, I didn't write a guaranteed-to-trigger MRE. Rather, I wrote the above, and tweaked numbers until I successfully hit this bug.
Triggered without unusual Sketch manipulation (to prove this could happen on a testnet, as it did):
from typing import List
from time import sleep
import ed25519
from e2e.Libs.BLS import PrivateKey
from e2e.Libs.Minisketch import Sketch
from e2e.Classes.Transactions.Data import Data
from e2e.Classes.Consensus.Verification import SignedVerification
from e2e.Classes.Consensus.VerificationPacket import VerificationPacket
from e2e.Classes.Consensus.SpamFilter import SpamFilter
from e2e.Classes.Merit.Merit import Merit, Block
from e2e.Meros.RPC import RPC
from e2e.Vectors.Generation.PrototypeChain import PrototypeBlock
def TwoHundredSixtyThreeTest(
rpc: RPC
) -> None:
privKey: ed25519.SigningKey = ed25519.SigningKey(b'\0' * 32)
pubKey: ed25519.VerifyingKey = privKey.get_verifying_key()
spamFilter: SpamFilter = SpamFilter(5)
merit: Merit = Merit()
rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)
block: Block = PrototypeBlock(1200, [], minerID=PrivateKey(0)).finish(0, merit)
merit.add(block)
rpc.meros.liveBlockHeader(block.header)
rpc.meros.sync.recv()
rpc.meros.blockBody(block)
rpc.meros.live.recv()
data: Data = Data(bytes(32), pubKey.to_bytes())
for _ in range(2):
data.sign(privKey)
data.beat(spamFilter)
rpc.meros.liveTransaction(data)
rpc.meros.live.recv()
verif: SignedVerification = SignedVerification(data.hash)
verif.sign(0, PrivateKey(0))
rpc.meros.signedElement(verif)
rpc.meros.live.recv()
data = Data(data.hash, bytes(2))
privKey = ed25519.SigningKey(b'\1' * 32)
pubKey = privKey.get_verifying_key()
data = Data(bytes(32), pubKey.to_bytes())
datas: List[Data] = []
for i in range(5):
data.sign(privKey)
data.beat(spamFilter)
datas.append(data)
rpc.meros.liveTransaction(data)
rpc.meros.live.recv()
verif: SignedVerification = SignedVerification(data.hash)
verif.sign(0, PrivateKey(0))
rpc.meros.signedElement(verif)
rpc.meros.live.recv()
if i < 2:
data = Data(data.hash, bytes(4))
else:
data = Data(data.hash, bytes(8))
block = PrototypeBlock(2400, [VerificationPacket(data.hash, [0]) for data in datas]).finish(0, merit)
merit.add(block)
sleep(35)
rpc.meros.liveConnect(merit.blockchain.blocks[0].header.hash)
rpc.meros.syncConnect(merit.blockchain.blocks[0].header.hash)
rpc.meros.liveBlockHeader(block.header)
rpc.meros.sync.recv()
rpc.meros.blockBody(block)
rpc.meros.live.recv()
Fixing this should thankfully increase sketch resolution frequency. This only crashed because it deleted the wrong element in such a wrong way it broke a boundary. Deleting any element improperly is enough to cause resolution to fail. Now that they're being handled correctly...
Also adding a couple basic Sketch resolution tests based on this new test, used as basic checks during my work.
This PANIC error crashed my node after about seven hours of stable running, on hash 65AAEF7473C95F9E70EA64793C5BEC67A1992F788D4025B73FAF8C86C34D0100
Log: