kayabaNerve closed this issue 4 years ago
I did check whether this is a reorg bug by generating a vector/test (LongerChainMoreWork with keepUnlocked = true). The test passed completely both when the TX had already finalized and when it was still in Epochs.
The Block in question was actually added, at currentHeight 33; call it Block A. Meros then added two more Blocks, and at currentHeight 36, a Block which triggered #233 (whose title is improper, as discovered while handling that issue) was sent. A 7-Block reorg followed, which then failed due to a different Block triggering #233. Resyncing the original chain then triggered #228 thanks to Block A. On reboot, Block A was synced once again, and the bug described in this issue finally occurred.
It should be possible to boil this down to:
I'm going to rebuild the DT5 code and try to reproduce this. Once I do, I can collect the debug info needed to verify no problem remains in the codebase.
Reproduced on the DT5 tag with the following script:
```python
from subprocess import Popen
from time import sleep

from e2e.Classes.Merit.Blockchain import Block, Blockchain
from e2e.Vectors.Generation.PrototypeChain import PrototypeChain
from e2e.Meros.RPC import RPC

#Create the main chain and a slightly longer alternate chain.
protoMain: PrototypeChain = PrototypeChain(2)
protoAlt: PrototypeChain = PrototypeChain(1)
protoAlt.timeOffset = protoAlt.timeOffset + 1
for _ in range(2):
  protoAlt.add()
main: Blockchain = protoMain.finish()
alt: Blockchain = protoAlt.finish()

def TwoHundredThirtyTwoTest(
  rpc: RPC
) -> None:
  def connect() -> None:
    rpc.meros.liveConnect(main.blocks[0].header.hash)
    rpc.meros.syncConnect(main.blocks[0].header.hash)

  def sendBlock(
    toSend: Block
  ) -> None:
    rpc.meros.liveBlockHeader(toSend.header)
    rpc.meros.sync.recv()
    rpc.meros.blockBody(toSend)
    if toSend.body.packets:
      rpc.meros.sync.recv()
      rpc.meros.packet(toSend.body.packets[0])

  #Send the main chain.
  connect()
  sendBlock(main.blocks[1])
  sendBlock(main.blocks[2])

  #Trigger a reorg to the alt chain, providing only one of its Blocks.
  rpc.meros.liveBlockHeader(alt.blocks[3].header)
  rpc.meros.sync.recv()
  rpc.meros.blockList([alt.blocks[2].header.hash, alt.blocks[1].header.hash])
  rpc.meros.sync.recv()
  rpc.meros.syncBlockHeader(alt.blocks[2].header)

  #Drop the connections so the reorg fails, and wait for Meros to time out.
  rpc.meros.live.connection.close()
  rpc.meros.sync.connection.close()
  sleep(35)

  #Resend the original chain's tip, which crashes the node.
  connect()
  sendBlock(main.blocks[2])

  #Wait for Meros to die, then reboot it.
  while rpc.meros.process.poll() is None:
    pass
  rpc.meros.process = Popen(["./build/Meros", "--data-dir", "./data/e2e", "--log-file", "TwoHundredThirtyTwoTest.log", "--db", "TwoHundredThirtyTwoTest", "--network", "devnet", "--tcp-port", str(rpc.meros.tcp), "--rpc-port", str(rpc.meros.rpc), "--no-gui"])
  sleep(10)

  #Resend the tip once more; this is where the bug occurs.
  connect()
  sendBlock(main.blocks[2])
  sleep(5)
```
Found the problem! When reverting, the DB is committed BEFORE postRevert is called. That ordering is required, for a reason I don't remember off the top of my head. That said, we NEED to call commit again afterwards as well.
Adding the missing commit ensures consistent behavior: the node now crashes due to #228 again, as it should.
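The ordering problem can be illustrated with a toy sketch. This is hypothetical code, not Meros internals (ToyDB, revert, and the keys are made up): if the DB is committed before a postRevert-style step mutates further state, that later state is never persisted unless a second commit follows.

```python
#Toy model of a DB with staged writes that only persist on commit.
class ToyDB:
  def __init__(self) -> None:
    self.committed: dict = {}
    self.pending: dict = {}

  def put(self, key: str, value: int) -> None:
    self.pending[key] = value

  def commit(self) -> None:
    self.committed.update(self.pending)
    self.pending.clear()

def revert(db: ToyDB, commitAfter: bool) -> None:
  db.put("height", 33)   #The revert itself writes the rolled-back height.
  db.commit()            #DB is saved BEFORE the postRevert-style step runs.
  db.put("tip", 7)       #The post-revert step writes more state...
  if commitAfter:
    db.commit()          #...which only survives if a second commit follows.

buggy = ToyDB()
revert(buggy, commitAfter=False)
print("tip" in buggy.committed)  #False: the post-revert writes were lost.

fixed = ToyDB()
revert(fixed, commitAfter=True)
print("tip" in fixed.committed)  #True: the second commit persisted them.
```

On reboot, a node in the "buggy" state would load the pre-postRevert view of the DB, matching the inconsistent behavior seen here.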
Reproducible on master via:
```python
from subprocess import Popen
from time import sleep

from e2e.Classes.Merit.Blockchain import Block, Blockchain
from e2e.Vectors.Generation.PrototypeChain import PrototypeChain
from e2e.Meros.RPC import RPC

#Create the main chain and a slightly longer alternate chain.
protoMain: PrototypeChain = PrototypeChain(2)
protoAlt: PrototypeChain = PrototypeChain(1)
protoAlt.timeOffset = protoAlt.timeOffset + 1
for _ in range(2):
  protoAlt.add()
main: Blockchain = protoMain.finish()
alt: Blockchain = protoAlt.finish()

def TwoHundredThirtyTwoTest(
  rpc: RPC
) -> None:
  def connect() -> None:
    rpc.meros.liveConnect(main.blocks[0].header.hash)
    rpc.meros.syncConnect(main.blocks[0].header.hash)

  def sendBlock(
    toSend: Block
  ) -> None:
    rpc.meros.liveBlockHeader(toSend.header)
    rpc.meros.sync.recv()
    rpc.meros.blockBody(toSend)
    if toSend.body.packets:
      rpc.meros.sync.recv()
      rpc.meros.packet(toSend.body.packets[0])

  #Send the main chain.
  connect()
  sendBlock(main.blocks[1])
  sendBlock(main.blocks[2])

  #Trigger a reorg to the alt chain, providing only one of its Blocks.
  rpc.meros.liveBlockHeader(alt.blocks[3].header)
  rpc.meros.sync.recv()
  rpc.meros.blockList([alt.blocks[2].header.hash, alt.blocks[1].header.hash])
  rpc.meros.sync.recv()
  rpc.meros.syncBlockHeader(alt.blocks[2].header)

  #Drop the connections so the reorg fails, and wait for Meros to time out.
  rpc.meros.live.connection.close()
  rpc.meros.sync.connection.close()
  sleep(35)

  #Shut Meros down and reboot it.
  rpc.meros.quit()
  rpc.meros.process = Popen(["./build/Meros", "--data-dir", "./data/e2e", "--log-file", "TwoHundredThirtyTwoTest.log", "--db", "TwoHundredThirtyTwoTest", "--network", "devnet", "--tcp-port", str(rpc.meros.tcp), "--rpc-port", str(rpc.meros.rpc), "--no-gui"])
  sleep(10)

  #Resend the original chain's tip; this is where the bug occurs.
  connect()
  sendBlock(main.blocks[2])
  sleep(5)
```
The reason provided is "Block archives holders who are already archived." Reading over the DB JSON, the Block is perfectly valid: db.txt (db.json) shows the Block added and valid. seed.log contains the error, shortly after a fatal crash (from #228).
Attachments: db.txt, seed.log