erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Error running v2.57.3 from scratch: SyncStage, err: mdbx_cursor_put: MDBX_BAD_TXN: Transaction is not valid for requested operation #9440

Open krunkosaurus opened 8 months ago

krunkosaurus commented 8 months ago

Hello, I ran Erigon successfully a few years ago, and I'm back with new hardware to try running it again. I keep hitting the errors below and have not been able to work around them. I synced 600GB+ on the first night, but the sync has been at a standstill ever since. Please see the logs.

System information

Erigon version: ./erigon --version

erigon version 2.57.3

MacBook Pro M1 Max, 64GB RAM, 4TB SSD

Commit hash:

Erigon Command (with flags/config):

erigon --datadir /Volumes/extreme/erigon --chain mainnet --private.api.addr=localhost:9090

Consensus Layer:

Consensus Layer Command (with flags/config):

Chain/Network: mainnet

Expected behaviour

To fully sync archive node from scratch

Actual behaviour

No sync progress for days

Steps to reproduce the behaviour

Run above command

Backtrace

WARN[02-14|01:01:10.977] NAT ExternalIP resolution has failed, try to pass a different --nat option err="SOAP fault. Code: s:Client | Explanation: UPnPError | Detail: <UPnPError xmlns=\"urn:schemas-upnp-org:control-1-0\"><errorCode>501</errorCode><errorDescription>Action Failed</errorDescription></UPnPError>"
WARN[02-14|01:01:11.950] NAT ExternalIP resolution has failed, try to pass a different --nat option err="SOAP fault. Code: s:Client | Explanation: UPnPError | Detail: <UPnPError xmlns=\"urn:schemas-upnp-org:control-1-0\"><errorCode>501</errorCode><errorDescription>Action Failed</errorDescription></UPnPError>"
INFO[02-14|01:01:12.062] [snapshots:download] Blocks Stat         blocks=18800k indices=18800k alloc=3.8GB sys=4.6GB
INFO[02-14|01:01:12.062] [4/12 Execution] Blocks execution        from=4325169 to=18799999
WARN[02-14|01:01:25.261] [4/12 Execution] Execution failed        block=4328265 hash=0xb0094ce6434f7de824e8e6513f404873e3b36d8324480097645c70c51795560e err="invalid block: could not apply tx 44 from block 4328265 [0x975fc4bbaf080762969a3e8d4accb508161e18cb95ef3439e1f6b0412f862d42]: nonce too high: address 0x00C12FbD5E40ea5F73bdEcF80Cfa8d1ade104Eb6, tx: 5 state: 0"
EROR[02-14|01:01:25.265] Could not start execution service        err="[4/12 Execution] label: chaindata, table: SyncStage, err: mdbx_cursor_put: MDBX_BAD_TXN: Transaction is not valid for requested operation, e.g. had errored and be must aborted, has a child, or is invalid"

AskAlexSharov commented 8 months ago

Where did you get the Erigon binary? Can you try building from source?

krunkosaurus commented 8 months ago

Hello, I got the binary from the latest release page on GitHub here: https://github.com/ledgerwatch/erigon/releases/tag/v2.57.3

I can try a new binary, but do I need to delete the 600GB already downloaded?

AskAlexSharov commented 8 months ago

No

krunkosaurus commented 8 months ago

Thanks, surprisingly that helped me download another 60GB and then I hit this error:

WARN[02-16|15:44:07.087] NAT ExternalIP resolution has failed, try to pass a different --nat option err="SOAP fault. Code: s:Client | Explanation: UPnPError | Detail: <UPnPError xmlns=\"urn:schemas-upnp-org:control-1-0\"><errorCode>501</errorCode><errorDescription>Action Failed</errorDescription></UPnPError>"
WARN[02-16|15:44:08.051] NAT ExternalIP resolution has failed, try to pass a different --nat option err="SOAP fault. Code: s:Client | Explanation: UPnPError | Detail: <UPnPError xmlns=\"urn:schemas-upnp-org:control-1-0\"><errorCode>501</errorCode><errorDescription>Action Failed</errorDescription></UPnPError>"
INFO[02-16|15:44:08.122] [snapshots:download] Blocks Stat         blocks=18800k indices=18800k alloc=3.7GB sys=4.6GB
INFO[02-16|15:44:08.122] [4/12 Execution] Blocks execution        from=4899069 to=18799999
INFO[02-16|15:44:38.126] [4/12 Execution] Executed blocks         number=4904758 blk/s=189.6 tx/s=41340.1 Mgas/s=1455.9 gasState=0.08 batch=83.0MB alloc=2.7GB sys=5.2GB
INFO[02-16|15:45:08.125] [4/12 Execution] Executed blocks         number=4910591 blk/s=194.4 tx/s=36879.1 Mgas/s=1477.2 gasState=0.16 batch=180.9MB alloc=5.0GB sys=5.6GB
INFO[02-16|15:45:34.425] Committed State                          gas reached=126359613651 gasTarget=549755813888
EROR[02-16|15:45:34.460] Could not start execution service        err="[4/12 Execution] loadIntoTable Code: : put: k=39865883390de19d67b7664f2bdb718e7cacb802108b15c7076be519ffe08741, label: chaindata, table: Code, err: mdbx_cursor_put: MDBX_PAGE_NOTFOUND: Requested page not found"
INFO[02-16|15:47:07.066] [p2p] GoodPeers                          eth68=4 eth67=3 eth66=3
INFO[02-16|15:47:07.068] [txpool] stat                            pending=0 baseFee=0 queued=33 alloc=3.6GB sys=5.6GB

Could the NAT warnings cause a problem?
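A minimal sketch for separating the recurring NAT warnings from the actual failure in logs like the ones above. The regular expression is inferred only from the lines quoted in this thread, the function names are hypothetical, and the sample lines are abbreviated copies of the log above (the elided parts are marked with `...`).

```python
import re

# Line shape inferred from the logs in this thread (an assumption):
# LEVEL[MM-DD|HH:MM:SS.mmm] message text ... key=value
LINE_RE = re.compile(r'^(?P<level>[A-Z]{4})\[(?P<ts>[^\]]+)\]\s+(?P<rest>.*)$')


def parse_line(line):
    """Return (level, timestamp, rest), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("level"), m.group("ts"), m.group("rest")


def errors_only(lines):
    """Keep only EROR lines, dropping WARN and INFO lines."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[0] == "EROR"]


def mdbx_code(rest):
    """Extract the first MDBX_* error name from a message, if any."""
    m = re.search(r'MDBX_[A-Z_]+', rest)
    return m.group(0) if m else None


# Abbreviated copies of lines from the log above.
sample = [
    'WARN[02-16|15:44:08.051] NAT ExternalIP resolution has failed, '
    'try to pass a different --nat option err="SOAP fault. ..."',
    'INFO[02-16|15:45:34.425] Committed State   gas reached=126359613651',
    'EROR[02-16|15:45:34.460] Could not start execution service   '
    'err="[4/12 Execution] loadIntoTable Code: ... label: chaindata, '
    'table: Code, err: mdbx_cursor_put: MDBX_PAGE_NOTFOUND: '
    'Requested page not found"',
]

for level, ts, rest in errors_only(sample):
    print(ts, mdbx_code(rest))
```

Filtering by level this way makes it easier to see that the run stops at the `EROR` line, which carries the MDBX error name and the `label:` of the affected database.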

AskAlexSharov commented 8 months ago

No. MDBX_PAGE_NOTFOUND most likely points to a broken disk or broken RAM. Please use tools like https://www.memtest86.com to test the RAM and tools like https://www.smartmontools.org to test the disk. You can also try make db-tools -> ./build/bin/mdbx_chk --help datadir/chaindata

krunkosaurus commented 6 months ago

@AskAlexSharov Hello, it was suggested that my MacBook Pro M1 Max had memory issues, so I bought a brand-new Mac Mini M2 with 32GB RAM and a 4TB SanDisk USB-C SSD. I get a similar issue after 624GB downloaded:

mdbx_txn_begin: MDBX_CORRUPTED: Maybe free space is over on disk. Otherwise it's hardware failure. Before creating issue please use tools like https://www.memtest86.com to test RAM and tools like https://www.smartmontools.org to test Disk. To handle hardware risks: use ECC RAM, use RAID of disks, run multiple application instances (or do backups). If hardware checks passed - check FS settings - 'fsync' and 'flock' must be enabled.  Otherwise - please create issue in Application repo. On default DURABLE mode, power outage can't cause this error. On other modes - power outage may break last transaction and mdbx_chk can recover db in this case, see '-t' and '-0|1|2' options., label: txpool, trace: [kv_mdbx.go:358 all_components.go:123 backend.go:602 node.go:124 main.go:65 make_app.go:52 command.go:279 app.go:337 app.go:311 main.go:34 proc.go:271 asm_arm64.s:1222]

Please let me know what else I should try.
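The error message above first suggests that free space on the disk may have run out. A minimal sketch of that check, using only the Python standard library; the function name and the `min_free_gb` threshold are hypothetical and are not Erigon requirements.

```python
import shutil


def free_space_report(path, min_free_gb=100.0):
    """Report total and free space for the filesystem that holds `path`.

    `min_free_gb` is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1e9
    return {
        "path": path,
        "total_gb": usage.total / 1e9,
        "free_gb": free_gb,
        "low": free_gb < min_free_gb,
    }


# Example: check the current directory; for the setup in this thread the
# path would be the datadir, e.g. /Volumes/extreme/erigon.
report = free_space_report(".")
print(f"{report['free_gb']:.1f} GB free of {report['total_gb']:.1f} GB")
```

If plenty of space is free, the message itself points to the remaining candidates: hardware checks and the filesystem settings it names.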

AskAlexSharov commented 6 months ago

The error says label: txpool, so remove datadir/txpool.
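In this thread, the `label:` field of each error is paired with a directory of the same name under the datadir (`label: chaindata` with `datadir/chaindata`, `label: txpool` with `datadir/txpool`). A minimal sketch that reads the label from an error string and builds that path. The mapping is an assumption drawn only from these two cases, the function name is hypothetical, and the code only returns a path; it does not delete anything.

```python
import re
from pathlib import PurePosixPath

# Matches "label: <name>" as it appears in the errors quoted in this thread.
LABEL_RE = re.compile(r'label:\s*([A-Za-z0-9_]+)')


def db_dir_for_error(err, datadir):
    """Return the datadir subdirectory named by the error's label, or None.

    Assumes the label equals the subdirectory name, as in the two cases
    seen in this thread.
    """
    m = LABEL_RE.search(err)
    if m is None:
        return None
    return str(PurePosixPath(datadir) / m.group(1))


# Abbreviated copy of the error reported above.
err = "mdbx_txn_begin: MDBX_CORRUPTED: ... label: txpool, trace: [...]"
print(db_dir_for_error(err, "/Volumes/extreme/erigon"))
```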