Closed hanzhenlong1314 closed 3 months ago
Continue with the error message's recommendations:
- [x] check free space
- [ ] maybe hardware failure. Please use tools like https://www.memtest86.com/ to test RAM and tools like https://www.smartmontools.org/ to test the disk
- [ ] if the hardware checks pass: run make db-tools. mdbx_chk may recover the db, see the '-t' and '-0|1|2' options.
- also double-check that fsync is not disabled on the server
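The free-space item in the checklist above can be scripted. This is only a minimal sketch, assuming a POSIX `df`; the function name `check_free_space` and the threshold you pass to it are hypothetical and not part of Erigon:

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR has at least MIN_GB gigabytes available.
check_free_space() {
  dir="$1"
  min_gb="$2"
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ -n "$avail_kb" ] || return 2
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example, `check_free_space /root/erigon/data 100 || echo "less than 100G free"`. For the disk item, smartmontools' `smartctl -H <device>` prints the drive's overall health self-assessment.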
I found that I cannot start the system with snapshots, but I can start it normally without snapshots. Is there something wrong with the snapshots?
cannot start
- totally not enough information.
I checked the free disk space and memory and both are sufficient.
What does "cannot start" mean? Is there an error message? Logs? Is it stuck? Or what?
Even though the memory and disk are sufficient, the startup error is:
meta_checktxnid:11415 catch invalid root_page_txnid 11557706 for maindb.mod_txnid 24300513 (workaround for incoherent flaw of unified page/buffer cache)
meta_waittxnid:11454 bailout waiting for valid snapshot (workaround for incoherent flaw of unified page/buffer cache)
mdbx_setup_dxb:16208 error -30796, while updating meta.geo: from l3-n749199549-u939524096/s2048-g1024 (txn#24300516), to l3-n749199549-u1006632960/s2048-g1024 (txn#24300517)
[EROR] [06-19|03:40:14.411] Erigon startup err="mdbx_env_open: MDBX_CORRUPTED: Maybe free space is over on disk. Otherwise it's hardware failure. Before creating issue please use tools like https://www.memtest86.com/ to test RAM and tools like https://www.smartmontools.org/ to test Disk. To handle hardware risks: use ECC RAM, use RAID of disks, run multiple application instances (or do backups). If hardware checks passed - check FS settings - 'fsync' and 'flock' must be enabled. Otherwise - please create issue in Application repo. On default DURABLE mode, power outage can't cause this error. On other modes - power outage may break last transaction and mdbx_chk can recover db in this case, see '-t' and '-0|1|2' options., label: chaindata, trace: [kv_mdbx.go:357 node.go:367 node.go:370 backend.go:245 node.go:124 main.go:66 make_app.go:54 command.go:276 app.go:333 app.go:307 main.go:34 proc.go:267 asm_amd64.s:1650]"
mdbx_env_open: MDBX_CORRUPTED: Maybe free space is over on disk. Otherwise it's hardware failure. Before creating issue please use tools like https://www.memtest86.com/ to test RAM and tools like https://www.smartmontools.org/ to test Disk. To handle hardware risks: use ECC RAM, use RAID of disks, run multiple application instances (or do backups). If hardware checks passed - check FS settings - 'fsync' and 'flock' must be enabled. Otherwise - please create issue in Application repo. On default DURABLE mode, power outage can't cause this error. On other modes - power outage may break last transaction and mdbx_chk can recover db in this case, see '-t' and '-0|1|2' options., label: chaindata, trace: [kv_mdbx.go:357 node.go:367 node.go:370 backend.go:245 node.go:124 main.go:66 make_app.go:54 command.go:276 app.go:333 app.go:307 main.go:34 proc.go:267 asm_amd64.s:1650]
maybe hardware failure. please use tools like https://www.memtest86.com/ to test RAM and tools like https://www.smartmontools.org/ to test Disk
./memtester 100G 10
memtester version 4.5.1 (64-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 102400MB (107374182400 bytes)
got 102400MB (107374182400 bytes), trying mlock ...locked.
Loop 1/10:

I tested the memory pagesize to be 4kb. Will this affect the node startup? When I changed db.pagesize=4kb, the node reported another error:

erigon/data/ ok-erigon --chain=bor-mainnet --bor.heimdall=https://heimdall-api.polygon.technology --http.addr=0.0.0.0 --http.vhosts= --http.corsdomain= --http.api=eth,erigon,engine,debug,trace --db.size.limit=13TB --db.pagesize=4kb --datadir=/root/erigon/data/ --torrent.download.rate=512mb

[root@c01_docker_solfullnode_pap_hk poly-archive]# docker logs -f ok-erigon
[INFO] [06-25|08:31:57.497] logging to file system log dir=/root/erigon/data/logs file prefix=erigon log level=info json=false
[INFO] [06-25|08:31:57.497] Build info git_branch= git_tag= git_commit=
[INFO] [06-25|08:31:57.497] Starting Erigon on Bor Mainnet...
[INFO] [06-25|08:31:57.498] Maximum peer count ETH=100 total=100
[INFO] [06-25|08:31:57.498] starting HTTP APIs port=8545 APIs=eth,erigon,engine,debug,trace
[INFO] [06-25|08:31:57.498] torrent verbosity level=WRN
[INFO] [06-25|08:31:59.601] Set global gas cap cap=50000000
[INFO] [06-25|08:31:59.602] [Downloader] Running with ipv6-enabled=true ipv4-enabled=true download.rate=512mb upload.rate=4mb
[INFO] [06-25|08:31:59.602] Opening Database label=chaindata path=/root/erigon/data/chaindata
[EROR] [06-25|08:31:59.602] Erigon startup err="mdbx_env_set_geometry: MDBX_TOO_LARGE: Database is too large for current system, e.g. could NOT be mapped into RAM"
mdbx_env_set_geometry: MDBX_TOO_LARGE: Database is too large for current system, e.g. could NOT be mapped into RAM

@AskAlexSharov
A 4kb pagesize can address at most an 8TB db, so --db.size.limit=13TB is too much for a 4kb pagesize. Set it smaller.
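The 8TB figure is plain arithmetic: the addressable size is the page size multiplied by the maximum number of pages. The sketch below assumes a limit of 2^31 pages, which is inferred here from the "4kb gives 8TB" statement above rather than taken from libmdbx documentation. The function names are hypothetical, and sizes are computed in binary units (TiB):

```shell
#!/bin/sh
# Assumption: at most 2^31 pages per database (inferred from 4 KiB -> 8 TiB).
MAX_PAGES=2147483648

# Print the largest addressable database size, in TiB, for a page size in bytes.
max_db_size_tib() {
  echo $(( $1 * MAX_PAGES / 1099511627776 ))
}

# Succeed if a size limit (in TiB) fits within the given page size (in bytes).
limit_fits() {
  [ "$2" -le "$(max_db_size_tib "$1")" ]
}

for ps in 4096 8192 16384; do
  echo "pagesize=${ps} max=$(max_db_size_tib "$ps")TiB"
done
```

Under that assumption, `limit_fits 4096 13` fails while `limit_fits 8192 13` succeeds, which matches the two options given: lower --db.size.limit or use a larger page size.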
So I should change the memory pagesize to 8kb, instead of simply setting db.pagesize=8kb, right? @AskAlexSharov
If you need a database larger than 8TB, you can't use a 4kb pagesize. Use 8kb or more (re-create the target db).
You can also format logs and shell output by using triple backticks: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code
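One way to follow the formatting advice is to let the shell add the fence while capturing the output. This is a small sketch, not an Erigon or GitHub tool; the function name `fence_output` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run a command and wrap its combined stdout/stderr in a
# triple-backtick fence, ready to paste into a GitHub comment.
fence_output() {
  printf '\140\140\140\n'   # \140 is the octal code for a backtick
  "$@" 2>&1
  printf '\140\140\140\n'
}
```

For example, `fence_output ./build/bin/mdbx_stat -ef /data1/poly/chaindata/ > report.txt` writes a file that can be pasted into a comment as-is.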
I have understood this problem. I have changed db.pagesize=8KB, but the following error occurred when executing the snapshot data copy. Integration mdbx_to_mdbx has an error.
Running version: 2.60.0
System: Linux c01_docker_solfullnode_pap_hk 5.10.134-16.3.al8.x86_64 #1 SMP Tue Mar 26 18:54:05 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
Disk size: 20T
Memory: 250G
@AskAlexSharov
/data2/erigon/build/bin/integration mdbx_to_mdbx --datadir /data1/erigon_temp --chaindata /data2/poly-archive/erigon_data --chaindata.to /data1/poly/chaindata/
INFO[06-27|14:21:09.692] logging to file system log dir=/data1/erigon_temp/logs file prefix=integration log level=info json=false
panic: fail to open mdbx: mdbx_txn_begin: MDBX_PROBLEM: Unexpected internal error, transaction should be aborted, label: chaindata, trace: [kv_mdbx.go:369 kv_mdbx.go:475 backup.go:33 refetence_db.go:120 command.go:987 command.go:1115 command.go:1039 command.go:1032 main.go:18 proc.go:267 asm_amd64.s:1650]

goroutine 1 [running]:
github.com/ledgerwatch/erigon-lib/kv/mdbx.MdbxOpts.MustOpen({{0x1e3d520, 0xc000844fa0}, 0xc000b35e50, 0xc0014f80d0, {0x7ffd8c41f0b1, 0x1f}, 0x0, 0x20000000000, 0x40000000, 0xffffffffffffffff, ...})
github.com/ledgerwatch/erigon-lib@v0.0.0-00010101000000-000000000000/kv/mdbx/kv_mdbx.go:477 +0xc5
github.com/ledgerwatch/erigon-lib/kv/backup.OpenPair({0x7ffd8c41f0b1, 0x1f}, {0x7ffd8c41f0e0, 0x16}, 0x0, 0x0, {0x1e3d520, 0xc000844fa0})
github.com/ledgerwatch/erigon-lib@v0.0.0-00010101000000-000000000000/kv/backup/backup.go:33 +0x29c
github.com/ledgerwatch/erigon/cmd/integration/commands.glob..func5(0xc000691200?, {0x199afa6?, 0x4?, 0x199ae3e?})
github.com/ledgerwatch/erigon/cmd/integration/commands/refetence_db.go:120 +0x8c
github.com/spf13/cobra.(*Command).execute(0x2a8e000, {0xc0014d40c0, 0x6, 0x6})
github.com/spf13/cobra@v1.8.0/command.go:987 +0xaa3
github.com/spf13/cobra.(*Command).ExecuteC(0x2a8e5c0)
github.com/spf13/cobra@v1.8.0/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.8.0/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext(0x462d9c?, {0x1e334d0?, 0xc000b32af0?})
github.com/spf13/cobra@v1.8.0/command.go:1032 +0x47
main.main()
github.com/ledgerwatch/erigon/cmd/integration/main.go:18 +0xe6
There is a problem with the copy data, please help me
This seems to be for copying data between old and new nodes, not for importing a snapshot. The snapshot only has one file, mdbx.dat. Is this the reason for the execution failure? @AskAlexSharov
Try checking whether both dbs are fine, for example with:
mdbx_stat -ef /data1/poly/chaindata/
mdbx_stat -ef /data2/poly-archive/erigon_data
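The two checks above can be wrapped in a loop that reports which database fails to open. This is a rough sketch that assumes mdbx_stat exits non-zero when it cannot open the database; the `MDBX_STAT` variable and the `check_dbs` function are hypothetical names:

```shell
#!/bin/sh
# Tool location is an assumption; override MDBX_STAT if it lives elsewhere
# (for example ./build/bin/mdbx_stat after "make db-tools").
MDBX_STAT="${MDBX_STAT:-mdbx_stat}"

# Run "mdbx_stat -ef" on each directory; return non-zero if any of them fails.
check_dbs() {
  failed=0
  for db in "$@"; do
    if "$MDBX_STAT" -ef "$db"; then
      echo "OK:     $db"
    else
      echo "FAILED: $db"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage with the paths from this thread: `check_dbs /data1/poly/chaindata/ /data2/poly-archive/erigon_data`.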
@AskAlexSharov
./mdbx_stat -ef /data1/poly/chaindata/
mdbx_stat v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data1/poly/chaindata/...
./mdbx_stat: mdbx_env_open() error -30794 MDBX_VERSION_MISMATCH: DB version mismatch libmdbx

[root@c01_docker_solfullnode_pap_hk bin]# ./mdbx_stat -ef /data2/poly-archive/erigon_data
mdbx_stat v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data2/poly-archive/erigon_data...
./mdbx_stat: mdbx_txn_begin() error -30779 MDBX_PROBLEM: Unexpected internal error, transaction should be aborted
git --no-pager log -1 --oneline
make db-tools
du -h /data1/poly/chaindata/
./build/bin/mdbx_stat -ef /data1/poly/chaindata/
du -h /data2/poly-archive/erigon_data
./build/bin/mdbx_stat -ef /data2/poly-archive/erigon_data
./build/bin/mdbx_chk -0 -d /data2/poly-archive/erigon_data
./build/bin/mdbx_chk -1 -d /data2/poly-archive/erigon_data
./build/bin/mdbx_chk -2 -d /data2/poly-archive/erigon_data
plz use triple backticks for output formatting: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code
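The three mdbx_chk invocations in the list above differ only in the meta-page option (-0, -1, -2), so they can be run as a loop. This is a sketch under assumptions: it treats a zero exit status as "check passed", and the `MDBX_CHK` variable and `first_good_meta` function are hypothetical names:

```shell
#!/bin/sh
# Tool location as used in this thread; override MDBX_CHK if needed.
MDBX_CHK="${MDBX_CHK:-./build/bin/mdbx_chk}"

# Try each meta page in turn and print the first one for which the check
# passes. Returns 1 if the check fails for all three.
first_good_meta() {
  db="$1"
  for meta in 0 1 2; do
    if "$MDBX_CHK" "-$meta" -d "$db" >/dev/null 2>&1; then
      echo "$meta"
      return 0
    fi
  done
  return 1
}
```

For example, `first_good_meta /data2/poly-archive/erigon_data || echo "no meta page passes"`. The sketch only reads the database; it does not use the '-t' option mentioned in the error message.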
1. du -h /data2/poly-archive/erigon_data
12T /data2/poly-archive/erigon_data

2. ./build/bin/mdbx_stat -ef /data2/poly-archive/erigon_data
mdbx_stat v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data2/poly-archive/erigon_data...
./build/bin/mdbx_stat: mdbx_txn_begin() error -30779 MDBX_PROBLEM: Unexpected internal error, transaction should be aborted

3. ./build/bin/mdbx_chk -0 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! mdbx_txn_begin() failed, error -30779 MDBX_PROBLEM: Unexpected internal error, transaction should be aborted

4. ./build/bin/mdbx_chk -1 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! mdbx_txn_begin() failed, error -30779 MDBX_PROBLEM: Unexpected internal error, transaction should be aborted

5. ./build/bin/mdbx_chk -2 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.9-16-gfff3fbd8 (2024-03-06T22:58:31+03:00, T-c5e6e3a4f75727b9e0039ad420ae167d3487d006)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! mdbx_txn_begin() failed, error -30779 MDBX_PROBLEM: Unexpected internal error, transaction should be aborted
./build/bin/mdbx_chk -vvv /data2/poly-archive/erigon_data
If it returns the same error, you can try:
git pull
git checkout e35_mdbx_v0_13
make db-tools
./build/bin/mdbx_chk -vvv /data2/poly-archive/erigon_data
If this command doesn't return anything new, then something is wrong with your database. Maybe you backed it up the wrong way (without shutting down Erigon). Maybe your hardware is broken: you can use tools like https://www.memtest86.com/ to test RAM and tools like https://www.smartmontools.org/ to test the disk.
Don't forget to git checkout release/2.60 back afterwards.
yes,
[root@c01_docker_solfullnode_pap_hk erigon]# git branch
(HEAD detached at v2.60.0)
main
[root@c01_docker_solfullnode_pap_hk erigon]# ./build/bin/mdbx_stat -ef /data2/poly-archive/erigon_data
mdbx_stat v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for /data2/poly-archive/erigon_data...
./build/bin/mdbx_stat: mdbx_txn_begin() error -30796 MDBX_CORRUPTED: Database is corrupted
[root@c01_docker_solfullnode_pap_hk erigon]# ./build/bin/mdbx_chk -0 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! bailout waiting for valid snapshot (workaround for incoherent flaw of unified page/buffer cache)
! mdbx_txn_begin() failed, error -30796 MDBX_CORRUPTED: Database is corrupted

[root@c01_docker_solfullnode_pap_hk erigon]# ./build/bin/mdbx_chk -1 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! bailout waiting for valid snapshot (workaround for incoherent flaw of unified page/buffer cache)
! mdbx_txn_begin() failed, error -30796 MDBX_CORRUPTED: Database is corrupted

[root@c01_docker_solfullnode_pap_hk erigon]# ./build/bin/mdbx_chk -2 -d /data2/poly-archive/erigon_data
mdbx_chk v0.12.0-71-g1cac6536 (2022-07-28T09:57:31+07:00, T-9a6d7e5b917e5fbd14dc51835fa749d092aa1d72)
Running for /data2/poly-archive/erigon_data in 'read-only' mode...
! bailout waiting for valid snapshot (workaround for incoherent flaw of unified page/buffer cache)
! mdbx_txn_begin() failed, error -30796 MDBX_CORRUPTED: Database is corrupted
Hey, when you say snapshot, do you mean the ones provided by Polygon? Where do you download the snapshot from?
MDBX
Yes, after the snapshot is unzipped, there is only one mdbx.dat file
Hey, @mh0lt can you ping the Polygon guys on this specific issue? It is not an Erigon problem. Leaving this issue open until we receive a response from the maintainers of those snapshots, but this is not an Erigon issue.
Please raise the question in the #polygon channel on Erigon's Discord server https://github.com/erigontech/erigon?tab=readme-ov-file#erigon-discord-server - or in some Polygon support channel. We don't control Polygon's snapshot files.
command: docker run -d --name ok-erigon -u root -p 7011:30303 -p 7012:8545 -p 7013:9090 -v /data4/poly:/root/erigon/data/ ok-erigon --chain=bor-mainnet --bor.heimdall=https://heimdall-api.polygon.technology/ --http.addr=0.0.0.0 --http.vhosts= --http.corsdomain= --http.api=eth,erigon,engine,debug,trace --db.size.limit=15TB --datadir=/root/erigon/data/ --torrent.download.rate=512mb
I am using this snapshot: I unzip it and replace the mdbx.dat in chaindata.