mindstyle85 closed this issue 3 months ago
Getting the same issue after restoring from a snapshot, even without an explicit --node.store; as I recall, I didn't see it when the bridge was initialized and synced from zero. Adding the --node.store flag doesn't fix the problem either.
Also tried putting WorkingDirectory= into the systemd unit, with no luck. It seems the snapshot carries some index of the store path and keeps working with those files even when the store path is overridden.
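For reference, the systemd attempt looked roughly like this (a sketch; the unit name celestia-bridge and the override path are illustrative, not from the original report):
# edit the unit override (unit name illustrative)
sudo systemctl edit celestia-bridge
# the override added was, roughly:
#   [Service]
#   WorkingDirectory=/path/to/new/store
# then reload and restart
sudo systemctl daemon-reload
sudo systemctl restart celestia-bridge
# the node still resolved blocks against the old absolute path, so this did not help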
node version: v0.13.6
node type: bridge
network: mocha-4
Nodes synced from height 0 with a custom --node.store path are running without any problem.
This only happens after restoring from a snapshot.
Just reproduced. Original store path:
/home/mzonder/.celestia-full-mocha-4/
Init and start with a different store path:
celestia full init --p2p.network mocha-4 --node.store /home/mzonder/full-node/.celestia-full-mocha-4 ...
celestia full start --node.store /home/mzonder/full-node/.celestia-full-mocha-4 --node.config /home/mzonder/full-node/.celestia-full-mocha-4/config.toml
ERROR share/eds eds/store.go:199 removing shard after failure
Logs of this moment:
2024-05-30T14:18:17.743Z INFO header/store store/store.go:375 new head {"height": 1938724, "hash": "3A70C118301E105275522FE74BA4227FABFF415558B5D93AC26145D5A15E9E14"}
2024-05-30T14:18:17.763Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938724, "hash": "3A70C118301E105275522FE74BA4227FABFF415558B5D93AC26145D5A15E9E14", "square width": 32, "data root": "382787D61ADC29BDF867AE399BC6D68922F652E4C510F0BCF94E9F9BADF36B71", "finished (s)": 0.019159128}
2024-05-30T14:18:17.863Z INFO pidstore pidstore/pidstore.go:84 Persisted peers successfully {"amount": 87}
2024-05-30T14:18:22.779Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:18:29.432Z INFO header/store store/store.go:375 new head {"height": 1938725, "hash": "EBC6904307023B55AFE0C56F3C3E135549FD6028FD0621C22173C213537036EE"}
2024-05-30T14:18:29.457Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938725, "hash": "EBC6904307023B55AFE0C56F3C3E135549FD6028FD0621C22173C213537036EE", "square width": 32, "data root": "6AEBBE03F041138B9C156E254256BA1EFE89EF82F5BA9D0617C0DA6E9C5D1857", "finished (s)": 0.025673256}
2024-05-30T14:18:31.937Z INFO das das/store.go:60 stored checkpoint to disk: SampleFrom: 1938726, NetworkHead: 1938725
2024-05-30T14:18:37.976Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:18:41.188Z INFO header/store store/store.go:375 new head {"height": 1938726, "hash": "3A2C01F72D6AFFE1B788584DEA45361678DFA8CE2A22B35126BA08CB7CBE6B9B"}
2024-05-30T14:18:41.280Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938726, "hash": "3A2C01F72D6AFFE1B788584DEA45361678DFA8CE2A22B35126BA08CB7CBE6B9B", "square width": 32, "data root": "223C1F1229C6B751CC503B1851A4F0CAC5ED8BC5C6DB0381628C32B5EA8A0C30", "finished (s)": 0.092379655}
2024-05-30T14:18:43.232Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:18:46.267Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:18:52.920Z INFO header/store store/store.go:375 new head {"height": 1938727, "hash": "EB455DB3CB3FEBC870C840EC62576D95AB6896A8D8A24E7516226E1BC44C8D56"}
2024-05-30T14:18:52.943Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938727, "hash": "EB455DB3CB3FEBC870C840EC62576D95AB6896A8D8A24E7516226E1BC44C8D56", "square width": 32, "data root": "0958396F6244A7A2D6471620816AB48A8E92A45FC714D20452DB9A0727D9D69E", "finished (s)": 0.022327369}
2024-05-30T14:18:54.234Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:03.382Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:04.606Z INFO header/store store/store.go:375 new head {"height": 1938728, "hash": "A3F719895405149824135AAA8F4539710F9ED70572E13887EB7BC05253A75675"}
2024-05-30T14:19:04.787Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938728, "hash": "A3F719895405149824135AAA8F4539710F9ED70572E13887EB7BC05253A75675", "square width": 64, "data root": "FB0D0BDA1ACD289A4DE9C3552245DC38B7D05FE634FF6F42AFECCC0815A046AC", "finished (s)": 0.180223214}
2024-05-30T14:19:12.989Z WARN badger4 v4@v4.2.1-0.20240106094458-1c417aa3799c/db.go:460 Block cache might be too small. Metrics: hit: 2159 miss: 50481648 keys-added: 3044 keys-updated: 129 keys-evicted: 3024 cost-added: 42162384969 cost-evicted: 41894938305 sets-dropped: 2 sets-rejected: 1278 gets-dropped: 16911552 gets-kept: 33534144 gets-total: 50483807 hit-ratio: 0.00
2024-05-30T14:19:12.989Z WARN badger4 v4@v4.2.1-0.20240106094458-1c417aa3799c/db.go:461 Cache life expectancy (in seconds):
-- Histogram:
Min value: 0
Max value: 344
Count: 2516
50p: 2.00
75p: 2.00
90p: 16.00
[0, 2) 2177 86.53% 86.53%
[2, 4) 15 0.60% 87.12%
[4, 8) 8 0.32% 87.44%
[8, 16) 271 10.77% 98.21%
[16, 32) 27 1.07% 99.28%
[32, 64) 13 0.52% 99.80%
[128, 256) 2 0.08% 99.88%
[256, 512) 3 0.12% 100.00%
--
2024-05-30T14:19:16.224Z INFO header/store store/store.go:375 new head {"height": 1938729, "hash": "31749D16D9E6E1306512E2C305911FB29E2B55C821CFFE465CBAB0DAE08D889E"}
2024-05-30T14:19:16.248Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938729, "hash": "31749D16D9E6E1306512E2C305911FB29E2B55C821CFFE465CBAB0DAE08D889E", "square width": 32, "data root": "0660AC93E2805B625B503F091640DCBBC89C0CC624868C51A6C40EF7AE6B56D3", "finished (s)": 0.023504637}
2024-05-30T14:19:22.730Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:28.349Z INFO header/store store/store.go:375 new head {"height": 1938730, "hash": "81B492B99429DB9FD386033D1D1265B60C598495B9BF8521392F7C2D0262E8A3"}
2024-05-30T14:19:28.433Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938730, "hash": "81B492B99429DB9FD386033D1D1265B60C598495B9BF8521392F7C2D0262E8A3", "square width": 64, "data root": "61C3EF7F399008C4D046434F0428227246057F923B4BF88103ACC38791D77E2D", "finished (s)": 0.084622077}
2024-05-30T14:19:30.631Z WARN dagstore dagstore@v0.0.0-20230824094345-537c012aa403/dagstore_async.go:39 acquire: failed to fetch from mount upgrader {"shard": "39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9", "error": "open /home/mzonder/.celestia-full-mocha-4/blocks/39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9: no such file or directory"}
2024-05-30T14:19:30.632Z ERROR share/eds eds/store.go:199 removing shard after failure {"key": "39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9", "err": "failed to acquire reader of mount so we can return the accessor: open /home/mzonder/.celestia-full-mocha-4/blocks/39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9: no such file or directory"}
2024-05-30T14:19:30.632Z ERROR shrex/eds shrexeds/server.go:120 server: get CAR {"peer": "12D3KooWFKs93f3KGuLoAGrGVXNvRvsQgj4M9bmNzkM3EReXLa7w", "hash": "39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9", "err": "failed to get accessor: failed to acquire shard: open /home/mzonder/.celestia-full-mocha-4/blocks/39F0AB4F339363D96607AC9F3F4A84198982BC392617DB9943EED3C131CBA6A9: no such file or directory"}
2024-05-30T14:19:39.999Z INFO header/store store/store.go:375 new head {"height": 1938731, "hash": "EDED983A73058DF8139EBE29B2E9141CF43B341319D9D46AF4D0D422D427765B"}
2024-05-30T14:19:40.188Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938731, "hash": "EDED983A73058DF8139EBE29B2E9141CF43B341319D9D46AF4D0D422D427765B", "square width": 32, "data root": "B52CA4208F5982222F10B745392C608C879266E0D0EE503C6500C519A3278A3B", "finished (s)": 0.188342383}
2024-05-30T14:19:45.503Z WARN dagstore dagstore@v0.0.0-20230824094345-537c012aa403/dagstore_async.go:39 acquire: failed to fetch from mount upgrader {"shard": "50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070", "error": "open /home/mzonder/.celestia-full-mocha-4/blocks/50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070: no such file or directory"}
2024-05-30T14:19:45.504Z ERROR shrex/eds shrexeds/server.go:120 server: get CAR {"peer": "12D3KooWFKs93f3KGuLoAGrGVXNvRvsQgj4M9bmNzkM3EReXLa7w", "hash": "50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070", "err": "failed to get accessor: failed to acquire shard: open /home/mzonder/.celestia-full-mocha-4/blocks/50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070: no such file or directory"}
2024-05-30T14:19:45.504Z ERROR share/eds eds/store.go:199 removing shard after failure {"key": "50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070", "err": "failed to acquire reader of mount so we can return the accessor: open /home/mzonder/.celestia-full-mocha-4/blocks/50315DB9C299434389571117D20AEE205CA2D8122C174823FCF0B52F9F73E070: no such file or directory"}
2024-05-30T14:19:46.323Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:52.035Z INFO header/store store/store.go:375 new head {"height": 1938732, "hash": "72950186E617A221DE5781A9C4470C85B999021E75ADD41B7AC4EED825612BE7"}
2024-05-30T14:19:52.058Z INFO das das/worker.go:161 sampled header {"type": "recent", "height": 1938732, "hash": "72950186E617A221DE5781A9C4470C85B999021E75ADD41B7AC4EED825612BE7", "square width": 32, "data root": "B74A96F1ECFD741DA66CA805995D8A543D49985A1BEBD2CACB84FC5AEB01D9D1", "finished (s)": 0.023267012}
2024-05-30T14:19:53.232Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:56.267Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:19:57.976Z INFO bitswap-server server/server.go:547 Bitswap Client ReceiveError: Application error 0x0 (remote)
2024-05-30T14:20:03.743Z INFO header/store store/store.go:375 new head {"height": 1938733, "hash": "558BDEF3F578589BA1E234B390D895839E8241B83DDA4BB9905BEA43B60BCB9A"}
The funny thing is that after applying the snapshot, you run the node with the store on a different path. If you then move the store back to the same path used in the snapshot, you get the same errors for all the blocks you synced while the store was at the new path. Each block saves its location, and that location cannot be overridden by the --node.store flag.
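To make the sequence concrete, a sketch of the scenario above (the invocation and all paths are illustrative):
# 1) the snapshot was created with the store at its original path, e.g. /root/.celestia-bridge
# 2) restore it elsewhere and run from there; blocks synced now get indexed under the new path
celestia bridge start --p2p.network mocha --node.store /new/store
# 3) move the store back to the snapshot's original path and start again
mv /new/store /root/.celestia-bridge
celestia bridge start --p2p.network mocha --node.store /root/.celestia-bridge
# -> "no such file or directory" for every block synced while the store lived at /new/store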
Restored the snapshot to the identical path (repeated all the steps above without changing the path): works fine.
@Liver-23 Does yours default to /root/.celestia-bridge-mocha-4/... too?
Yes, I used this snapshot: https://kingsuper.org/snapshots/celestia-bridge-testnet. Curious whether @maxzonder used the same one, or another with the same default dir. It looks like the data saves its path, and even if you want to use a different one, it keeps going to where it was initially stored on the original machine.
i used my own, so mine is stuck at /home/mzonder/..
Yup, I synced my own, then moved the store to a different location and reconfigured to it, and then started to see those messages when the node tries to find previously synced blocks in the old location.
@Liver-23 @mindstyle85 @maxzonder is this new behavior since 0.13.6, or something that may have existed for longer and is only being discovered now?
i haven't seen reports of this before, but i am not sure, since we never used a snapshot or a different directory ourselves
I joined with 0.13.6 already, so I can't be sure.
thanks for the info @Liver-23 and @mindstyle85, i'm going to start looking into this now
i think i had the same issue on 0.13.1
ok a few things here:
1) i've confirmed that syncing a bridge node and then moving the store path does mean the new --node.store path is seemingly "ignored". Excellent find and thanks for reporting. No one had run into this before 😸
2) this does not appear to be caused by improper handling of the node.store param, but by a side effect of one of the EDS datastores indexing absolute paths (appears to be the dagstore index; see the check sketched after this list) https://github.com/celestiaorg/celestia-node/blob/main/share/eds/store.go#L103
3) i think we could, and normally would, seek to fix it, BUT as it is not a quick and simple fix, and as we have a significant storage rewrite close to shipping that will remove this behavior entirely in the next two months (which @walldiss can confirm), i propose that for now we close without fixing and ensure that moving the store location on the filesystem and recovering from a snapshot work properly in the new storage engine.
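A rough way to see the baked-in paths (a sketch, assuming the indexed absolute paths end up as plain, uncompressed strings in the store's database files, which may not hold depending on the database's compression settings; paths are illustrative):
# search the restored store for the ORIGINAL absolute path; -a treats binary files as text
grep -ra "/root/.celestia-bridge-mocha-4" /new/store/ | head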
As a temporary interim solution, it would seem that a symlink from the original path assumed by the snapshot to the new path would solve this issue acceptably for now?
Interesting possible workaround with the symlink, actually; curious if it will work. It should, I guess. If so, that will simplify things for those who need to restore from a snapshot fast, until the final fix.
Hi all, using a symlink works, but if you use a dedicated non-root user you might run into permission issues:
# create the symlink at the path baked into the snapshot, pointing at the real store
# (note the argument order: ln -s TARGET LINK_NAME)
sudo ln -s </path/to/your/.celestia-bridge> /root/.celestia-bridge
# ensure the symlink /root/.celestia-bridge itself is owned by the celestia user
sudo chown -h celestia:celestia /root/.celestia-bridge
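To confirm the link resolves to the real store before restarting:
# optional check: the symlink should point at the actual store directory
sudo ls -ld /root/.celestia-bridge
sudo readlink -f /root/.celestia-bridge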
celestia-bridge run by the celestia user then fails to access the db:
celestia[173708]: 2024-06-15T07:02:54.607Z ERROR share/eds eds/store.go:199 removing shard after failure {"key": "F2150A832142748239CC18E51F9ADFF93CC5C30F5809A68BEC6A812315EEDBFC", "err": "failed to acquire reader of mount so we can return the accessor: open /root/.celestia-bridge/blocks/F2150A832142748239CC18E51F9ADFF93CC5C30F5809A68BEC6A812315EEDBFC: permission denied"}
An unsafe fix for this would be to grant execute permissions on the whole /root folder. A better option is to use ACLs to grant the celestia user access to the /root directory specifically for the symlink, without changing the global permissions of /root:
# Set ACL for the celestia user
sudo setfacl -m u:celestia:x /root
# Verify the ACL
getfacl /root
# file: root
# owner: root
# group: root
user::rwx
user:celestia:--x
group::--x
mask::--x
other::---
Restart celestia-bridge and observe the logs:
celestia[174507]: 2024-06-15T07:25:48.146Z INFO header/store store/store.go:375 new head {"height": 1675259, "hash": "2AA94B4658DE2658ECCC102AB3A8C2D172874AA5AA4E78DF379FBE94B201FD83"}
excellent! Yes, I meant to mention that in addition to the path issue there seemed to be a potential user permissions issue as well; great explanation of your solution.
I'll ensure this snapshot issue does not persist (😂) with our new storage engine in the coming months.
Hey hey, after several months, do we have a fix for this bug? Or is there still no fix other than symbolic links?
hi @lesnikutsa the new storage engine that resolves this is not yet released, you can follow progress here https://github.com/celestiaorg/celestia-node/pull/3675
I am testing bridge snapshots in different directories but with a single (root) user. I downloaded a snapshot created from the / directory (root user), unzipped it to the /bigdata directory (root user), and I don't see any global ERRORS (after syncing).
So far the only errors I saw were at the moment of synchronization (while my node was catching up with the network).
I don't see any more errors, and after successful synchronization my logs look normal.
Maybe the problem is in the different users...
So, after another reboot of the bridge I see only errors. It is strange that there were no errors after the first launch.
Celestia Node version
v0.13.6
OS
ubuntu
Install tools
No response
Others
No response
Steps to reproduce it
from figment:
/home/celestia/mocha-4/bin/celestia-node-v0.13.6 bridge start --p2p.network mocha --node.store /home/celestia/mocha-4-bridge --node.config /home/celestia/mocha-4-bridge/config.toml --metrics --metrics.tls=false
but the logs show:
ERROR shrex/nd shrexnd/server.go:128 handling request {"source": "server", "peer": "12D3KooWJzT9VwXa2hKvrujBKgnTcdVfohfBxQitcRW5eJCEANeD", "namespace": "00000000000000000000000000000000000000a66006f4805643fb8f24", "hash": "81E8AF5CFC8931C50EEF6D23D1E922A50F187720B7448C713A656336B14E618C", "err": "retrieving DAH: eds/store: failed to get CAR file: failed to get accessor: failed to acquire shard: shard is in errored state; err: failed to acquire reader of mount so we can return the accessor: open /root/.celestia-bridge-mocha-4/blocks/81E8AF5CFC8931C50EEF6D23D1E922A50F187720B7448C713A656336B14E618C: permission denied"}
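A quick way to tell this apart from a missing-file problem (a sketch; the path is taken from the log line above): if the block file exists but the service user cannot traverse /root, it is a permissions issue rather than a missing shard.
# check the file referenced by the error; existence plus root ownership points at permissions
sudo ls -l /root/.celestia-bridge-mocha-4/blocks/81E8AF5CFC8931C50EEF6D23D1E922A50F187720B7448C713A656336B14E618C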
Expected result
The node uses the node.store path provided via the flag.
Actual result
n/a
Relevant log output
No response
Notes
No response