erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

mdbx_env_open: no such file or directory when trying to run rpcdaemon locally #2468

Closed TimDaub closed 3 years ago

TimDaub commented 3 years ago

System information

Erigon version: ./erigon --version

./erigon/build/bin/erigon --version
erigon version 2021.07.5-alpha

How I run erigon:

cat node.sh
#!/bin/bash

nohup bash -c "./erigon/build/bin/erigon --private.api.addr=localhost:9080 --metrics --metrics.addr=localhost --metrics.port=6060" &> node.out &

OS & Version: Windows/Linux/OSX

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

Commit hash : 054581f50768d781623a0f51cf1eac8a6f0e04ae

Expected behaviour

I'm trying to run the rpcdaemon locally as outlined in the docs:

cat rpc.sh
#!/bin/bash

nohup ./erigon/build/bin/rpcdaemon \
        --datadir=/home/root/.local/share/erigon \
        --private.api.addr=127.0.0.1:9080 \
        --http.api=eth,erigon,web3,net,debug,trace,txpool,shh &> rpc.out &

and

ls /home/root/.local/share/erigon
chaindata  erigon

Actual behaviour

However, each time I pass the --datadir option, I get the following error:

tail -f rpc.out
ERROR[07-29|14:32:31.331] Could not connect to DB                  error="mdbx_env_open: no such file or directory, label: chaindata, trace: [github.com/ledgerwatch/erigon/ethdb/kv.MdbxOpts.Open github.com/ledgerwatch/erigon/cmd/rpcdaemon/cli.RemoteServices main.main.func1 github.com/spf13/cobra.(*Command).execute github.com/spf13/cobra.(*Command).ExecuteC github.com/spf13/cobra.(*Command).Execute github.com/spf13/cobra.(*Command).ExecuteContext main.main runtime.main runtime.goexit]"
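
The error says that MDBX found no database under the path derived from --datadir. One way to catch this before starting rpcdaemon is a small pre-flight check. This is only a sketch: find_chaindata is a hypothetical helper, not part of Erigon, and it assumes the database file is called mdbx.dat and sits in a chaindata directory at most three levels below the data dir, as the directory listing later in this thread suggests.

```shell
#!/bin/bash
# Hypothetical helper, not part of Erigon: print every chaindata directory
# under a data dir that actually contains an MDBX database file.
# Assumption: the file is named mdbx.dat and lives at most 3 levels deep.
find_chaindata() {
    local datadir="$1" found
    # A data dir that does not exist cannot contain a database.
    [ -d "$datadir" ] || return 1
    found=$(find "$datadir" -maxdepth 3 -type f -name mdbx.dat \
        -path '*/chaindata/*' 2>/dev/null)
    # Fail if no database file was found at all.
    [ -n "$found" ] || return 1
    # Strip the file name so only the directory is printed.
    printf '%s\n' "$found" | sed 's|/mdbx\.dat$||'
}
```

Calling find_chaindata with the same path that is passed to --datadir, and refusing to start rpcdaemon when it fails, would flag a wrong path immediately.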

Steps to reproduce the behaviour

AskAlexSharov commented 3 years ago

Do you run Erigon as the root user? Do you see files in the chaindata directory?

TimDaub commented 3 years ago

Do you run Erigon as root user?

ps aux | grep erigon
root      114094  162 88.5 2183133164 116764800 ? Sl  Jul26 8684:26 ./erigon/build/bin/erigon --private.api.addr=localhost:9080 --metrics --metrics.addr=localhost --metrics.port=6060

I guess that means yes. Shouldn't I?

do you see files in chaindata directory?

ls -alh .local/share/erigon/erigon/chaindata/
total 1.3T
drwxr--r-- 2 root root  4.0K Jul 26 16:10 .
drwxr-xr-x 3 root root  4.0K Jul 30 09:02 ..
-rw-r--r-- 1 root root  1.3T Jul 30 09:51 mdbx.dat
-rw-r--r-- 1 root root 1004K Jul 30 09:51 mdbx.lck

AskAlexSharov commented 3 years ago

Shouldn't I? - no real reason, but up to you.

  1. Can you also set --datadir for Erigon?
  2. No Docker?
  3. XDG_DATA_HOME env variable not set?
  4. I created the branch rpc_daemon_print_path, which should print the path that RPCDaemon is using. Can you try this branch? (It is branched from v 2021.07.05.)

TimDaub commented 3 years ago

1. Can you also set --datadir for Erigon?

updated node.sh:

cat node.sh
#!/bin/bash

nohup bash -c "./erigon/build/bin/erigon --datadir=/home/root/.local/share/erigon --private.api.addr=localhost:9080 --metrics --metrics.addr=localhost --metrics.port=6060" &> node.out &

when I now run rpc.sh

./rpc.sh
tail -f rpc.out
INFO [07-30|15:49:36.412] DB schemas compatible                    reader=3.0.0 database=3.0.0
INFO [07-30|15:49:36.412] rpc filters: subscribing to Erigon events
INFO [07-30|15:49:36.413] HTTP endpoint opened                     url=localhost:8545 ws=false ws.compression=false
INFO [07-30|15:49:36.413] interfaces compatible                    remote_db=127.0.0.1:9080 client=3.0.0 server=3.0.0
INFO [07-30|15:49:36.413] interfaces compatible                    remote_service=eth_backend client=2.1.0 server=2.1.0
INFO [07-30|15:49:36.413] interfaces compatible                    remote_service=mining      client=1.0.0 server=1.0.0
INFO [07-30|15:49:36.414] interfaces compatible                    remote_service=tx_pool     client=1.0.0 server=1.0.0
^C

It seems to resolve the issue.

2. No Docker?

No. I appreciate the reproducibility of Docker, but I can't wrap my head around it all. Do you recommend using it? It's off-topic, but e.g. for running grafana/prometheus, I didn't see any advice/docs on running them outside of Docker.

3. XDG_DATA_HOME env variable not set?

What is that and should I set it?

4. I created the branch rpc_daemon_print_path, which should print the path that RPCDaemon is using. Can you try this branch? (It is branched from v 2021.07.05.)

I'm happy to still try this, but since the problem seems to be resolved - do you still want me to do it?

Thank you

AskAlexSharov commented 3 years ago

"Docker but I can't wrap my head around it all. Do you recommend using it?" - No, I'm just trying to understand your problem.

"problem seems to be resolved - do you still want me to do it?" - For now I don't understand why it is resolved. Your Erigon is probably now pointing to another folder. But where was it pointing before?

AskAlexSharov commented 3 years ago

Erigon supports this spec: https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

To see the env variable, you can run: echo $XDG_DATA_HOME

TimDaub commented 3 years ago

For now I don't understand why it is resolved. Your Erigon is probably now pointing to another folder. But where was it pointing before?

You seem to be right. It started syncing again from a very old block when I defined --datadir=/home/root/.local/share/erigon. From my understanding, when --datadir is not specified, it takes /home/root/.local/share/erigon, no?

TimDaub commented 3 years ago

To see the env variable, you can run: echo $XDG_DATA_HOME

My system's answer:

echo $XDG_DATA_HOME

(the variable isn't set)

Is there a difference between --datadir and XDG_DATA_HOME, or should they point to the same path?

AskAlexSharov commented 3 years ago

No. --datadir takes precedence over XDG_DATA_HOME, and XDG_DATA_HOME takes precedence over the default. You probably used the default, so I don't understand why you had the initial problem.
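
The precedence described here can be sketched as a small shell function. This is not Erigon's actual code. It assumes that a set XDG_DATA_HOME resolves to "$XDG_DATA_HOME/erigon" and that the Linux default is "$HOME/.local/share/erigon"; resolve_datadir is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the precedence: --datadir > XDG_DATA_HOME > default.
# Assumptions (not confirmed against Erigon's source): XDG_DATA_HOME maps
# to "$XDG_DATA_HOME/erigon" and the default is "$HOME/.local/share/erigon".
resolve_datadir() {
    local flag_datadir="${1:-}"   # value given via --datadir, may be empty
    if [ -n "$flag_datadir" ]; then
        echo "$flag_datadir"
    elif [ -n "${XDG_DATA_HOME:-}" ]; then
        echo "$XDG_DATA_HOME/erigon"
    else
        echo "$HOME/.local/share/erigon"
    fi
}
```

Note that the default depends on $HOME, so the result differs between users: for root it is derived from root's home directory, wherever that is on the system.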

TimDaub commented 3 years ago

By the way, it seems that my experimenting has somehow destroyed my sync progress. Here's what I've done:

  1. Initially I launched the node without a --datadir option as outlined in the original post
  2. Then, following the response in this thread, I launched the node once with --datadir=/home/root/.local/share/erigon, but it didn't seem to recognize the directory. It started syncing from scratch.
  3. Finally, I set it to --datadir=/home/root/.local/share/erigon/erigon as a test. Here too, it started syncing from scratch.
  4. So I removed the --datadir option and have let it run for a while now. It started with verifying some call trace data, so I thought that this may just be standard when restarting a node. But now, almost a day later, I think it's still in sync mode:

curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' localhost:8545
{"jsonrpc":"2.0","id":1,"result":{"currentBlock":"0x0","highestBlock":"0x105000","stages":[{"stage_name":"Headers","block_number":"0x105000"},{"stage_name":"BlockHashes","block_number":"0x0"},{"stage_name":"Bodies","block_number":"0x0"},{"stage_name":"Senders","block_number":"0x0"},{"stage_name":"Execution","block_number":"0x0"},{"stage_name":"Translation","block_number":"0x0"},{"stage_name":"HashState","block_number":"0x0"},{"stage_name":"IntermediateHashes","block_number":"0x0"},{"stage_name":"AccountHistoryIndex","block_number":"0x0"},{"stage_name":"StorageHistoryIndex","block_number":"0x0"},{"stage_name":"LogIndex","block_number":"0x0"},{"stage_name":"CallTraces","block_number":"0x0"},{"stage_name":"TxLookup","block_number":"0x0"},{"stage_name":"TxPool","block_number":"0x0"},{"stage_name":"Finish","block_number":"0x0"}]}}
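
The block numbers in the eth_syncing response are hex-encoded, which makes them hard to compare with the decimal block numbers in the node log. A minimal sketch for decoding them (hex_to_dec is a hypothetical helper name):

```shell
#!/bin/bash
# Convert a 0x-prefixed hex quantity, as returned by JSON-RPC, to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x105000   # highestBlock from the response above, prints 1069056
```

A highestBlock of 1069056 is far below the block numbers around 12932409 that the node itself logs, which is consistent with rpcdaemon reading a different database than the one the node is writing to.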

logs

tail -f node.out
INFO [07-31|11:02:55.718] [1/18 Headers] Waiting for headers...    from=12932408
INFO [07-31|11:03:00.936] [1/18 Headers] Processed                 highest inserted=12932409 age=5s
INFO [07-31|11:03:00.939] [4/18 Bodies] Processed                  highest=12932409
INFO [07-31|11:03:00.996] [7/18 Execution] Completed on            block=12932409
INFO [07-31|11:03:01.040] [11/18 IntermediateHashes] Trie root      hash=0x8a64b0cc85292fac23a98cb2e27ac3860f65c5e8f51eb19ec1e3a86ce7d51a24
INFO [07-31|11:03:01.114] [17/18 TxPool] Transaction stats         pending=3928 queued=1030
INFO [07-31|11:03:01.203] Commit cycle                             in=88.648427ms
INFO [07-31|11:03:01.203] Update current block for the RPC API     from=12932409 to=12932409
INFO [07-31|11:03:01.203] [1/18 Headers] Waiting for headers...    from=12932409
INFO [07-31|11:03:03.796] [p2p] GoodPeers                          eth66=21 eth65=31

Or what is going on? The disk still shows 1.4T in use, so I suspect the data is still there:

df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G     0   63G   0% /dev
tmpfs            13G  1.1M   13G   1% /run
/dev/md2        2.0T  1.4T  539G  72% /
tmpfs            63G     0   63G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/md1        487M   85M  378M  19% /boot
/dev/md3        1.5T  2.5G  1.5T   1% /home
tmpfs            13G     0   13G   0% /run/user/0

TimDaub commented 3 years ago

Thanks to a quick session in Discord, @AskAlexSharov found the problem:

I had been confusing /root/.local/share/erigon with /home/root/.local/share/erigon all along, because I assumed that the root user's home directory was also nested under /home. On my Ubuntu machine, however, it isn't. The Erigon data dir in my case was at /root/.local/share/erigon.
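
For anyone hitting the same confusion, a user's home directory can be looked up instead of guessed. A minimal sketch, assuming local accounts listed in a passwd-format file; home_of is a hypothetical helper name:

```shell
#!/bin/bash
# Print the home directory recorded for a user in a passwd-format file.
# The second argument is optional and defaults to /etc/passwd. Users that
# come from LDAP or other NSS sources will not be found this way.
home_of() {
    local user="$1" passwd_file="${2:-/etc/passwd}"
    # Field 1 is the user name, field 6 is the home directory.
    awk -F: -v u="$user" '$1 == u { print $6 }' "$passwd_file"
}
```

Running home_of root and appending /.local/share/erigon gives the default data dir to pass to rpcdaemon when the node runs as root without --datadir.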

So here's how we ended up fixing the problem:

tail -f rpc.out
INFO [07-31|11:39:10.280] DB schemas compatible                    reader=3.0.0 database=3.0.0
INFO [07-31|11:39:10.280] rpc filters: subscribing to Erigon events
INFO [07-31|11:39:10.281] HTTP endpoint opened                     url=localhost:8545 ws=false ws.compression=false
INFO [07-31|11:39:10.282] interfaces compatible                    remote_db=127.0.0.1:9080 client=3.0.0 server=3.0.0
INFO [07-31|11:39:10.282] interfaces compatible                    remote_service=eth_backend client=2.1.0 server=2.1.0
INFO [07-31|11:39:10.282] interfaces compatible                    remote_service=mining      client=1.0.0 server=1.0.0
INFO [07-31|11:39:10.282] interfaces compatible                    remote_service=tx_pool     client=1.0.0 server=1.0.0
curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' localhost:8545
{"jsonrpc":"2.0","id":1,"result":false}

So now everything is back to normal and it's working. Thanks for your support!
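
The eth_syncing check can also be scripted. In the Ethereum JSON-RPC API, eth_syncing returns false when the node does not consider itself to be syncing, and an object with progress fields otherwise. A minimal sketch that only pattern-matches the response text rather than parsing JSON; is_synced is a hypothetical helper name:

```shell
#!/bin/bash
# Return 0 if an eth_syncing JSON-RPC response reports "result": false,
# i.e. the node does not consider itself to be syncing; 1 otherwise.
# This is plain pattern matching, not a JSON parser.
is_synced() {
    local compact
    # Remove all whitespace so formatting differences do not matter.
    compact=$(printf '%s' "$1" | tr -d '[:space:]')
    case "$compact" in
        *'"result":false'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The response body from the curl command above can be passed as the single argument, for example from a cron job that alerts when the node falls out of sync.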

changwilling commented 3 years ago

I also made a mistake when I ran './build/bin/rpcdaemon --datadir=/data1/erigon'. The '--datadir' should be the same as the one you use when you run './build/bin/erigon --datadir=/data1/erigon'.
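
To compare the two settings, the --datadir value can be pulled out of each command line, for example from the ps output. A minimal sketch: datadir_from_cmdline is a hypothetical helper, and it does not handle paths that contain spaces.

```shell
#!/bin/bash
# Extract the value of --datadir from a command line string.
# Handles both "--datadir=PATH" and "--datadir PATH". Prints nothing and
# returns 1 if the flag is absent (the process then uses its default).
# Limitation: paths containing whitespace are not supported.
datadir_from_cmdline() {
    local -a words
    local prev="" word
    # Split the command line on whitespace without glob expansion.
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case "$word" in
            --datadir=*) echo "${word#--datadir=}"; return 0 ;;
        esac
        if [ "$prev" = "--datadir" ]; then
            echo "$word"
            return 0
        fi
        prev="$word"
    done
    return 1
}
```

If the function prints different paths for the erigon and rpcdaemon command lines, or finds the flag on only one of them, the two processes may be looking at different directories.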