Closed: NatPDeveloper closed this issue 4 years ago.
Is your nodeos getting blocks ?
cleos -u http://localhost:9888 get info
<-- is it advancing ?
(you can also use curl localhost:9888/v1/chain/get_info if cleos is not available)
Does it have connected peers ?
cleos -u http://localhost:9888 net peers
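To decide whether the chain is advancing, compare head_block_num across two get_info calls a few seconds apart. A minimal sketch (the helper itself is an illustration, not dfuse code; head_block_num is a real field of the get_info response, the endpoint and timeout are assumptions):

```python
import json
import urllib.request

def get_info(endpoint="http://localhost:9888"):
    """Fetch chain info from a nodeos HTTP endpoint (same data as `cleos get info`)."""
    with urllib.request.urlopen(endpoint + "/v1/chain/get_info", timeout=5) as resp:
        return json.load(resp)

def is_advancing(info_before, info_after):
    """True if the head block moved forward between two get_info snapshots."""
    return info_after["head_block_num"] > info_before["head_block_num"]
```

Usage would be: call get_info, wait a few seconds, call it again, and pass both results to is_advancing.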
If the nodeos endpoint is not listening yet, it could just be slow performance. Are you using an SSD for storage ? Do you have decent perf settings in your config.ini ?
eos-vm-oc-enable = true
eos-vm-oc-compile-threads = 4
If the blocks are advancing but mindreader is not producing anything, have you verified that the config.ini contains the deep-mind flags ?
deep-mind = true
contracts-console = true
cleos -u http://localhost:9888 get info
Failed to connect to nodeos at http://localhost:9888; is nodeos running?
cleos -u http://localhost:9888 net peers
Failed to connect to nodeos at http://localhost:9888; is nodeos running?
df -h --total
...
total 1.6T 396G 1.2T 25% -
mindreader-no-blocks-log: true
(to rm the blocks log for snapshot)
config.ini added to OP.
Same results.
Could you try grepping the nodeos command (ps aux |grep nodeos) and then running it directly, without dfuse ?
ex: nodeos -d /data -c /your-config --snapshot=...
You will see the logs directly from nodeos, it could be faster to determine what's going on.
I would expect it to take more than a few minutes to load the state from the snapshot, but then you would know directly from nodeos and see if any errors pop up.
( try it with deep-mind=false and contracts-console=false first )
deep-mind: false and contracts-console: false. Waited 10m then killed:
nodeos --config-dir=./mindreader --data-dir=/root/workspace/dfuse-data/mindreader/data --snapshot=/root/workspace/dfuse-data/mindreader/data/snapshots/0146083494-08b50ea69b029e54fa4fc03299e809a7a94184d1ae3fd7b587434b4f8f633cf8-snapshot.bin --pause-on-startup
Usually it says something like "loading from snapshot this may take a long time" but not seeing that here.
Maybe it takes more than 10 minutes for those big snapshots. Do you have a logging.json in your config dir that may prevent some logs from appearing ?
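For reference, a minimal logging.json sketch that keeps nodeos console logging enabled, based on the shape of the stock nodeos logging config (the exact values here are assumptions, not taken from this setup):

```json
{
  "includes": [],
  "appenders": [
    {
      "name": "stderr",
      "type": "console",
      "args": { "stream": "std_error" },
      "enabled": true
    }
  ],
  "loggers": [
    {
      "name": "default",
      "level": "info",
      "appenders": ["stderr"],
      "enabled": true
    }
  ]
}
```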
I'm logging from the command itself. I left it for hours, same thing.
Attempting with c5.18xlarge instance for an hour, will report back.
Just a bunch of
2020-10-09T18:30:24.672Z (mindreader) operator ready to receive commands (operator/operator.go:135)
2020-10-09T18:45:24.672Z (mindreader) received operator command (operator/operator.go:235) {"command": "snapshot", "params": null}
2020-10-09T18:45:24.672Z (mindreader) preparing for snapshot (operator/operator.go:341)
2020-10-09T18:45:24.672Z (mindreader) asking nodeos API to create a snapshot (superviser/snapshot.go:33)
2020-10-09T18:45:24.672Z (mindreader) command failed (operator/operator.go:518) {"cmd": "snapshot", "error": "unable to take snapshot: api call failed: http://:9888/v1/producer/create_snapshot: Post \"http://:9888/v1/producer/create_snapshot\": dial tcp :9888: connect: connection refused"}
2020-10-09T18:45:24.672Z (mindreader) operator ready to receive commands (operator/operator.go:135)
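The failing call in those logs is a plain POST to the producer API. A sketch of the same request with the connection failure handled (the helper, endpoint argument, and timeout are illustrative assumptions; the endpoint itself requires nodeos to run the producer_api_plugin):

```python
import json
import urllib.error
import urllib.request

def request_snapshot(endpoint):
    """POST to nodeos' create_snapshot endpoint; returns the parsed reply,
    or None when nodeos is not listening yet (e.g. connection refused)."""
    url = endpoint + "/v1/producer/create_snapshot"
    req = urllib.request.Request(url, data=b"{}", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.URLError:
        # nodeos not ready yet: the same "connection refused" the operator logs
        return None
```

Until nodeos finishes loading the snapshot and opens its HTTP port, every such request fails this way, which is exactly what the operator log above shows.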
I need the snapshots for this instance in case the node crashes.
Those are just logs from the operator trying to trigger snapshots while the node is not ready yet; they don't tell you anything about the actual problem.
The real issue is: "with a given snapshot, nodeos never becomes ready, even without deep-mind enabled".
Since the command nodeos --config-dir=./mindreader --data-dir=/root/workspace/dfuse-data/mindreader/data --snapshot=/root/workspace/dfuse-data/mindreader/data/snapshots/0146083494-08b50ea69b029e54fa4fc03299e809a7a94184d1ae3fd7b587434b4f8f633cf8-snapshot.bin --pause-on-startup
fails to get you a running nodeos, the issue is in nodeos or in your snapshot file, not in dfuse code.
Does it work with a different snapshot ? Does it work with a different nodeos version ? How do you expect dfuse to react when the nodeos instance never becomes ready ?
Yeah, going to get another snapshot, and start from scratch if that's a no-go. If the issue repeats I will re-open.
New snap looks good. Previous snap came from mindreader. Maybe an issue with how it was created or the sequence of events.
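The snapshot filenames in this thread follow a <zero-padded block number>-<block id>-snapshot.bin pattern, so a quick way to sanity-check which block a snapshot file came from is to parse the name (the helper is an illustration of that observed convention, not part of dfuse):

```python
import os
import re

# Pattern observed in the filenames above: <padded block num>-<64-hex block id>-snapshot.bin
SNAPSHOT_RE = re.compile(r"^(\d+)-([0-9a-f]{64})-snapshot\.bin$")

def parse_snapshot_name(path):
    """Return (block_num, block_id) extracted from a snapshot filename, or None."""
    m = SNAPSHOT_RE.match(os.path.basename(path))
    if not m:
        return None
    return int(m.group(1)), m.group(2)
```

For the file used above, this yields block 146083494, which you can compare against the head block of the chain you expect the snapshot to belong to.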
Brief:
Running a partial sync, push guarantee node. The node is not getting past the logs below. Starting from a snapshot. Previously, when restarting regularly, it wanted a snapshot or a blocks.log from genesis. I thought I could just restart it; it wouldn't do a clean shutdown after 10m of waiting, though perhaps I was too impatient.
Running in screen on z1d.12xlarge ec2 instance
CPU pinned
version
yaml
logs
config.ini