MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

mina crash report. UTC [Error] validation timed out :( #6767

Closed Gregg-First closed 3 years ago

Gregg-First commented 3 years ago

Testnet 4.1
Image: Ubuntu 20.04 (LTS) x64
Size: 8 vCPUs - 16GB / 320GB Disk

I tried several times to run a block producer, but the node crashed every time. My final attempt used Docker, step by step:

I copied the "keys" folder to the server.

chmod 700 $HOME/keys
chmod 600 $HOME/keys/my-wallet

echo 'export KEYPATH=$HOME/keys/my-wallet' >> $HOME/.profile
echo 'export MINA_PUBLIC_KEY=$(cat $HOME/keys/my-wallet.pub)' >> $HOME/.profile
echo 'export CODA_PRIVKEY_PASS=My_password' >> $HOME/.profile
source ~/.profile

wget -O ~/peers.txt {URL from letter}

sudo apt update && sudo apt upgrade -y

sudo apt install docker.io curl -y

sudo systemctl start docker
sudo systemctl enable docker

sudo iptables -A INPUT -p tcp --dport 8302 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 8303 -j ACCEPT

sudo docker run --name mina -d \
  -p 8301-8305:8301-8305 \
  -p 127.0.0.1:3085:3085 \
  --mount type=bind,source="$(pwd)/peers.txt,dst=/root/peers.txt",readonly \
  -v $(pwd)/keys:/root/keys:ro \
  -v $(pwd)/.coda-config:/root/.coda-config \
  --restart always \
  minaprotocol/mina-daemon-baked:4.1-turbo-pickles-mina757342b-auto811bf26 daemon \
  -peer-list-file $HOME/peers.txt \
  -block-producer-key $KEYPATH \
  -block-producer-password $CODA_PRIVKEY_PASS \
  -insecure-rest-server \
  -log-level Info \
  -work-selection seq

I always get the same error, "UTC [Error] validation timed out :(", and after that the status check fails:

Every 2.0s: docker exec mina coda client status    mina-test: Thu Nov 19 15:21:41 2020

Error: Unable to connect to Coda daemon.

All logs attached.

bkase commented 3 years ago

@Gregg-First does your node crash? Or is it running properly? This could be caused by the validation taking CPU away from the other work that your node is doing. It's slightly annoying, but not "broken" per se. If your node is actually dying when this happens then it's a more important issue.

Also, can you make sure to make your attached logs publicly readable?

Kirol54 commented 3 years ago

Similar issue here.
Ubuntu 18.04, t3a.2xlarge, 8 vCPU, 32GB RAM

The node sees the correct block length after 15 min, but its block height stays at 1 the whole time.
It connects to many peers (50-100) but crashes approximately every 25 minutes.
For the first 15 min the sync status is mostly 'Synced'. When errors start to appear (first 36 lines of the logs), the status changes to 'Catchup' and the node crashes ~5 min later.
Transactions are broadcast (I can see them on the block explorer, with my account balance going down), but the command takes quite a while to execute, sometimes so long that the node crashes first. The node also crashes when no transactions are submitted.
RAM usage does not exceed 6GB (the node crashes every 30 min, so RAM usage goes back down).
CPU usage is 70-90% on all 8 vCPUs.

Logs: https://gist.github.com/Kirol54/776f20fe2b35531a3a609cf8f29c5e8f

whaleba commented 3 years ago

Similar issue here.
Ubuntu 18.04, e2-standard-8 (cloud.google), 8 vCPU, 32GB RAM

The status shows a network height slightly lower than the one on minaexplorer.
Block height: 1.
Peers: 40-70 and more.
Crash: 20-50 minutes after the start.
I am not running a SNARK worker.

There are a lot of errors during operation:
[Error] validation for item 1187 took 1088556812565 seconds
[Error] validation for item 1265 took 1015213578566 seconds
[Error] validation timed out :(

The log ends with these lines:
Daemon child process $child_pid terminated with exit code 0", "metadata": {"child_pid": 7880,
Child process of kind $process_kind with pid $child_pid has terminated", "metadata": {"child_pid": 7880,

I am launching the node without Docker and without auto-restart.

Full logs https://gist.github.com/whaleba/0c18d281610e89fec94c6a0094ca88b6

Gregg-First commented 3 years ago

> @Gregg-First does your node crash? Or is it running properly? This could be caused by the validation taking CPU away from the other work that your node is doing. It's slightly annoying, but not "broken" per se. If your node is actually dying when this happens then it's a more important issue.
>
> Also, can you make sure to make your attached logs publicly readable?

I made the logs publicly readable.

gnosed commented 3 years ago

Same issue here

Ubuntu 18.04.5 LTS 8 vCPU, 32 GB RAM, 240 GB SSD

Coda daemon status
-----------------------------------

Global number of accounts:       1026
Block height:                    2
Max observed block length:       2
Local uptime:                    8m25s
Ledger Merkle root:              jxpPYhcXCXo7T2JmyqcCor5x65zzqUpd1oTbspVwwgiRd7ftHof
Protocol state hash:             3NK3fQEHBguGcw6exX3KQrpGhjz2stviy9dJWxxeZq662HY5nmtH
Chain id:                        ef95774fb2ed657da61bfc40a8a148a9c9a202476b539aaf915e6cd81d1ef268
Git SHA-1:                       [DIRTY]757342b6cef510f13cb5ad2fbad97518d6df45df
Configuration directory:         /root/.coda-config
Peers:                           100 (34.75.48.118:10514 161.97.84.68:8302 135.181.77.184:8302 35.238.79.1:8302 140.82.10.147:8302 35.197.55.249:8302 213.136.68.86:8302 168.119.64.26:8302 168.119.245.253:8302 168.119.243.11:8302 36.189.234.172:8302 95.216.164.21:8302 95.217.130.92:8302 178.140.28.164:8302 167.172.150.166:8302 78.47.27.198:8302 168.119.247.89:8302 35.237.228.219:10508 159.69.251.145:8302 34.75.23.163:10509 167.86.104.204:8302 95.217.157.114:8302 34.74.76.116:10502 34.74.107.15:10511 164.68.103.231:8302 95.217.2.47:8302 95.216.223.106:8302 35.232.20.211:8302 35.198.111.197:8302 5.189.140.233:8302 195.2.70.184:8302 47.253.14.41:8302 148.251.236.112:8302 178.128.6.216:8302 185.186.142.120:8302 69.164.197.183:8302 206.189.213.175:8302 47.253.15.7:8302 44.234.88.25:8302 34.75.103.6:10401 34.89.147.186:8302 95.179.138.74:8302 136.243.101.239:8302 62.113.119.110:8302 167.86.111.224:8302 35.237.37.32:10909 35.231.227.169:10504 35.237.137.82:10515 18.222.186.77:8302 213.136.74.60:8302 34.70.175.206:8302 44.234.150.164:8302 62.171.185.75:8302 168.119.183.100:60844 5.189.170.159:8302 34.75.61.180:10505 34.75.30.199:10516 108.61.217.20:8302 34.222.179.83:8302 95.216.37.122:8302 35.237.148.187:10507 78.47.27.79:8302 107.152.46.145:8302 35.225.94.177:8302 51.89.235.95:8302 138.197.144.138:8302 35.227.13.96:10001 35.231.245.46:10510 3.137.170.80:8302 173.249.2.28:8302 44.226.199.154:8302 95.217.229.70:8302 104.198.177.218:8302 34.73.234.206:10001 139.162.136.96:8302 161.97.82.47:8302 168.119.177.238:8302 49.12.70.4:8302 188.26.201.137:8302 34.73.206.59:10503 178.128.8.45:8302 142.93.180.227:8302 35.227.120.14:10512 135.181.5.152:8302 161.97.82.94:8302 104.154.80.67:8302 34.83.138.79:8302 95.216.118.34:8302 168.119.179.181:8302 161.97.82.95:8302 35.243.211.158:10001 34.73.254.106:10506 68.183.188.27:8302 95.111.254.125:8302 35.178.173.65:8302 47.253.4.27:8302 152.32.172.170:8302 161.97.82.57:8302 83.99.87.101:8302 198.84.232.42:8302)
User_commands sent:              0
SNARK worker:                    None
SNARK work fee:                  25000000
Sync status:                     Synced

The node starts "Catchup" after a while, but it gets stuck at block height 2:

Coda daemon status
-----------------------------------

Global number of accounts:       1026
Block height:                    2
Max observed block length:       312
Local uptime:                    14m34s
Ledger Merkle root:              jxpPYhcXCXo7T2JmyqcCor5x65zzqUpd1oTbspVwwgiRd7ftHof
Protocol state hash:             3NK3fQEHBguGcw6exX3KQrpGhjz2stviy9dJWxxeZq662HY5nmtH
Chain id:                        ef95774fb2ed657da61bfc40a8a148a9c9a202476b539aaf915e6cd81d1ef268
Git SHA-1:                       [DIRTY]757342b6cef510f13cb5ad2fbad97518d6df45df
Configuration directory:         /root/.coda-config
Peers:                           102 (35.237.228.219:10508 159.69.251.145:8302 34.75.23.163:10509 167.86.104.204:8302 95.217.157.114:8302 34.74.76.116:10502 34.74.107.15:10511 164.68.103.231:8302 95.217.2.47:8302 35.198.111.197:8302 5.189.140.233:8302 195.2.70.184:8302 47.253.14.41:8302 148.251.236.112:8302 178.128.6.216:8302 95.216.223.106:8302 35.232.20.211:8302 185.186.142.120:8302 69.164.197.183:8302 206.189.213.175:8302 47.253.15.7:8302 44.234.88.25:8302 34.75.103.6:10401 34.89.147.186:8302 95.179.138.74:8302 136.243.101.239:8302 35.237.37.32:10909 35.231.227.169:10504 35.237.137.82:10515 18.222.186.77:8302 213.136.74.60:8302 34.70.175.206:8302 62.113.119.110:8302 167.86.111.224:8302 44.234.150.164:8302 62.171.185.75:8302 168.119.183.100:60844 5.189.170.159:8302 34.75.61.180:10505 34.75.30.199:10516 108.61.217.20:8302 34.222.179.83:8302 95.216.37.122:8302 35.237.148.187:10507 78.47.27.79:8302 107.152.46.145:8302 35.225.94.177:8302 51.89.235.95:8302 35.227.13.96:10001 35.231.245.46:10510 3.137.170.80:8302 173.249.2.28:8302 44.226.199.154:8302 95.217.229.70:8302 138.197.144.138:8302 34.73.234.206:10001 139.162.136.96:8302 161.97.82.47:8302 168.119.177.238:8302 49.12.70.4:8302 188.26.201.137:8302 104.198.177.218:8302 44.242.144.119:8302 34.73.206.59:10503 178.128.8.45:8302 161.97.82.94:8302 35.227.120.14:10512 135.181.5.152:8302 104.154.80.67:8302 34.83.138.79:8302 95.216.118.34:8302 142.93.180.227:8302 35.243.211.158:10001 34.73.254.106:10506 68.183.188.27:8302 95.111.254.125:8302 35.178.173.65:8302 47.253.4.27:8302 168.119.179.181:8302 161.97.82.95:8302 152.32.172.170:8302 161.97.82.57:8302 198.84.232.42:8302 44.235.91.67:8302 34.75.48.118:10514 161.97.84.68:8302 135.181.77.184:8302 35.238.79.1:8302 140.82.10.147:8302 35.197.55.249:8302 213.136.68.86:8302 168.119.64.26:8302 36.189.234.170:8302 36.189.234.172:8302 95.216.164.21:8302 95.217.130.92:8302 178.140.28.164:8302 167.172.150.166:8302 78.47.27.198:8302 168.119.245.253:8302 168.119.243.11:8302 
168.119.247.89:8302)
User_commands sent:              0
SNARK worker:                    None
SNARK work fee:                  25000000
Sync status:                     Catchup
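
A stuck node like this one can be spotted by comparing two fields of the status output. The following is a minimal sketch, assuming the `Key:   value` layout shown above; the function names `parse_status` and `blocks_behind` are hypothetical helpers, not part of the Coda/Mina CLI.

```python
import re


def parse_status(text):
    """Parse 'Key:   value' lines from `coda client status` output into a dict.

    Assumes the plain-text layout shown in this thread. Lines without a
    colon (the title and the dashed rule) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def blocks_behind(text):
    """Return how far the local block height lags the longest observed chain."""
    fields = parse_status(text)
    return int(fields["Max observed block length"]) - int(fields["Block height"])


# Values copied from the second status dump in this comment.
sample = """\
Block height:                    2
Max observed block length:       312
Local uptime:                    14m34s
Sync status:                     Catchup
"""

if __name__ == "__main__":
    print(blocks_behind(sample))
```

A gap that keeps growing between successive status calls, while the block height does not move, matches the behaviour reported here.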

The node crashed after ~65 min:

2020-11-20 10:47:44 UTC [Error] validation for item 3888 took 521485723715 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3771 took 665202839349 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3582 took 823169621624 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3542 took 850850765085 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3802 took 635686225807 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3721 took 702993659717 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3951 took 466041590443 seconds
2020-11-20 10:47:44 UTC [Error] validation for item 3827 took 615752619030 seconds
2020-11-20 10:47:45 UTC [Error] validation for item 3470 took 911120953813 seconds
2020-11-20 10:47:45 UTC [Error] validation for item 3952 took 467186576087 seconds
2020-11-20 10:47:45 UTC [Error] validation timed out :(
2020-11-20 10:47:45 UTC [Error] validation timed out :(
2020-11-20 10:47:47 UTC [Error] validation timed out :(
2020-11-20 10:47:49 UTC [Error] validation timed out :(
2020-11-20 10:47:49 UTC [Error] validation timed out :(
2020-11-20 10:47:54 UTC [Error] validation for item 3615 took 783701981850 seconds
2020-11-20 10:47:54 UTC [Error] validation for item 3739 took 713408556535 seconds
2020-11-20 10:47:54 UTC [Error] validation for item 3922 took 499345428851 seconds
2020-11-20 10:47:54 UTC [Error] validation for item 3485 took 920398803612 seconds
2020-11-20 10:47:54 UTC [Error] validation for item 4102 took 331855613063 seconds
2020-11-20 10:47:54 UTC [Error] validation for item 3569 took 844593268830 seconds
2020-11-20 10:47:55 UTC [Error] validation timed out :(
2020-11-20 10:47:56 UTC [Error] validation for item 3881 took 539513030473 seconds
2020-11-20 10:47:56 UTC [Error] validation for item 3896 took 524570708084 seconds
2020-11-20 10:47:56 UTC [Error] validation for item 3926 took 479058492644 seconds
2020-11-20 10:47:56 UTC [Error] validation for item 3893 took 527187311830 seconds
2020-11-20 10:47:56 UTC [Error] validation for item 3636 took 756100749927 seconds
2020-11-20 10:47:58 UTC [Error] validation timed out :(
2020-11-20 10:47:58 UTC [Error] validation timed out :(
2020-11-20 10:48:00 UTC [Error] validation timed out :(
2020-11-20 10:48:01 UTC [Error] validation timed out :(
2020-11-20 10:49:10 UTC [Info] Daemon child process 31896 terminated with exit code 0
2020-11-20 10:49:10 UTC [Error] Child process of kind "Verifier" with pid 31896 has terminated

The node did not write a crash report to .coda-config.
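
The log excerpt above can be summarized with a short script. This is a sketch under one loud assumption: the values printed as "seconds" are treated as nanoseconds. A figure like 521485723715 cannot be seconds on a node with about an hour of uptime, while 521 s is plausible, but the thread does not confirm the unit. The function name `summarize` and the `unit_divisor` parameter are hypothetical.

```python
import re

# Matches lines such as:
#   ... [Error] validation for item 3888 took 521485723715 seconds
ITEM_RE = re.compile(r"validation for item (\d+) took (\d+) seconds")


def summarize(log_lines, unit_divisor=1e9):
    """Summarize validation errors in daemon log lines.

    unit_divisor=1e9 ASSUMES the logged value is in nanoseconds, not
    seconds as the message claims. Pass 1 to keep the raw value.
    Returns (durations_by_item, timeout_count, verifier_died).
    """
    durations = {}
    timeouts = 0
    verifier_died = False
    for line in log_lines:
        match = ITEM_RE.search(line)
        if match:
            durations[int(match.group(1))] = int(match.group(2)) / unit_divisor
        elif "validation timed out" in line:
            timeouts += 1
        elif 'Child process of kind "Verifier"' in line and "has terminated" in line:
            verifier_died = True
    return durations, timeouts, verifier_died


# Three lines copied from the excerpt above.
sample_log = [
    "2020-11-20 10:47:44 UTC [Error] validation for item 3888 took 521485723715 seconds",
    "2020-11-20 10:47:45 UTC [Error] validation timed out :(",
    '2020-11-20 10:49:10 UTC [Error] Child process of kind "Verifier" with pid 31896 has terminated',
]

if __name__ == "__main__":
    durations, timeouts, verifier_died = summarize(sample_log)
    for item, secs in sorted(durations.items()):
        print(f"item {item}: {secs:.1f} s (assuming nanoseconds)")
    print(f"timeouts: {timeouts}, verifier terminated: {verifier_died}")
```

Under that assumption, the individual validations in the excerpt took roughly 5 to 15 minutes each before the Verifier child process terminated.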

jimcase commented 3 years ago

Similar issue here

Block height: 2.
Peers: 40-70 and more.
Crash: 20-50 minutes after the start.
I am not running a SNARK worker.

Ubuntu 20.04.1 LTS 8 vCPU, 32 GB RAM, 512 GB SSD

Full logs: https://github.com/jimcase/minatest/blob/main/coda_crash_report_2020-11-20_05-30-03.305634.tar.gz?raw=true

Gregg-First commented 3 years ago

@bkase The issue was solved after I bought a dedicated server: AMD Ryzen 5 3600 Hexa-Core, RAM: 64 GB DDR4.

It seems that even the "CPU optimized" plan promised by the provider does not fully deliver what it advertises.
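
One way to check whether a virtual server is losing CPU time to other tenants is to look at the "steal" counter in `/proc/stat`. The sketch below assumes a Linux guest whose hypervisor reports steal time (not all do); nothing in this thread confirms that steal time was the cause here. The function names `cpu_times`, `steal_percent`, and `sample_host` are hypothetical.

```python
import time


def cpu_times(stat_line):
    """Return the first eight counters of the aggregate 'cpu' line in /proc/stat.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The guest fields are left out because they are already counted in user/nice.
    """
    parts = stat_line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:9]]


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


def sample_host(interval=5.0):
    """Take two samples from /proc/stat on a Linux host and return the steal %."""
    def read_cpu_line():
        with open("/proc/stat") as stat_file:
            return stat_file.readline()

    first = read_cpu_line()
    time.sleep(interval)
    return steal_percent(first, read_cpu_line())


# Synthetic counters for illustration only, not measurements from this thread.
before = "cpu  1000 0 500 8000 100 0 50 350 0 0"
after = "cpu  1600 0 800 8400 150 0 50 1000 0 0"

if __name__ == "__main__":
    print(f"steal: {steal_percent(before, after):.1f}%")
```

On a real server, calling `sample_host()` while the daemon is under load gives the live figure. A steal percentage that stays high would be consistent with a shared vCPU being the bottleneck.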