MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

OOM but not OOM (Attempted to allocate N bytes) #7020

Closed c29r3 closed 2 years ago

c29r3 commented 3 years ago

Description

My node crashed, but I don't see a crash report in the .coda-config folder :thinking: The logs show an OOM error, but according to Grafana peak memory usage was 17 GB out of 64 GB.

Environment

Git SHA-1: [DIRTY]880882e45e48aaf987aca191f488c7f65609c305
Host OS: Ubuntu 18.04  
Using containerization: docker  
CPU: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
RAM: 64 GB  

Steps to reproduce

  1. Standard installation from the repository, following the instructions in the pinned message in #qa-task-force
  2. Run a new Docker instance:
    docker run --name mina -d \
    --restart always \
    -p 8301-8305:8301-8305 \
    -p 127.0.0.1:3085:3085 \
    -p 6060:6060 \
    -v $(pwd)/keys:/root/keys:ro \
    -v $(pwd)/coda-config:/root/.coda-config \
    -v $(pwd)/peers.txt:/root/peers.txt \
    --env CODA_PRIVKEY_PASS='12345' \
    minaprotocol/mina-daemon-baked:4.1-turbo-pickles-mina880882e-autoa026dd9 daemon \
    -block-producer-key /root/keys/my-wallet \
    -peer-list-file /root/peers.txt \
    -metrics-port 6060 \
    -insecure-rest-server \
    -file-log-level Debug \
    -log-level Info

LOGS

https://drive.google.com/file/d/1chFyQnD3CnlU_AK2Y6ddJZpNJlDvUcD-/view?usp=sharing

c29r3 commented 3 years ago

I received the error described above twice; after that, the node has not restarted for more than a day.

ArtemBernatskyy commented 3 years ago

Are you sure this is not related to the maximum RAM allocated to Docker?

c29r3 commented 3 years ago

Are you sure this is not related to the maximum RAM allocated to Docker?

I have not set a RAM limit.
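For anyone who wants to double-check this on their own container, here is a minimal sketch. It assumes the container is named `mina` (as in the `docker run` command above) and relies on `docker inspect` reporting `HostConfig.Memory` as `0` when no limit is set. The helper names are made up for illustration.

```python
import subprocess
from typing import Optional


def parse_memory_limit(raw: str) -> Optional[int]:
    """Interpret the HostConfig.Memory value printed by `docker inspect`.

    Docker reports the limit in bytes and prints 0 when no limit is set.
    """
    value = int(raw.strip())
    return None if value == 0 else value


def container_memory_limit(name: str) -> Optional[int]:
    """Ask the local Docker CLI for the memory limit of a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.HostConfig.Memory}}", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_memory_limit(result.stdout)


# Example (requires Docker and a container named "mina"):
#   limit = container_memory_limit("mina")
#   print("no memory limit" if limit is None else f"limit: {limit} bytes")
```

A result of `None` only rules out a container-level limit; it says nothing about limits imposed elsewhere on the host.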

c29r3 commented 3 years ago

And here's the reason: the daemon tried to allocate 14649452 GB of memory (about 14.65 PB).

Attempted to allocate 14649452162192538 bytesFatal error: out of memory.
Coda process exited with status code 2
+ echo 'Coda process exited with status code 2'
+ sleep 10
+ kill 14
+ '[' '!' -f stay_alive ']'
+ exit 0
+ mkdir -p .coda-config
+ touch .coda-config/coda-prover.log
+ touch .coda-config/coda-verifier.log
+ touch .coda-config/mina-best-tip.log
+ command=daemon
+ shift

garethtdavies commented 3 years ago

Another crash with this, this time ~136 GB:

Attempted to allocate 136752041404 bytesFatal error: out of memory.

dudarov commented 3 years ago

I have the same problem too.

2021-01-17 19:08:20 UTC [Error] Duplicate producer and slot: producer = $block_producer, block_producer: "B62qqLmebnE45d2myqMM5YNstVudto7VNzJmNNHKp4VyEmAbjXK3kRq" consensus_time: { "slot_number": "8637", "slots_per_epoch": "7140" } hash: "3NLfJ1nNhDzpkh5iuY36kh6Jo1scrqVfp73mNQbY36nP4zGPswi1" current_protocol_state_hash: "3NKYqMaCHqtKYdik9yz6MJn32YvN1KVixPmW86K6PHRTiRMwsXZP"
Attempted to allocate 137386859548 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Same problem here with the latest build

Commit [DIRTY]a9402473584c36b347d52df4f4000b7286385987 on branch HEAD
Attempted to allocate 136348821002 bytesFatal error: out of memory.

Which is almost 128 GB
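The sizes quoted in this thread mix decimal (GB) and binary (GiB) units, so a small helper makes the numbers easier to compare. This is only an illustrative sketch; `describe_bytes` is a made-up name, and the two inputs are byte counts copied from the error messages above.

```python
def describe_bytes(n: int) -> str:
    """Render a byte count in decimal gigabytes and binary gibibytes."""
    gb = n / 10**9    # 1 GB  = 10^9 bytes
    gib = n / 2**30   # 1 GiB = 2^30 bytes
    return f"{n} bytes = {gb:.2f} GB = {gib:.2f} GiB"


# Byte counts taken from the "Attempted to allocate" messages in this issue.
for size in (136348821002, 14649452162192538):
    print(describe_bytes(size))
```

The first value comes out at roughly 136 GB or 127 GiB, which is why it is described both as "~136GB" and as "almost 128 GB" in the comments above.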

gregbostrom commented 3 years ago

Another occurrence.

Attempted to allocate 136703156765 bytesFatal error: out of memory.

on version

Commit [DIRTY]a9402473584c36b347d52df4f4000b7286385987 on branch HEAD

gregbostrom commented 3 years ago

With this latest version

Commit [DIRTY]d075f83d26490f6510bbb14bbfe3c771256257b5 on branch HEAD

Had this just now

Attempted to allocate 136771063836 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

With the Encore testnet version

Commit [DIRTY]3ef86631e3a38150b5092faec47da144b0a46020 on branch HEAD

I've seen two occurrences in the past 24 hours.

Attempted to allocate 137127597980 bytesFatal error: out of memory.
Attempted to allocate 137065315144 bytesFatal error: out of memory.

jspada commented 3 years ago

On the Zenith testnet

Commit [DIRTY]245a3f7d883c516f5f16742cb1ca672872612851 on branch HEAD

got

Attempted to allocate 91978576875 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Star.LI#0785 had this on Encore.

Attempted to allocate 137288557616 bytesFatal error: out of memory.
My node is crashed by out of memory. Anyone else has the same issue?
BTW, my node is shipped with 128G memory.

https://discord.com/channels/484437221055922177/754653322845356103/813240010231775254

gregbostrom commented 3 years ago

I just had this on Zenith.

Commit [DIRTY]245a3f7d883c516f5f16742cb1ca672872612851 on branch HEAD
2021-02-22 03:44:04 UTC [Info] Received a block from $sender
        sender: {
  "Remote": {
    "host": "88.198.26.117",
    "peer_id": "12D3KooWH3kYuWRDnBDLn7z6xH9xTdnfRPZUDMrQYTbLnzPCZwEq",
    "libp2p_port": 8302
  }
}
Attempted to allocate 137351565754 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Another occurrence reported on Discord.

https://discord.com/channels/484437221055922177/812104065168310303/813821331606601743

crypto-guys commented 3 years ago

I had this happen not many hours ago.

Ledger Merkle root: jxdBddivtWL6zZwhdBDuK9ibpngS7P4rh73FR1jcUJbXwMUvddg
Protocol state hash: 3NLWJdXYMrUi1dQkBGvG2Hpgv6qPkXZJJyMEUDwWNBWULxwViHx3
Chain id: 90b71f6f798dec88a1afc825cd0b358c6d8a3ff3c0b57a7fe97412ea5a639c2b
Git SHA-1: fd3980820fb82c7355af49462ffefe6718800b77

Mar 7 15:17:18 mina-testworld mina[18179]: 2021-03-07 14:17:18 UTC [Info] Received a block from $sender
Mar 7 15:17:18 mina-testworld mina[18179]: #011sender: {
Mar 7 15:17:18 mina-testworld mina[18179]: "Remote": {
Mar 7 15:17:18 mina-testworld mina[18179]: "host": "135.181.3.211",
Mar 7 15:17:18 mina-testworld mina[18179]: "peer_id": "12D3KooWK1KyyDSrrtEcJvjZ56cTsMqtk46FWSuS7HpsGhUowWfh",
Mar 7 15:17:18 mina-testworld mina[18179]: "libp2p_port": 8302
Mar 7 15:17:18 mina-testworld mina[18179]: }
Mar 7 15:17:18 mina-testworld mina[18179]: }
Mar 7 15:17:20 mina-testworld mina[18179]: 2021-03-07 14:17:20 UTC [Info] Received a block from $sender
Mar 7 15:17:20 mina-testworld mina[18179]: #011sender: {
Mar 7 15:17:20 mina-testworld mina[18179]: "Remote": {
Mar 7 15:17:20 mina-testworld mina[18179]: "host": "94.74.101.26",
Mar 7 15:17:20 mina-testworld mina[18179]: "peer_id": "12D3KooWNuA1pY5aY8x9nCcj8FqzACYDnXDAD4sK3LWvnrxyZrCd",
Mar 7 15:17:20 mina-testworld mina[18179]: "libp2p_port": 1027
Mar 7 15:17:20 mina-testworld mina[18179]: }
Mar 7 15:17:20 mina-testworld mina[18179]: }
Mar 7 15:17:20 mina-testworld mina[18179]: {"timestamp":"2021-03-07 14:17:20.742036Z","level":"Debug","source":{"module":"Snark_worker__Functor","location":"File \"src/lib/snark_worker/functor.ml\", line 169, characters 8-20"},"message":"Snark worker working directory $dir","metadata":{"dir":"/","pid":18278,"process":"Snark Worker"}}
Mar 7 15:17:20 mina-testworld mina[18179]: {"timestamp":"2021-03-07 14:17:20.742248Z","level":"Debug","source":{"module":"Snark_worker__Functor","location":"File \"src/lib/snark_worker/functor.ml\", line 181, characters 6-18"},"message":"Snark worker using daemon $addr","metadata":{"addr":"127.0.0.1:8301","pid":18278,"process":"Snark Worker"}}
Mar 7 15:17:23 mina-testworld mina[18179]: Attempted to allocate 136548740755 bytesFatal error: out of memory.
Mar 7 15:17:23 mina-testworld systemd[1]: mina.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 7 15:17:24 mina-testworld systemd[1]: mina.service: Failed with result 'exit-code'.

vaddec-everstake commented 3 years ago

I have a problem with the service restarting every 3-10 hours. It started after I installed the sidecar. The log shows the "Attempted to allocate ... bytes" / "Fatal error: out of memory" error.

c29r3 commented 3 years ago

This is still happening on:
minaprotocol/mina-archive:1.1.3-48401e9

Attempted to allocate 92152191331 bytesFatal error: out of memory.
+ tail -q -f mina.log
Mina process exited with status code 2
2021-03-27 07:37:37 UTC [Info] Coda daemon is booting up; built with commit "a8893ab6dd8a68171e7b99a5dc6b76940411350b" on branch "master"
Using password from environment variable CODA_PRIVKEY_PASS
2021-03-27 07:37:37 UTC [Info] Created daemon lockfile "/root/.mina-config/.mina-lock"
2021-03-27 07:37:37 UTC [Info] Registering async shutdown handler: "Remove daemon lockfile"
2021-03-27 07:37:37 UTC [Info] Daemon will expire at "2024-12-10 14:00:00-07:00"
2021-03-27 07:37:37 UTC [Info] Booting may take several seconds, please wait
2021-03-27 07:37:37 UTC [Info] Reading configuration files $config_files
    config_files: [
  "/var/lib/coda/config_a8893ab6.json", "/root/.mina-config/daemon.json",
  "/var/lib/coda/config_a8893ab.json"

rashidovich commented 3 years ago

This happens almost once every 24 hours on every node I run (1.1.3-48401e9, 1.1.4-a8893ab).

Attempted to allocate 136854715828 bytesFatal error: out of memory.
Mina process exited with status code 2

gregbostrom commented 3 years ago

Another occurrence reported on Discord.

https://discord.com/channels/484437221055922177/799597981762453535/825869389995704400

gregbostrom commented 3 years ago

I just had this two hours ago.

!!! 2021-03-29 06:01:16 UTC [Info] Received a block from $sender
!!!     sender: {
!!!   "Remote": {
!!!     "host": "3.236.207.131",
!!!     "peer_id": "12D3KooWF162ZD7FNU29or3AMRPMB5pTvG2ZtZJdxciBGLXXNsUy",
!!!     "libp2p_port": 8302
!!!   }
!!! }
[2021-3-29 06:01:19.171322]Snark_worker__Functor: Snark worker working directory "/home/zbostrom"
[2021-3-29 06:01:19.171418]Snark_worker__Functor: Snark worker using daemon "127.0.0.1:8301"
[2021-3-29 06:01:19.374481]Snark_worker__Functor: No jobs available. Napping for 5.954796458038227 seconds
!!! Attempted to allocate 136557433134 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Again, just now:

2021-03-29 16:43:08 UTC [Info] Received a block from $sender
        sender: {
  "Remote": {
    "host": "34.75.16.224",
    "peer_id": "12D3KooWEdBiTUQqxp3jeuWaZkwiSNcFxC6d6Tdq7u2Lf2ZD2Q6X",
    "libp2p_port": 10003
  }
}
Attempted to allocate 136586112995 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Again, about 9 hours ago.

2021-03-30 19:49:29 UTC [Info] Received a block from $sender
        sender: {
  "Remote": {
    "host": "135.181.76.248",
    "peer_id": "12D3KooWS7jKsHMuMp8sCjSQVFqU3b9hLwBRHDjjaiUXhYDYkt3v",
    "libp2p_port": 8302
  }
}
Attempted to allocate 136682221557 bytesFatal error: out of memory.

gregbostrom commented 3 years ago

Again

2021-04-03 11:46:06 UTC [Info] Received a block from $sender
        sender: {
  "Remote": {
    "host": "195.201.173.222",
    "peer_id": "12D3KooWDvHJgAF2jyBug5u4R7tWooXAUsdrSKEtVDeq8JnED5cY",
    "libp2p_port": 8302
  }
}
Attempted to allocate 92661310289 bytesFatal error: out of memory.

mrmr1993 commented 3 years ago

@gregbostrom can you share the flags you're using? We've been looking for a way to reproduce this so that we can capture some core dumps. It seems you've found a pretty reliable one!

gregbostrom commented 3 years ago

mina daemon \
    -peer-list-file ~/peers.txt \
    -generate-genesis-proof true \
    -block-producer-key ~/keys/my-wallet \
    -block-producer-password $MINA_PRIVKEY_PASS \
    -file-log-level Info \
    -log-level Info \
    -limited-graphql-port 3095

I think most people encountering this problem are not reporting it, and I would not call mine a reliable case.

gregbostrom commented 3 years ago

I just keep reporting it because I think it needs to be fixed. You might consider screening out very large memory allocation requests and ignoring the request that triggered them.
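To make the suggestion concrete, here is a rough sketch of what such screening could look like. It is not Mina's code, and nothing in this thread establishes where the oversized allocation originates. The sketch simply assumes, for illustration, a length-prefixed message whose declared size is checked against a cap before any buffer is allocated. `MAX_FRAME_BYTES`, `FrameTooLarge`, and `read_frame` are hypothetical names.

```python
import io
import struct

# Hypothetical cap; a real value would depend on the largest legitimate message.
MAX_FRAME_BYTES = 64 * 1024 * 1024


class FrameTooLarge(Exception):
    """Raised when a declared payload length exceeds the configured cap."""


def read_frame(stream, max_bytes: int = MAX_FRAME_BYTES) -> bytes:
    """Read one frame: an 8-byte big-endian length followed by the payload.

    The declared length is validated before any payload buffer is allocated,
    so an implausible size is rejected instead of crashing the process.
    """
    header = stream.read(8)
    if len(header) != 8:
        raise EOFError("truncated length header")
    (length,) = struct.unpack(">Q", header)
    if length > max_bytes:
        raise FrameTooLarge(f"declared {length} bytes, cap is {max_bytes}")
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload


# A well-formed frame is returned as-is.
good = io.BytesIO(struct.pack(">Q", 5) + b"hello")
print(read_frame(good))

# A frame declaring a size like the ones in this issue is rejected up front.
bad = io.BytesIO(struct.pack(">Q", 136348821002))
try:
    read_frame(bad)
except FrameTooLarge as exc:
    print("rejected:", exc)
```

With a check like this the caller can drop the offending request and keep running, which is the behavior proposed above.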

garethtdavies commented 3 years ago

I think most people encountering this problem are not reporting it and I would not call mine a reliable case.

I just checked the logs for my nodes over the last 7 days and found 15 occurrences of this.
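A check like this can be scripted. The following is a minimal sketch that assumes the log lines contain the message exactly as it appears in this thread ("Attempted to allocate N bytes"). The function name is made up, and the sample lines are copied from earlier comments.

```python
import re
from typing import Iterable, List

# Matches the message as it appears in the logs quoted in this issue.
ALLOC_RE = re.compile(r"Attempted to allocate (\d+) bytes")


def allocation_failures(lines: Iterable[str]) -> List[int]:
    """Return the requested size, in bytes, of every failed allocation found."""
    sizes: List[int] = []
    for line in lines:
        sizes.extend(int(match.group(1)) for match in ALLOC_RE.finditer(line))
    return sizes


# Sample lines copied from comments in this thread.
sample = [
    "2021-03-29 06:01:16 UTC [Info] Received a block from $sender",
    "!!! Attempted to allocate 136557433134 bytesFatal error: out of memory.",
    "Mar 7 15:17:23 mina-testworld mina[18179]: Attempted to allocate 136548740755 bytesFatal error: out of memory.",
]
sizes = allocation_failures(sample)
print(len(sizes), "occurrences; largest request:", max(sizes), "bytes")
```

To scan a real log, pass an open file object instead of `sample`; iterating over it yields one line at a time.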

EmreNOP commented 3 years ago

Attempted to allocate 91800674834 bytesFatal error: out of memory.
Mina process exited with status code 2
2021-04-01 03:46:42 UTC [Info] Coda daemon is booting up; built with commit "a8893ab6dd8a68171e7b99a5dc6b76940411350b" on branch "master"
Using password from environment variable CODA_PRIVKEY_PASS

Attempted to allocate 136790250923 bytesFatal error: out of memory.
Mina process exited with status code 2
2021-04-05 13:32:10 UTC [Info] Coda daemon is booting up; built with commit "a8893ab6dd8a68171e7b99a5dc6b76940411350b" on branch "master"
Using password from environment variable CODA_PRIVKEY_PASS

Attempted to allocate 136522102730 bytesFatal error: out of memory.
Mina process exited with status code 2
2021-04-06 17:47:08 UTC [Info] Coda daemon is booting up; built with commit "a8893ab6dd8a68171e7b99a5dc6b76940411350b" on branch "master"
Using password from environment variable CODA_PRIVKEY_PASS

Attempted to allocate 137166189101 bytesFatal error: out of memory.
Mina process exited with status code 2
2021-04-07 13:17:39 UTC [Info] Coda daemon is booting up; built with commit "a8893ab6dd8a68171e7b99a5dc6b76940411350b" on branch "master"
Using password from environment variable CODA_PRIVKEY_PASS

That's 4 crashes. Without cleaning .mina-config, the node does not recover well.

hsq125 commented 3 years ago

I think the memory allocation problem still exists in version 1.2.0beta1-c856692-mainnet, though it may be more hidden than before:

2021-07-02 10:28:28 UTC [Error] Possible reason for signal: "Process killed because out of memory"

Log extract 2021-07-02 10:27:31 UTC [Warn] RPC call error for "get_transition_chain_proof" 2021-07-02 10:27:35 UTC [Error] error sending message on stream 25703: $error error: { "commit_id": "c856692fddc525a673ba075f714811b5c50bd3a7", "string": "RPC #46101 failed: \"only wrote 0 out of 9 bytes error: libp2p error error: closed stream\"" } 2021-07-02 10:27:35 UTC [Warn] RPC call error for "get_transition_chain" 2021-07-02 10:27:41 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] Not rebroadcasting block $state_hash because it was received "1 slots too late" state_hash: "3NLex6ZoBK5FxhxRwPwmEBHLVEycVBf2kWULC9gTcDwPq5wskHQz" 2021-07-02 10:27:51 UTC [Info] Saw block with state hash $state_hash state_hash: "3NLex6ZoBK5FxhxRwPwmEBHLVEycVBf2kWULC9gTcDwPq5wskHQz" 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] 
initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:51 UTC [Warn] initial_validate: disconnected chain 2021-07-02 10:27:54 UTC [Warn] RPC call error for "get_transition_chain_proof" 2021-07-02 10:27:55 UTC [Info] Received a block from $sender sender: { "Remote": { "host": "34.122.226.192", 
"peer_id": "12D3KooWAbpbSc9WbfkrJE8FQNLebzL9A7WGUVaMFaEjexW4MPmU", "libp2p_port": 8302 } } 2021-07-02 10:28:05 UTC [Fatal] Unhandled top-level exception: $exn Generating crash report exn: { "commit_id": "c856692fddc525a673ba075f714811b5c50bd3a7", "sexp": [ "monitor.ml.Error", "Cached item has already been finalized", [ "Raised at file \"src/error.ml\" (inlined), line 9, characters 14-30", "Called from file \"src/lib/cache_lib/impl.ml\", line 173, characters 30-51", "Called from file \"src/lib/cache_lib/impl.ml\", line 199, characters 6-69", "Called from file \"src/lib/ledger_catchup/super_catchup.ml\", line 928, characters 25-58", "Called from file \"src/deferred1.ml\", line 17, characters 40-45", "Called from file \"src/job_queue.ml\" (inlined), line 131, characters 2-5", "Called from file \"src/job_queue.ml\", line 171, characters 6-47", "Caught by monitor coda" ] ], "backtrace": [ "Raised at file \"format.ml\" (inlined), line 242, characters 35-52", "Called from file \"format.ml\", line 469, characters 8-33", "Called from file \"format.ml\", line 484, characters 6-24" ] } 2021-07-02 10:28:05 UTC [Info] Updating new available work took 21.804094314575195 ms 2021-07-02 10:28:28 UTC [Error] Daemon child process 97 terminated after receiving signal "sigkill" 2021-07-02 10:28:28 UTC [Error] Possible reason for signal: "Process killed because out of memory" 2021-07-02 10:28:28 UTC [Error] Child process of kind "Prover" with pid 97 has unexpectedly terminated 2021-07-02 10:28:28 UTC [Fatal] Unhandled top-level exception: $exn Generating crash report exn: { "commit_id": "c856692fddc525a673ba075f714811b5c50bd3a7", "sexp": [ "monitor.ml.Error", [ "Failure", "Child process of kind Prover has unexpectedly terminated" ], [ "Raised at file \"stdlib.ml\", line 33, characters 17-33", "Called from file \"src/app/cli/src/cli_entrypoint/mina_cli_entrypoint.ml\", line 579, characters 10-94", "Called from file \"src/deferred0.ml\", line 56, characters 64-69", "Called from file 
\"src/job_queue.ml\" (inlined), line 131, characters 2-5", "Called from file \"src/job_queue.ml\", line 171, characters 6-47", "Caught by monitor coda" ] ], "backtrace": [ "Raised by primitive operation at file \"src/signal.ml\", line 162, characters 6-61" ] } 2021-07-02 10:28:28 UTC [Error] verifier terminated unexpectedly 2021-07-02 10:28:28 UTC [Info] Starting a new verifier process 2021-07-02 10:28:28 UTC [Info] verifier successfully stopped 2021-07-02 10:28:28 UTC [Info] Rebroadcasting $state_hash state_hash: "3NLnPyR37jEecT8RdmJ6m7HKnNxicr8gj1kKA4x5iZC4nueGevJ4" 2021-07-02 10:28:28 UTC [Fatal] libp2p_helper process died unexpectedly: "died after receiving sigkill (signal number 9)" 2021-07-02 10:28:28 UTC [Error] error during validationComplete, ignoring and continuing: $error error: { "commit_id": "c856692fddc525a673ba075f714811b5c50bd3a7", "string": "helper process already exited (doing RPC {\"seqno\":25735,\"is_valid\":\"accept\"})" }

jadechip commented 3 years ago

I'm seeing a lot of these kinds of errors in my logs, not sure if it's related.

2021-07-22 09:44:53 UTC [Error] error sending message on stream 1050: $error
    error: {
  "commit_id": "a42bdeef6b0c15ee34616e4df76c882b0c5c7c2a",
  "string":
    "RPC #9486 failed: \"only wrote 0 out of 9 bytes error: libp2p error error: closed stream\""
}

and

2021-07-22 09:31:38 UTC [Warn] verification of blockchain snark failed but it was our fault
2021-07-22 09:31:38 UTC [Error] error sending message on stream 1023: $error
    error: {
  "commit_id": "a42bdeef6b0c15ee34616e4df76c882b0c5c7c2a",
  "string":
    "RPC #9282 failed: \"only wrote 0 out of 36 bytes error: libp2p error error: closed stream\""
}

kucharskim commented 3 years ago

We are on minaprotocol/mina-daemon-baked:1.1.5-a42bdee and we periodically get this crash: "Attempted to allocate 136699113500 bytesFatal error: out of memory."

I see multiple open issues for memory-related problems. Is there a main one tracking this?

kucharskim commented 3 years ago

The machine's memory:

# free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       7.4Gi        51Gi       0.0Ki        66Gi       117Gi
Swap:         4.0Gi       4.0Mi       4.0Gi

scarletbright commented 3 years ago

Got this today on 1.1.5-a42bdee:

Aug 24 09:59:40 Ubuntu-1804-bionic-64-minimal mina[2359]: Attempted to allocate 137200417204 bytesFatal error: out of memory.
Aug 24 09:59:41 Ubuntu-1804-bionic-64-minimal systemd[1043]: mina.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 24 09:59:41 Ubuntu-1804-bionic-64-minimal systemd[1043]: mina.service: Failed with result 'exit-code'.

scarletbright commented 3 years ago

Release 1.2.0beta6-bee023a:

Sep 18 01:33:35 kernel: [583074.116491] [UFW BLOCK] IN=enp1s0 OUT= MAC=68:05:ca:e6:44:a9:5c:5e:ab:d0:66:c0:08:00 SRC=89.248.165.61 DST=209.236.118.26 LEN=40 TOS=0x00 PREC=0x00 TTL=242 ID=13885 PROTO=TCP SPT=43882 DPT=40159 WINDOW=1024 RES=0x00 SYN URGP=0 
Sep 18 01:33:43 mina[12832]: Attempted to allocate 15253661536846873 bytesFatal error: out of memory.
Sep 18 01:33:43 systemd[1522]: mina.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

p-shahi commented 2 years ago

This instance of the issue was reproduced and addressed in a number of PRs in late 2020 and early 2021. If we see it again, we can reopen.