dcSpark / milkomeda-c1-evm-passive

Milkomeda EVM node configurations for partners wanting to connect to the EVM nodes' p2p network
MIT License

ElasticSearch error #13

Closed: tropicalrussian closed this issue 2 years ago

tropicalrussian commented 2 years ago

I'm using this to run a Graph node, and while my subgraph is indexing (after 300-500k blocks) the Milkomeda node starts throwing this error and stops responding to the Graph node:

c1-mainnet-passive-besu-1 | {"timestamp":"2022-05-04T16:47:49,062","container":"62599d7e2f353900013ffe0b","level":"ERROR","thread":"vert.x-acceptor-thread-0","class":"rejectedExecution","message":"Failed to submit a listener notification task. Event loop shut down?","throwable":" java.util.concurrent.RejectedExecutionException: event executor terminated\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:815)\n\tat io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:841)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:499)\n\tat io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)\n\tat io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)\n\tat io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)\n\tat io.netty.bootstrap.ServerBootstrap$ServerBootstrapAcceptor.channelRead(ServerBootstrap.java:215)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:97)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:487)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"}

The node doesn't crash, but the error repeats continuously and the Graph node gets no response when trying to fetch more blocks. There are a few mentions of this error online, which basically suggest it's caused by resource overutilization. The server is a Linode VPS running Ubuntu 20.04, and peak CPU utilization is ~90%.

rinor commented 2 years ago
  1. Mainnet or Devnet?
  2. What are the resources of the machine running the Besu node?
  3. Have you tried increasing the memory for Besu in this setup via the docker-compose file? We set it to 1 GB as an example, but in prod, depending on the service that uses Besu, it's something like 15-30 GB, because explorers and indexers are very resource-hungry. (See the sketch below.)
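
A minimal sketch of what raising the Besu memory in a Compose file could look like, assuming Besu's official image (which forwards JVM flags through the `BESU_OPTS` environment variable) and a service named `besu`; the key names and values here are illustrative assumptions, not this repo's actual docker-compose.yml (the real setting is linked further down in this thread):

```yaml
# Illustrative only: the service name and values are assumptions, not the
# repo's actual configuration.
services:
  besu:
    environment:
      # Besu's official image passes BESU_OPTS to the JVM; -Xmx sets the heap size.
      - BESU_OPTS=-Xmx8g
    deploy:
      resources:
        limits:
          # Container-level memory cap; keep it comfortably above the JVM heap.
          memory: 10000M
```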
rinor commented 2 years ago

Indeed, the node doesn't crash and may even continue fetching blocks (although at an unstable pace), but it's mostly unusable for anything else.

tropicalrussian commented 2 years ago
> 1. Mainnet or Devnet?
> 2. What are the resources of the machine running the Besu node?
> 3. Have you tried increasing the memory for Besu in this setup via the docker-compose file? We set it to 1 GB as an example, but in prod, depending on the service that uses Besu, it's something like 15-30 GB, because explorers and indexers are very resource-hungry.

  1. Mainnet
  2. 8 GB memory, 150 GB SSD, 4 CPUs @ 2.10 GHz. So you think I'll need a minimum of 15 GB of memory to do this?
  3. No. Where is that set in docker-compose.yml? I only see resource limits set for logging:

```yaml
logging:
  driver: "json-file"
  options:
    max-size: "1m"
    max-file: "10"
```

Along those lines, I'm going to try adding this and see if it works; I'll comment again when it finishes indexing (or doesn't):

```yaml
deploy:
  resources:
    limits:
      memory: 8000M
```
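
As an aside, one caveat: `deploy.resources.limits.memory` caps the container's total memory rather than Besu's JVM heap, which is configured separately, and depending on the Compose version the `deploy` limits may only be honored by the newer `docker compose` CLI (or by the legacy docker-compose with `--compatibility`). A quick, generic way to confirm the cap actually took effect:

```sh
# Recreate the stack so the new limit applies, then check the
# "MEM USAGE / LIMIT" column reported for the Besu container.
docker compose up -d
docker stats --no-stream
```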

rinor commented 2 years ago
  1. Not yet, let's see how far we can go with config changes first.
  2. https://github.com/dcSpark/milkomeda-evm-passive/blob/b6ff3f38138e324a708c397bec2bf89bf9377766/c1-mainnet/docker-compose.yml#L12
tropicalrussian commented 2 years ago

Oh I see, thanks @rinor, I really appreciate it. Giving this a try now.

tropicalrussian commented 2 years ago

Update: I didn't re-encounter the same error after increasing the limit to 8G (approximately the maximum available), but the Graph node went down a couple of times and periodically couldn't get a response from the Milkomeda node for a few minutes at a time, so I've moved to a machine with 16G. No peers are available at the moment, so it's taking a while to get things going again, but I'll post again once I find out whether this is sufficient memory.

tropicalrussian commented 2 years ago

I ended up using a machine with 32G of memory to ensure good uptime, but depending on your use case and how intensive your subgraph's usage is, you may be able to get away with 16G.