Consensys / orion

Orion is a PegaSys component for doing private transactions
https://docs.orion.consensys.net/
Apache License 2.0

Encountering memory issues related to Tuweni/LibSodium #425

Viserius closed 3 years ago

Viserius commented 3 years ago

When configuring a private network with Besu, whether we use legacy private groups or Besu-extended data privacy, my local Orion nodes seem to crash after receiving about 400 transactions (irrespective of the speed of sending them).

Below is the error I encounter.

{"timestamp":"2021-06-15T09:02:34,594","container":"f47ea59bce4d","level":"WARN","thread":"vert.x-eventloop-thread-9","class":"HttpErrorHandler","message":"Non OrionException, default unmapped code used","throwable":" java.lang.OutOfMemoryError: Sodium.sodium_malloc failed allocating 32\n\tat org.apache.tuweni.crypto.sodium.Sodium.malloc(Sodium.java:227)\n\tat org.apache.tuweni.crypto.sodium.Sodium.dup(Sodium.java:265)\n\tat org.apache.tuweni.crypto.sodium.Sodium.dup(Sodium.java:276)\n\tat org.apache.tuweni.crypto.sodium.Box$PublicKey.fromBytes(Box.java:111)\n\tat net.consensys.orion.http.handler.receive.ReceiveHandler.handle(ReceiveHandler.java:69)\n\tat net.consensys.orion.http.handler.receive.ReceiveHandler.handle(ReceiveHandler.java:43)\n\tat io.vertx.ext.web.impl.RouteImpl.handleContext(RouteImpl.java:232)\n\tat io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:121)\n\tat io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:134)\n\tat io.vertx.ext.web.handler.impl.ResponseContentTypeHandlerImpl.handle(ResponseContentTypeHandlerImpl.java:54)\n\tat io.vertx.ext.web.handler.impl.ResponseContentTypeHandlerImpl.handle(ResponseContentTypeHandlerImpl.java:28)\n\tat io.vertx.ext.web.impl.RouteImpl.handleContext(RouteImpl.java:232)\n\tat io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:86)\n\tat io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:134)\n\tat io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:178)\n\tat io.vertx.ext.web.handler.impl.LoggerHandlerImpl.handle(LoggerHandlerImpl.java:47)\n\tat io.vertx.ext.web.impl.RouteImpl.handleContext(RouteImpl.java:232)\n\tat io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:86)\n\tat io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:134)\n\tat io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.doEnd(BodyHandlerImpl.java:296)\n\tat 
io.vertx.ext.web.handler.impl.BodyHandlerImpl$BHandler.end(BodyHandlerImpl.java:276)\n\tat io.vertx.ext.web.handler.impl.BodyHandlerImpl.lambda$handle$0(BodyHandlerImpl.java:87)\n\tat io.vertx.core.http.impl.HttpServerRequestImpl.onEnd(HttpServerRequestImpl.java:529)\n\tat io.vertx.core.http.impl.HttpServerRequestImpl.handleEnd(HttpServerRequestImpl.java:515)\n\tat io.vertx.core.http.impl.Http1xServerConnection.handleEnd(Http1xServerConnection.java:172)\n\tat io.vertx.core.http.impl.Http1xServerConnection.handleContent(Http1xServerConnection.java:159)\n\tat io.vertx.core.http.impl.Http1xServerConnection.handleMessage(Http1xServerConnection.java:136)\n\tat io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:320)\n\tat io.vertx.core.impl.EventLoopContext.execute(EventLoopContext.java:43)\n\tat io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:188)\n\tat io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:173)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:337)\n\tat io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)\n\tat io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:102)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:337)\n\tat io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)\n\tat 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:337)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n"}

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fb150785000, 65536, 1) failed; error='Not enough space' (errno=12)

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.

It seems that Orion uses Apache Tuweni for key management, and that Tuweni's sodium module calls into libsodium, a native library written in C. Every time a public key is parsed, an org.apache.tuweni.crypto.sodium.Box.PublicKey instance is created, and its memory is reserved natively through sodium_malloc (see Sodium.malloc in the stack trace above). Apparently, that memory is never freed in my case. This does not happen only through the Receive call; I also saw Box.PublicKey objects being created in other places.
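To illustrate the suspected pattern, here is a minimal, self-contained sketch. It does not use Tuweni or libsodium: NativeKey, OUTSTANDING, and handleRequests are hypothetical names invented for this example, and the "native allocation" is only a counter. It shows how one unreleased allocation per request adds up, and how releasing each key after use keeps the count at zero.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NativeKeySketch {

    /** Hypothetical stand-in for a key whose bytes live in native (off-heap) memory. */
    public static final class NativeKey implements AutoCloseable {
        // Number of simulated native allocations that have not been released yet.
        static final AtomicLong OUTSTANDING = new AtomicLong();

        private final int length;
        private boolean released = false;

        NativeKey(byte[] bytes) {
            // A real binding would allocate native memory here; this sketch only counts.
            this.length = bytes.length;
            OUTSTANDING.incrementAndGet();
        }

        int length() {
            return length;
        }

        @Override
        public void close() {
            // A real binding would free the native buffer here; this sketch only counts.
            if (!released) {
                released = true;
                OUTSTANDING.decrementAndGet();
            }
        }
    }

    /** Simulates handling the given number of requests; returns allocations still outstanding. */
    public static long handleRequests(int requests, boolean release) {
        NativeKey.OUTSTANDING.set(0);
        byte[] keyBytes = new byte[32]; // same 32-byte size as in the error message
        for (int i = 0; i < requests; i++) {
            if (release) {
                // try-with-resources guarantees close() runs after each request.
                try (NativeKey key = new NativeKey(keyBytes)) {
                    key.length();
                }
            } else {
                // The key is parsed and used, but its allocation is never released.
                NativeKey key = new NativeKey(keyBytes);
                key.length();
            }
        }
        return NativeKey.OUTSTANDING.get();
    }

    public static void main(String[] args) {
        System.out.println("without release: " + handleRequests(400, false)); // prints 400
        System.out.println("with release:    " + handleRequests(400, true));  // prints 0
    }
}
```

Whether Orion or Tuweni actually behaves like the first branch is an assumption on my part; the sketch only models what the stack trace suggests.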

I am confident this is not due to allocating too little memory to the Java instance: I tried allocating from 4 GB up to 8 GB of RAM to a single Orion instance, which should be plenty for the tasks it performs.

macfarla commented 3 years ago

Hi Viserius, can you tell us what version of Orion you're using?

Viserius commented 3 years ago

Latest, i.e. 21.1.0. For context, I encounter this when generating private transactions with Hyperledger Caliper. The error occurs both at 10 TPS and at 100+ TPS.

macfarla commented 3 years ago

Thanks for that @Viserius. Are you able to share the Caliper setup or scripts? Are you using a different Orion key for each private tx, i.e. are we talking about 400 keys or just one key reused?

Also, please be aware that while we're currently still supporting Orion users (until November), the project is officially deprecated. Tessera includes all the same functionality plus some extra features. Announcement: https://github.com/ConsenSys/orion/blob/master/CHANGELOG.md#project-deprecation

Viserius commented 3 years ago

Good to know the move to Tessera has been made! Its integration with Besu was still unstable/unfinished while I conducted my research project, which is why I stuck with Orion. As such, we can probably let this rest (unless you want to dive deeper into this to resolve it). In that case, I shared my Caliper workload and benchmark here. Basically, transactions are executed by about 100 to 400 Ethereum accounts (EOAs). The key of the privacy group (for Orion) is the same for all transactions.

vmichalik commented 3 years ago

We're still investigating this issue and will get back to you with what we find, @Viserius.

mark-terry commented 3 years ago

We've investigated the issue but weren't able to replicate it. Please raise another ticket if it's encountered again. Thanks!