Netflix / spectator

Client library for collecting metrics.
Apache License 2.0

extension for tracking off-heap memory use for netty #475

Open brharrington opened 6 years ago

brharrington commented 6 years ago

Netty is used widely at Netflix, and it uses off-heap memory that does not show up in the existing buffer pool metrics. It would be useful to support an extension that tracks the memory use of netty allocators.

brharrington commented 6 years ago

Some more information: https://dzone.com/articles/default-hotspot-maximum-direct-memory-size

brharrington commented 6 years ago

Test program:

import java.lang.management.*;
import java.lang.reflect.*;
import java.nio.*;
import sun.misc.*;

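// Written against JDK 8: sun.misc.SharedSecrets and sun.misc.VM were moved
// to jdk.internal packages in JDK 9 and later.
public class Test {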

  private static void dumpBufferStats() {
    System.out.println("==========================================");
    System.out.println("BufferPoolMXBean");
    System.out.println("------------------------------------------");
    for (BufferPoolMXBean mbean : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      System.out.printf("%s: count %d, capacity %d, used %d%n",
        mbean.getName(),
        mbean.getCount(),
        mbean.getTotalCapacity(),
        mbean.getMemoryUsed());
    }

    JavaNioAccess.BufferPool pool = SharedSecrets.getJavaNioAccess().getDirectBufferPool();
    System.out.printf("SharedSecrets: %s: count %d, capacity %d, used %d%n",
      pool.getName(),
      pool.getCount(),
      pool.getTotalCapacity(),
      pool.getMemoryUsed());
    System.out.printf("VM.maxDirectMemory: %d%n", VM.maxDirectMemory());
    System.out.println("==========================================\n\n");
  }

  private static Unsafe getUnsafe() throws Exception {
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    return (Unsafe) f.get(null);
  }

  public static void main(String[] args) throws Exception {
    dumpBufferStats();

    System.out.println("ByteBuffer.allocateDirect(4096)");
    ByteBuffer.allocateDirect(4096);

    dumpBufferStats();

    // Native memory allocated directly through Unsafe, bypassing ByteBuffer.
    System.out.println("Unsafe.allocateMemory(4096)");
    long address = getUnsafe().allocateMemory(4096);

    dumpBufferStats();

    System.out.println("Unsafe.freeMemory(address)");
    getUnsafe().freeMemory(address);

    dumpBufferStats();
  }
}

Neither BufferPoolMXBean or SharedSecrets captures memory allocated using Unsafe.allocateDirect. Output:

$ java Test
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 0, capacity 0, used 0
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 0, capacity 0, used 0
VM.maxDirectMemory: 3817865216
==========================================

ByteBuffer.allocateDirect(4096)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 3817865216
==========================================

Unsafe.allocateMemory(4096)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 3817865216
==========================================

Unsafe.freeMemory(address)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 3817865216
==========================================

$ java -XX:MaxDirectMemorySize=4g Test
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 0, capacity 0, used 0
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 0, capacity 0, used 0
VM.maxDirectMemory: 4294967296
==========================================

ByteBuffer.allocateDirect(4096)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 4294967296
==========================================

Unsafe.allocateMemory(4096)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 4294967296
==========================================

Unsafe.freeMemory(address)
==========================================
BufferPoolMXBean
------------------------------------------
direct: count 1, capacity 4096, used 4096
mapped: count 0, capacity 0, used 0
SharedSecrets: direct: count 1, capacity 4096, used 4096
VM.maxDirectMemory: 4294967296
==========================================

asgs commented 5 years ago

@brharrington in both of the test runs above, the used and capacity values were 0 at the start and were set to 4096 after calling ByteBuffer.allocateDirect(4096). They then stayed the same even after calling Unsafe.allocateMemory(4096). So, did you mean to say Unsafe.allocateMemory instead?

> Neither BufferPoolMXBean or SharedSecrets captures memory allocated using Unsafe.allocateDirect. Output:

Also, is the BufferPoolMeter's equivalent of BufferPoolMXBean's usedMemory the one in this line? https://github.com/Netflix/spectator/blob/5b72bb6f11b4a9f81fd76acd8034ef2bad075376/spectator-ext-jvm/src/main/java/com/netflix/spectator/jvm/BufferPoolMeter.java#L38

brharrington commented 5 years ago

Yes, I meant Unsafe.allocateMemory.

The buffer pool memoryUsed metric is set to the value returned by BufferPoolMXBean.getMemoryUsed().
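
For reference, the value in question can be read with nothing but the JDK. The following is a minimal sketch, not the BufferPoolMeter code; the class name BufferPoolSnapshot and the method memoryUsedByPool are hypothetical names chosen for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: snapshots BufferPoolMXBean.getMemoryUsed() per pool.
public class BufferPoolSnapshot {

  /** Returns pool name (e.g. "direct", "mapped") mapped to the reported bytes used. */
  public static Map<String, Long> memoryUsedByPool() {
    Map<String, Long> used = new LinkedHashMap<>();
    for (BufferPoolMXBean mbean : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      used.put(mbean.getName(), mbean.getMemoryUsed());
    }
    return used;
  }

  public static void main(String[] args) {
    long before = memoryUsedByPool().getOrDefault("direct", 0L);
    // Keep a reference so the buffer is not collected before the second read.
    ByteBuffer buf = ByteBuffer.allocateDirect(4096);
    long after = memoryUsedByPool().getOrDefault("direct", 0L);
    System.out.printf("direct memoryUsed: before=%d, after=%d, buffer capacity=%d%n",
        before, after, buf.capacity());
  }
}
```

As in the test program above, only allocations made through ByteBuffer are expected to show up here.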

brharrington commented 5 years ago

For much of the common netty usage, it is probably possible to use PooledByteBufAllocator.metric(). The other problem we have run into is that a lot of our internal usage now shades netty (e.g. gRPC), so it is harder for us to hook into the right places for each shaded copy. That is partly why I would prefer to capture it at the JVM level, but I don't think that is possible right now.
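
Since neither JVM view accounts for Unsafe.allocateMemory, the number has to be counted where the allocation happens, which is broadly what an allocator-level metric does. The sketch below illustrates that idea only. TrackedNativeMemory and its methods are hypothetical names, and this is not netty's implementation. It assumes sun.misc.Unsafe is reachable via reflection, which may produce warnings on newer JDKs.

```java
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicLong;
import sun.misc.Unsafe;

// Hypothetical wrapper: counts bytes allocated and freed through Unsafe,
// because the JVM buffer pool metrics do not include them.
public class TrackedNativeMemory {

  private static final Unsafe UNSAFE = loadUnsafe();
  private static final AtomicLong USED = new AtomicLong();

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
    }
  }

  /** Allocates native memory and adds its size to the running total. */
  public static long allocate(long bytes) {
    long address = UNSAFE.allocateMemory(bytes);
    USED.addAndGet(bytes);
    return address;
  }

  /** Frees native memory; the caller must pass the size used at allocation. */
  public static void free(long address, long bytes) {
    UNSAFE.freeMemory(address);
    USED.addAndGet(-bytes);
  }

  /** Bytes currently allocated through this wrapper. */
  public static long usedBytes() {
    return USED.get();
  }

  public static void main(String[] args) {
    long address = allocate(4096);
    System.out.printf("after allocate: used %d%n", usedBytes());
    free(address, 4096);
    System.out.printf("after free: used %d%n", usedBytes());
  }
}
```

The limitation is the one described above: every shaded copy of an allocator keeps its own counter, so each one has to be discovered and registered separately.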