Open XiXiTan opened 2 months ago
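Another minor issue: if the worker cache is set to 512M, the worker actually uses 1024M. This exceeds the configured 512M and does not match the expected worker cache usage.

| | | MEM | HDD |
|---|---|---|---|
| capacity | 30.50GB | 512.00MB | 30.00GB |
| used | 4083.94MB (13%) | 1024.00MB | 3059.94MB |

In the source code I only see a default for the case where the cache size is not set: it takes 2/3 of the system memory, or falls back to 1GB. I do not see any logic that would substitute a different value when the cache size is set explicitly.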
```java
public static final PropertyKey WORKER_RAMDISK_SIZE =
    dataSizeBuilder(Name.WORKER_RAMDISK_SIZE)
        .setAlias(Name.WORKER_MEMORY_SIZE)
        .setDefaultSupplier(() -> {
          try {
            OperatingSystemMXBean operatingSystemMXBean =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            return operatingSystemMXBean.getTotalPhysicalMemorySize() * 2 / 3;
          } catch (Throwable e) {
            // The package com.sun.management may not be available on every platform.
            // fallback to a reasonable size.
            return "1GB";
          }
        }, "2/3 of total system memory, or 1GB if system memory size cannot be determined")
        .setDescription("The allocated memory for each worker node's ramdisk(s). "
```
Alluxio Version: What version of Alluxio are you using? 2.9.0.1
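Describe the bug A clear and concise description of what the bug is. The memory settings leave headroom, yet the worker pod still gets OOMKilled. Which part of the memory usage might be exceeding expectations? And why does the cache use more than the configured value?
Pod resource request: cpu: 4, memory: 16G. Planned usage: xmx=4g, MaxDirectMemorySize=4g, alluxio.worker.ramdisk.size=6g, reserved memory=2g.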
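For reference, the budget above can be checked with a quick calculation. The sketch below is only illustrative: the class and method names are hypothetical, and it counts just the three explicitly sized components (heap, direct memory, ramdisk). It assumes the ramdisk is charged against the pod's memory limit, and it ignores other JVM native memory such as metaspace and thread stacks, which also count toward the limit. It also shows how the headroom changes depending on whether the pod limit is 16Gi or a decimal 16G (Kubernetes parses the `G` suffix as 10^9 bytes).

```java
public class WorkerMemoryBudget {
    static final long MIB = 1024L * 1024L;

    /** Sum of the explicitly sized components, in bytes (inputs are in MiB). */
    static long plannedBytes(long heapMiB, long directMiB, long ramdiskMiB) {
        return (heapMiB + directMiB + ramdiskMiB) * MIB;
    }

    /** Headroom left under the pod limit, in whole MiB (negative if over budget). */
    static long headroomMiB(long podLimitBytes, long plannedBytes) {
        return (podLimitBytes - plannedBytes) / MIB;
    }

    public static void main(String[] args) {
        // Values from this report: -Xmx4096M, MaxDirectMemorySize=4096M, ramdisk 6144M.
        long planned = plannedBytes(4096, 4096, 6144);

        long limitGi = 16L * 1024L * MIB;   // assumption: limit is binary 16Gi
        long limitG = 16_000_000_000L;      // assumption: limit is decimal 16G

        System.out.println("planned MiB:          " + planned / MIB);
        System.out.println("headroom if 16Gi MiB: " + headroomMiB(limitGi, planned));
        System.out.println("headroom if 16G MiB:  " + headroomMiB(limitG, planned));
    }
}
```

If the limit is a decimal 16G, the intended 2g reserve shrinks to under 1 GiB before any unaccounted native memory is considered. That alone would not explain the cache exceeding its configured size, but it narrows the margin.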
Detailed memory settings: /usr/lib/jvm/java-1.8.0-openjdk/bin/java -cp /opt/alluxio-2.9.0.1-noHelm/conf/::/opt/alluxio/ranger-lib/*:/opt/alluxio-2.9.0.1-noHelm/assembly/alluxio-server-2.9.0.1.jar -Dalluxio.logger.type=Console,WORKER_LOGGER -Dsun.security.krb5.disableReferrals=true -Dalluxio.home=/opt/alluxio-2.9.0.1-noHelm -Dalluxio.conf.dir=/opt/alluxio-2.9.0.1-noHelm/conf -Dalluxio.logs.dir=/opt/alluxio-2.9.0.1-noHelm/logs -Dalluxio.user.logs.dir=/opt/alluxio-2.9.0.1-noHelm/logs/user -Dlog4j.configuration=file:/opt/alluxio-2.9.0.1-noHelm/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dorg.apache.ratis.thirdparty.io.netty.allocator.useCacheForAllThreads=false -Dalluxio.worker.hostname=ip -Xmx4096M -XX:MaxDirectMemorySize=4096M alluxio.worker.AlluxioWorker
conf/alluxio-site.properties: alluxio.worker.ramdisk.size=6144M
Cache usage:
CPU and memory usage of the affected pod:
To Reproduce Steps to reproduce the behavior (as minimally and precisely as possible)
Expected behavior A clear and concise description of what you expected to happen. The worker pod should not be OOMKilled.
Urgency Describe the impact and urgency of the bug.
Are you planning to fix it Please indicate if you are already working on a PR.
Additional context Add any other context about the problem here.