PKUHPC / OpenSCOW

Super Computing On Web
https://www.pkuscow.com/
Mulan Permissive Software License, Version 2

Per-node memory is not displayed in the user space #1460

Open liu-shaobo opened 1 week ago

liu-shaobo commented 1 week ago

What happened

Per-node memory is not displayed in the user space.

image

Environment

- OS: Ubuntu 20.04
- Scheduler: slurm-22.05.11
- Docker: 24.0.9
- Docker-compose: 2.24.1
- SCOW cli: v1.5.2
- SCOW: v1.5.2
- Adapter: slurm-adapter v1.5.0

piccaSun commented 6 days ago

Is this problem still occurring? If so, please check in Slurm whether the partition memory information shown by scontrol show partition=C96T4 is displayed correctly.

liu-shaobo commented 5 days ago

Is the display broken because the value after mem uses the unit T? I believe the adapter reads the RealMemory value!

image
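
The question above is whether a memory value with a `T` suffix is handled differently from a plain number. As a minimal sketch of how such values could be normalized before display (this is not the adapter's code; the function name `memory_to_mb` and the rule that a bare number means megabytes are assumptions made for illustration):

```python
import re

# Multipliers to megabytes; a bare number is assumed to already be in MB.
_UNIT_TO_MB = {"": 1, "M": 1, "G": 1024, "T": 1024 * 1024}

def memory_to_mb(value: str) -> int:
    """Convert strings such as '512000', '64G' or '2T' to whole megabytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([MGT]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_TO_MB[unit])

print(memory_to_mb("512000"))  # 512000
print(memory_to_mb("2T"))      # 2097152
```

A parser that only accepts digits would reject `2T`, which is one way a value could end up missing from a page.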

283713406 commented 4 days ago

Please run this command: scontrol show node=node221 | grep RealMemory=| awk '{print $1}' | awk -F'=' '{print $2}'
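
The pipeline above keeps the first whitespace-separated field of the line containing `RealMemory=` and prints what follows the `=`. The same extraction can be sketched in Python. The sample text below is a hypothetical excerpt rather than output from the reporter's cluster, and `real_memory_mb` is an illustrative name:

```python
import re
from typing import Optional

def real_memory_mb(scontrol_output: str) -> Optional[int]:
    """Return the RealMemory value from `scontrol show node` text, or None."""
    match = re.search(r"\bRealMemory=(\d+)", scontrol_output)
    return int(match.group(1)) if match else None

# Hypothetical excerpt; real output contains many more fields.
sample = (
    "NodeName=node221 CPUTot=96\n"
    "   RealMemory=1031000 AllocMem=0 FreeMem=1000000\n"
)
print(real_memory_mb(sample))  # 1031000
```

Slurm reports RealMemory in megabytes, so the returned integer is a size in MB.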

liu-shaobo commented 3 days ago

image

283713406 commented 3 days ago

Was the adapter https://github.com/PKUHPC/scow-slurm-adapter built from the latest code? Could you set the log level to trace and then look in the log for entries containing GetClusterConfig: or GetAvailablePartitions:?

liu-shaobo commented 3 days ago

SCOW is at 1.5.2 and the adapter is at 1.5.0. I set the adapter's log level to trace.

# Log level
log:
  level: "trace"

After restarting the adapter, the level is still info. Why is that?

{"level":"info","msg":"Received request GetClusterConfig: ","time":"2024-11-28T17:13:43+08:00"}

piccaSun commented 2 days ago

Hello, version 1.5.0 does not yet handle the trace log level. Update the adapter to master and then set the log level to trace to get more detailed logs.

We have located this issue and will fix it in a later change. Thank you for reporting it.
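
The log-level symptom reported earlier (configured `trace`, still logging at `info`) is consistent with a build that falls back to a default when it does not recognize the configured level. The sketch below shows that pattern in general terms. It is not the adapter's code; `resolve_level`, the level lists, and the `info` default are assumptions made for illustration:

```python
LEVELS = ("trace", "debug", "info", "warn", "error")

def resolve_level(configured: str, supported=LEVELS, default: str = "info") -> str:
    """Return the configured level if the build supports it, else the default."""
    name = configured.strip().lower()
    return name if name in supported else default

# A hypothetical older build that has no "trace" branch stays on the default.
old_build_levels = ("debug", "info", "warn", "error")
print(resolve_level("trace", supported=old_build_levels))  # info
print(resolve_level("trace"))                              # trace
```

Logging a warning whenever the fallback is taken would make this kind of mismatch visible on restart.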