PKUHPC / SCOW

Super Computing On Web
https://www.pkuscow.com/
Mulan Permissive Software License, Version 2

Help: adapter returns "The gres set error" #1276

Open menkeyi001 opened 1 month ago

menkeyi001 commented 1 month ago

Is there an existing issue / discussion for this?

What happened

scow

Fetching the cluster info fails. Request: http://10.192.1.87/api/dashboard/getClusterInfo?clusterId=hpc01
Response: {"code":"ADAPTER_CALL_ON_ONE_ERROR","details":"Cluster ID : hpc01 Details : Error: 5 NOT_FOUND: The gres set error.","clusterErrorsArray":[{"clusterId":"hpc01","details":{"code":5,"details":"The gres set error.","metadata":{"content-type":["application/grpc"],"grpc-status-details-bin":[{"type":"Buffer","data":[8,5,18,19,84,104,101,32,103,114,101,115,32,115,101,116,32,101,114,114,111,114,46,26,60,10,40,116,121,112,101,46,103,111,111,103,108,101,97,112,105,115,46,99,111,109,47,103,111,111,103,108,101,46,114,112,99,46,69,114,114,111,114,73,110,102,111,18,16,10,14,71,82,69,83,95,78,79,84,95,70,79,85,78,68]}]}}}]}
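The `grpc-status-details-bin` buffer in the response above is a serialized `google.rpc.Status` message, so it can be decoded to see the machine-readable reason behind the error. The following is a minimal sketch in plain Python with a hand-written protobuf wire-format reader; the helper names `read_varint` and `parse_fields` are hypothetical and not part of SCOW or the adapter, and the field numbers are those of the public `google.rpc.Status`, `google.protobuf.Any`, and `google.rpc.ErrorInfo` definitions.

```python
def read_varint(buf, pos):
    """Read a protobuf varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Parse one message into {field_number: value}.

    Only wire types 0 (varint) and 2 (length-delimited) are handled,
    which is all this particular payload uses.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


# Bytes copied from the "grpc-status-details-bin" entry in the response.
data = bytes([
    8, 5, 18, 19, 84, 104, 101, 32, 103, 114, 101, 115, 32, 115, 101, 116,
    32, 101, 114, 114, 111, 114, 46, 26, 60, 10, 40, 116, 121, 112, 101, 46,
    103, 111, 111, 103, 108, 101, 97, 112, 105, 115, 46, 99, 111, 109, 47,
    103, 111, 111, 103, 108, 101, 46, 114, 112, 99, 46, 69, 114, 114, 111,
    114, 73, 110, 102, 111, 18, 16, 10, 14, 71, 82, 69, 83, 95, 78, 79, 84,
    95, 70, 79, 85, 78, 68,
])

status = parse_fields(data)            # google.rpc.Status
any_msg = parse_fields(status[3])      # details[0], a google.protobuf.Any
error_info = parse_fields(any_msg[2])  # Any.value, a google.rpc.ErrorInfo

code = status[1]                  # Status.code
message = status[2].decode()      # Status.message
type_url = any_msg[1].decode()    # Any.type_url
reason = error_info[1].decode()   # ErrorInfo.reason

print(code, message, type_url, reason)
```

For this payload the decoded values are code `5`, message `The gres set error.`, type URL `type.googleapis.com/google.rpc.ErrorInfo`, and reason `GRES_NOT_FOUND`, which adds nothing beyond what the JSON already reports apart from the reason string.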

Can debug logging be enabled for scow-slurm-adapter? The adapter's log does not show any errors either.

Slurm was compiled and installed from source. The directory layout is as follows:
(base) root@slurmcontroller:/adapter# ls /etc/slurm/
bin  etc  include  lib  sbin  share
(base) root@slurmcontroller:/adapter# ls /etc/slurm/etc/
cgroup.conf  gres.conf  plugstack.conf  plugstack.conf.d  slurm.conf  slurm.conf.bak  slurmdbd.conf

Adapter configuration:
(base) root@slurmcontroller:/adapter# cat config/config.yaml 
# Slurm database configuration
mysql:
  host: 10.192.1.39
  port: 3306
  user: root
  dbname: slurm_acct_db
  password: abc@123
  clustername: cluster
  databaseencode: latin1

# Service port settings
service:
  port: 8972

# Slurm default QoS settings
slurm:
  defaultqos: normal
  slurmpath: /etc/slurm/

# Path to the module profile file
modulepath:
  path: /data/share/software/module/5.2.0/init/profile.sh
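
As a quick sanity check on the configuration above, the paths it names can be tested for existence on the controller node. This is only a sketch: `missing_paths` is a hypothetical helper, and the expected layout (`bin`, `etc/slurm.conf`, `etc/gres.conf` under `slurmpath`) is taken from the directory listing in this report, not from any documented adapter requirement.

```python
from pathlib import Path


def missing_paths(slurm_path, module_profile):
    """Return the expected paths that do not exist on disk, as strings.

    The expected layout is an assumption based on the directory listing
    in this report, not on the adapter's documentation.
    """
    root = Path(slurm_path)
    expected = [
        root / "bin",
        root / "etc" / "slurm.conf",
        root / "etc" / "gres.conf",
        Path(module_profile),
    ]
    return [str(p) for p in expected if not p.exists()]


# Values copied from config/config.yaml in this report.
print(missing_paths(
    "/etc/slurm/",
    "/data/share/software/module/5.2.0/init/profile.sh",
))
```

An empty list means every path was found; any entry printed points at a path worth double-checking before looking further into the adapter.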

What did you expect to happen

No response

Did this work before?

No response

Steps To Reproduce

No response

Environment

- OS:
- Scheduler:
- Docker:
- Docker-compose:
- SCOW cli:
- SCOW:
- Adapter:

Anything else?

No response

vanstriker commented 1 week ago

You need to use the latest version of the adapter.