Share GPU between Pods in Kubernetes
Apache License 2.0

some questions about KubeShare2.0 #24

Open She-xj opened 1 year ago

She-xj commented 1 year ago

Hello! I am installing KubeShare2.0. I have finished the preparation steps, and this is the relevant output of kubectl describe node:

Capacity:
  cpu:                16
  ephemeral-storage:  29352956Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16248988Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                16
  ephemeral-storage:  27051684205
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16146588Ki
  nvidia.com/gpu:     1
  pods:               110

When I follow deploy.md, I have some questions:

  1. Where should I place the kubeshare-config.yaml file? Could you please tell me its absolute path?
  2. How do I "Make sure the enpoint of kubeshare-aggregator & kubeshare-collector of prometheus is up."?
  3. Could you also show me the Prometheus config files you use for monitoring Kubernetes?

I am a beginner in this field, so I would be very grateful if you could provide more details on building the KubeShare2.0 system. Looking forward to your reply! Thanks a lot!

icovej commented 1 year ago

Maybe I can answer some of your questions. First, you need to place kubeshare-config.yaml under /kubeshare/scheduler. Second, you can run Prometheus and check there that the kubeshare-aggregator and kubeshare-collector endpoints are up.
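For reference, one way to check the endpoints without opening the Prometheus web UI is to query its HTTP API at /api/v1/targets. The snippet below is only a minimal sketch: the base URL http://localhost:9090 and the idea that the relevant scrape jobs have "kubeshare" in their job label are assumptions, not something confirmed by the KubeShare docs, and the sample response is made up for illustration.

```python
import json
import urllib.request


def kubeshare_target_health(targets_response, name_fragment="kubeshare"):
    """Return {scrape URL: health} for active targets whose job label
    contains name_fragment (assumed naming, adjust to your setup)."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    result = {}
    for target in active:
        job = target.get("labels", {}).get("job", "")
        if name_fragment in job:
            result[target.get("scrapeUrl", job)] = target.get("health", "unknown")
    return result


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the target list from Prometheus (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "kubeshare-aggregator"},
             "scrapeUrl": "http://10.0.0.1:9005/metrics", "health": "up"},
            {"labels": {"job": "kubeshare-collector"},
             "scrapeUrl": "http://10.0.0.1:9004/metrics", "health": "down"},
            {"labels": {"job": "node-exporter"},
             "scrapeUrl": "http://10.0.0.1:9100/metrics", "health": "up"},
        ]
    },
}

for url, health in kubeshare_target_health(sample_response).items():
    print(url, health)
```

Against a live cluster you would call kubeshare_target_health(fetch_targets()) instead of using the sample; any target that does not report "up" is the one to look at.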

But I also have a question about the GPU topology. I have a cluster with only one node and two GPUs. How should I write its kubeshare-config.yaml? Thanks!

She-xj commented 1 year ago

Thanks a lot for your answer! I think your problem is in the line childCellType: "NVIDIA GeForce GTX 3090", which should be written as childCellType: "NVIDIA-GeForce-GTX-3090". About my first question: does "kubeshare" refer to the whole "KubeShare" project? Or do I need to make a new file named "kubeshare" in KubeShare? Thanks!
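To illustrate the naming fix above, here is a tiny sketch that turns a GPU model name into the hyphenated form. It assumes the only change needed is replacing whitespace with hyphens, as in the correction above; the helper name to_cell_type is hypothetical and not part of KubeShare.

```python
def to_cell_type(gpu_model_name):
    """Replace each run of whitespace in the model name with one hyphen."""
    return "-".join(gpu_model_name.split())


print(to_cell_type("NVIDIA GeForce GTX 3090"))  # NVIDIA-GeForce-GTX-3090
```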

icovej commented 1 year ago

Well, after I built KubeShare, a KubeShare directory was there in the root directory, containing logs and other files. In fact, I also don't know whether the whole "KubeShare" project needs to be placed in it, but I found that config.yaml has to be placed there. Even when I changed its path in pkg/scheduler/scheduler.go, it still didn't work. So I think you can place the whole "KubeShare" project anywhere, but you would need to modify some content.