Closed: xarion closed this issue 2 years ago
Hi @xarion,
Supporting more complete kubernetes deployment options is indeed in our TODOs. I would like you to know that we already allow the user to add custom volumes (see the k8s_volumes argument in Session.init) and to set the cpu/mem requests/limits (see graphscope.set_option).
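For example, a minimal sketch of the current API (the volume name and both paths below are placeholders chosen for this example, not defaults):

```python
import graphscope

# Mount a hostPath volume into the engine pods; the volume name and
# both paths are placeholders chosen for this example.
k8s_volumes = {
    "data": {
        "type": "hostPath",
        "field": {"path": "/testingdata", "type": "Directory"},
        "mounts": {"mountPath": "/tmp/testingdata"},
    }
}
sess = graphscope.session(k8s_volumes=k8s_volumes)
```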
We are planning to support a user-provided dict as an argument, merging that dict into our builtin deployment settings before sending them to kubernetes. Such a feature will hopefully be released by the end of July.
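Roughly, the planned behavior would look like the sketch below; the deep_merge helper is purely illustrative, not a released API:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge a user-provided dict into the builtin
    deployment settings, with user values winning on conflicts.
    Illustrative sketch only, not a released GraphScope API."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```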
Hope the information above is helpful for you folks.
Thanks for the response @sighingnow. Unfortunately, although graphscope.set_option allows us to set the cpu/mem limits, those values are not used when creating the kubernetes configuration. Only the requested cpu/mem values are reflected in the config.
Also, as you mentioned, k8s_volumes allows us to mount drives, but only in a limited way. We have been unsuccessful in our attempts to use this setting to increase the ephemeral storage. One way would be to attach a mount to the log folder, but unfortunately even that is not enough, because to my knowledge k8s_volumes are not attached to the Jupyter and coordinator pods. And even if it worked, it would be a complex way of dealing with the kubernetes configuration.
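For reference, what we attempted looked roughly like this (the mount path is a placeholder, and we are assuming emptyDir is accepted by k8s_volumes; hostPath is the documented example type):

```python
import graphscope

# Roughly what we attempted: back the log folder with an emptyDir
# volume to gain ephemeral storage. The mount path is a placeholder,
# and we assume emptyDir is accepted here (hostPath is the documented
# volume type).
k8s_volumes = {
    "log": {
        "type": "emptyDir",
        "field": {},
        "mounts": {"mountPath": "/var/log/graphscope"},
    }
}
sess = graphscope.session(k8s_volumes=k8s_volumes)
```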
I suggest that instead of passing parameters in a dict, let us use the kubernetes way and define a yaml file.
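For illustration, the client could do something like the following; the file name is made up, and the merge step refers to whatever builtin-settings merge ends up being implemented:

```python
import yaml

# Hypothetical: the user maintains a yaml file whose keys mirror the
# pod spec, and the client merges it into the builtin settings before
# anything is sent to kubernetes. The file name is made up.
with open("graphscope-overrides.yaml") as f:
    overrides = yaml.safe_load(f)
# `overrides` would then be deep-merged into the builtin deployment
# settings, as in the merge sketch earlier in this thread.
```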
Hey @sighingnow.
Unfortunately, although graphscope.set_option allows us to set the cpu/mem limits, those values are not used when creating the kubernetes configuration. Only the requested cpu/mem values are reflected in the config.
This is still causing a major headache for us.
Hi @xarion,
I think the k8s_volumes should be enough to set the required storage, and k8s_engine_cpu, k8s_engine_mem, etc. should be enough to set the cpu/mem limit.
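For example (values are illustrative):

```python
import graphscope

# Illustrative values; set these before creating the session.
graphscope.set_option(k8s_engine_cpu=4, k8s_engine_mem="8Gi")
sess = graphscope.session()
```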
assigning pods to nodes using labels is not possible.
We could add options like k8s_engine_pod_label={} and k8s_vineyard_pod_label={} to support that. Do you think such options would be enough for your cases? If they work for you, we can try to include the implementation in the upcoming v0.17.0 release.
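Usage would look something like this (proposed options, not implemented yet):

```python
import graphscope

# Proposed options, not yet implemented: attach custom labels to the
# engine and vineyard pods. Label keys/values are illustrative.
sess = graphscope.session(
    k8s_engine_pod_label={"app": "graphscope-engine"},
    k8s_vineyard_pod_label={"app": "graphscope-vineyard"},
)
```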
let us use the kubernetes way and define a yaml file.
That won't happen before v0.17.0; we need to investigate the schema of the yaml file to define which options are customizable and which are not.
Hi @sighingnow,
I think the k8s_volumes should be enough to set the required storage, and k8s_engine_cpu, k8s_engine_mem, etc. should be enough to set the cpu/mem limit.
Currently graphscope sets the "requests" parameter in kubernetes. This does not necessarily block the resources for the pod. So multiple tasks that request the max memory can be scheduled onto the same node. Unfortunately, we are hitting exactly this issue and have only hacky solutions to prevent it. To correctly fix this, the "limits" parameter should be set.
This does not necessarily block the resources for the pod.
The session (as well as graphscope.set_option) has a preemptive option (see https://github.com/alibaba/GraphScope/blob/main/python/graphscope/config.py#L86). When it is set to False, the arguments k8s_engine_cpu, k8s_engine_mem, etc. will be used as "requests" rather than "limits".
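That is, something like the following (values are illustrative):

```python
import graphscope

# With preemptive=False, the cpu/mem values below should be submitted
# as "requests" (reserved by the scheduler) rather than only as
# "limits". Values are illustrative.
graphscope.set_option(preemptive=False)
graphscope.set_option(k8s_engine_cpu=4, k8s_engine_mem="8Gi")
sess = graphscope.session()
```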
Currently graphscope sets the "requests" parameter in kubernetes. ..... To correctly fix this, the "limits" parameter should be set.
I guess you mean "requests" is what you want, but we currently set them as "limits".
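For anyone reading along, the standard kubernetes semantics are as follows (shown as the resources dict that would appear in a container spec; values illustrative):

```python
# Standard kubernetes container resource semantics (values illustrative):
resources = {
    # The scheduler reserves this much capacity on a node; pods only
    # land where the sum of requests fits the node's allocatable.
    "requests": {"cpu": "4", "memory": "8Gi"},
    # The kubelet enforces this ceiling at runtime: cpu is throttled,
    # and exceeding the memory limit gets the container OOM-killed.
    "limits": {"cpu": "4", "memory": "8Gi"},
}
```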
You're right, I confused limits and requests in my post.
preemptive=false resolved our headache. Now we can better schedule pods.
For the future, deploying to our existing systems will require a more elaborate kubernetes configuration. So we are looking forward to the release of 0.17.0!
For the future, deploying to our existing systems will require a more elaborate kubernetes configuration.
FYI: node selector has been added in https://github.com/alibaba/GraphScope/pull/2087
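A rough sketch of what usage might look like once released; the exact parameter name below is a guess, so check the PR for the final API:

```python
import graphscope

# The parameter name below is a guess; see the linked PR for the
# actual option added in the release.
sess = graphscope.session(
    k8s_engine_pod_node_selector={"disktype": "ssd"},
)
```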
Closing, as the problems raised in this issue have been resolved. Feel free to open new tickets if you have other feature requests.
Is your feature request related to a problem? Please describe.
Currently pod configurations are created by a script, so we do not have control over these configurations. This makes it difficult to customize the graphscope deployment. For example, adding a customized mount point, setting ephemeral storage, setting CPU limits to guide the scheduler, or assigning pods to nodes using labels is not possible.
Describe the solution you'd like
Being able to configure the pods however we like.
Describe alternatives you've considered
Currently we are setting k8s defaults for some of the critical issues (ephemeral storage), but that is not a valid solution for all of the problems.