Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Is there a way to reduce the HAMI-Core verbosity level for workloads? #544

Open 4gt-104 opened 2 days ago

4gt-104 commented 2 days ago

Please provide an in-depth description of the question you have:

I reviewed the HAMI-Core and confirmed that the verbosity level can be reduced by setting the LIBCUDA_LOG_LEVEL environment variable. However, configuring this for every GPU pod can be tedious.
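For context, this is roughly what the per-pod configuration looks like today. A minimal sketch; the image name is a placeholder, and the `nvidia.com/gpu` resource key is assumed from a typical HAMi setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: app
      image: my-cuda-app:latest      # placeholder image
      env:
        - name: LIBCUDA_LOG_LEVEL    # HAMI-Core verbosity; lower is quieter
          value: "0"
      resources:
        limits:
          nvidia.com/gpu: 1
```

Repeating this `env` entry in every GPU workload is the tedium the question is about.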

Is there a way to set the verbosity level through HAMI’s Helm chart or scheduler configuration instead?

What do you think about this question?: I believe the user should have easy access to this parameter, and it could be integrated with the already existing admission webhook. Additionally, I recommend setting the default HAMI-Core verbosity level to 0, ensuring consistent behavior with NVIDIA's device plugin.

Environment:

wawa0210 commented 2 days ago

There is no good solution at the moment.

If HAMi could read global configuration information through the webhook and set this parameter there, that might work. I'm not sure whether it is feasible; it needs to be tried.

archlitchi commented 1 day ago

You can modify the mutatingWebhookConfiguration in HAMi to add the env `LIBCUDA_LOG_LEVEL=0` to GPU pods. By the way, do you have a WeChat or LinkedIn account?

4gt-104 commented 1 day ago

@archlitchi thanks for the reply. I will try to implement setting `LIBCUDA_LOG_LEVEL` during admission. Unfortunately, I don't have WeChat, but I do have a LinkedIn account.

4gt-104 commented 11 hours ago

I have reviewed the code and believe this can be implemented easily, but I have a concern regarding Argo CD and GitOps: overriding the pod spec, whether to modify the environment variable for visible CUDA devices or any other environment variable, would likely put the resource in an out-of-sync state.
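One possible mitigation, assuming the drift actually shows up on Argo CD-tracked objects (webhooks that mutate only Pods created by controllers usually don't cause diffs, since Argo CD tracks the parent workload), is Argo CD's `ignoreDifferences` feature. A hedged sketch; the Application name is a placeholder and the jq path assumes a Deployment template:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-workloads            # placeholder Application name
spec:
  # project/source/destination omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jqPathExpressions:
        # ignore only the env var injected by the admission webhook
        - '.spec.template.spec.containers[].env[]? | select(.name == "LIBCUDA_LOG_LEVEL")'
```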

@archlitchi what do you think?