awslabs / llm-hosting-container

Large Language Model Hosting Container
Apache License 2.0

Fix compute_cap access issue for TEI GPU image #78

Closed haixiw closed 3 months ago

haixiw commented 3 months ago

Issue #, if available: The whole investigation is here: https://quip-amazon.com/FKIAAcZoJQB4/TEI-image-failed-to-access-the-GPU-information-from-containerWIP

Description of changes: In summary, the root cause is that HF's CUDA driver can't read the compute_cap info from the GPU. This is a known issue: https://github.com/huggingface/candle/issues/733

To resolve the issue, I implemented a bash script that dynamically maps the GPU name to its compute_cap.
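For readers who want a feel for the approach, here is a minimal sketch of a name-to-compute_cap mapping. It is not the script from this PR: the function names `gpu_name_to_compute_cap` and `detect_compute_cap` are hypothetical, and the table covers only a handful of common GPUs. The sketch assumes that `nvidia-smi` is available in the container and can report the GPU name even when the compute capability cannot be read.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; function names and the GPU list are assumptions,
# not the contents of the script added in this PR.

# Map a GPU product name (as reported by nvidia-smi) to its CUDA compute
# capability without the dot, e.g. "86" for 8.6.
gpu_name_to_compute_cap() {
  local name="$1"
  case "$name" in
    *A100*) echo "80" ;;   # must come before *A10*, which also matches A100
    *A10*)  echo "86" ;;   # A10 and A10G
    *T4*)   echo "75" ;;
    *V100*) echo "70" ;;
    *L4*)   echo "89" ;;   # also matches L40/L40S, which are 8.9 as well
    *H100*) echo "90" ;;
    *)
      echo "unknown GPU: ${name}" >&2
      return 1
      ;;
  esac
}

# Read the name of the first visible GPU and map it.
# Returns non-zero if nvidia-smi is missing or reports nothing.
detect_compute_cap() {
  local name
  name="$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -n 1)"
  [[ -n "$name" ]] || return 1
  gpu_name_to_compute_cap "$name"
}

# Example usage (the variable name below is a placeholder):
#   if cap="$(detect_compute_cap)"; then
#     export MY_COMPUTE_CAP="$cap"
#   fi
```

A static table has to be extended whenever a new GPU type is supported, but it avoids querying the compute capability through the driver, which is the call that fails here.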

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.