Closed: haixiw closed 3 months ago
Issue #, if available: The whole investigation is here: https://quip-amazon.com/FKIAAcZoJQB4/TEI-image-failed-to-access-the-GPU-information-from-containerWIP
Description of changes: In summary, the root cause is that HF's CUDA driver can't read the compute_cap info from the GPU. This is a known issue: https://github.com/huggingface/candle/issues/733
To resolve the issue, I implemented a bash script that dynamically maps the GPU name to its compute_cap.
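For readers who want a picture of the approach, here is a minimal sketch of such a name-to-compute_cap mapping. It is not the script from this PR: the function name `gpu_name_to_compute_cap`, the lookup table, and the dot-less output format (7.5 written as 75) are all assumptions made for illustration, and the table covers only a few common GPUs. How the resulting value is handed to the image is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's actual script): map a GPU product name,
# as reported by nvidia-smi, to its CUDA compute capability without the dot.
# The table below is illustrative and intentionally incomplete.
gpu_name_to_compute_cap() {
  case "$1" in
    *V100*) echo 70 ;;
    *T4*)   echo 75 ;;
    *A100*) echo 80 ;;
    *A10G*) echo 86 ;;
    *L4*)   echo 89 ;;
    *H100*) echo 90 ;;
    *)      return 1 ;;  # unknown GPU: let the caller decide what to do
  esac
}

# Query the name of the first GPU only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_name="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
  if compute_cap="$(gpu_name_to_compute_cap "$gpu_name")"; then
    echo "Detected ${gpu_name}: compute_cap=${compute_cap}"
  else
    echo "No compute_cap mapping for GPU: ${gpu_name}" >&2
  fi
fi
```

Matching on substrings keeps the table short, because nvidia-smi reports full product names such as "Tesla T4" or "NVIDIA A10G". An unknown name returns a non-zero status instead of guessing a value.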
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.