TRT-BradleyB opened this issue 1 year ago
They have multiple versions, but none are working with AWQ models:
"imageDetails": [
{
"registryId": "763104351884",
"repositoryName": "huggingface-pytorch-tgi-inference",
"imageDigest": "sha256:2739b630b95d8a95e6b4665e66d8243dd43b99c4fdb865feff13aab9c1da06eb",
"imageTags": [
"2.0.1-gpu-py39-cu118-ubuntu20.04",
"2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04",
"2.0-gpu-py39-cu118-ubuntu20.04-v1",
"2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0-2023-10-02-14-29-28",
"2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04-v1",
"2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
"2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
],
"imageSizeInBytes": 4576429231,
"imagePushedAt": "2023-10-02T16:39:34+02:00",
"imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
"artifactMediaType": "application/vnd.docker.container.image.v1+json",
"lastRecordedPullTime": "2023-10-16T15:46:30.296000+02:00"
}
]
}```
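For reference, a listing like the one above can be reproduced through the ECR API. A minimal sketch using boto3; the region and the tag queried here are assumptions on my part, and it presumes your credentials are allowed to read the DLC registry:

```python
import json

import boto3

# Query the AWS Deep Learning Containers registry for the TGI image details.
# Region and tag are assumptions; adjust them to the image you are checking.
ecr = boto3.client("ecr", region_name="us-east-1")

response = ecr.describe_images(
    registryId="763104351884",
    repositoryName="huggingface-pytorch-tgi-inference",
    imageIds=[{"imageTag": "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04"}],
)

# Print the digest and every tag attached to the matching image.
for detail in response["imageDetails"]:
    print(detail["imageDigest"])
    print(json.dumps(detail["imageTags"], indent=2))
```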
Why is this still not solved? EETQ slashes inference time by a factor of 2...
I might be missing something obvious, but the two tags you listed for 1.1.0 should be pointing to the same image. Please use the latest version, which should be 1.3.3 as of this writing.
@TRT-BradleyB can you try with the latest TGI image?
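For reference, a rough sketch of deploying with a newer image via the SageMaker Python SDK, with quantization switched on through the container's `HF_MODEL_QUANTIZE` environment variable (which, as I understand it, maps to TGI's `--quantize` flag). The role, model id, instance type, and the exact version string are assumptions, not part of the original report:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Assumes this runs where a SageMaker execution role is available.
role = sagemaker.get_execution_role()

# Resolve the newest HF LLM (TGI) image known to the SDK; 1.3.3 per the comment above.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.3.3")
print(image_uri)

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model id
        "SM_NUM_GPUS": "1",
        "HF_MODEL_QUANTIZE": "eetq",  # assumption: EETQ is available in this image
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)
print(predictor.predict({"inputs": "Hello, TGI!"}))
```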
Describe the bug
Related to this issue: https://github.com/aws/deep-learning-containers/issues/3377
There are two versions of the TGI 1.1.0 image. One has EETQ pre-installed: https://github.com/NetEase-FuXi/EETQ
`py39-cu118-ubuntu20.04` and `py39-cu118-ubuntu20.04-v1.0`
In the JSON config, only the one without EETQ is specified.
https://github.com/aws/sagemaker-python-sdk/blob/bfc63d2bb91e33345651e3e00598772b7fb9f971/src/sagemaker/image_uri_config/huggingface-llm.json#L220
Easy fix, but I'm not sure how you'd like to resolve this, given that the naming scheme deviates.
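Until the config is updated, one way to see the problem and work around it is to check what the SDK resolves for 1.1.0 and pin the desired tag explicitly. A sketch, assuming an AWS region is configured and that the `-v1.0` tag from the ECR listing above is the EETQ-enabled one this issue refers to:

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Show which 1.1.0 image the SDK's huggingface-llm.json currently resolves to.
print(get_huggingface_llm_image_uri("huggingface", version="1.1.0"))

# Workaround: pass an explicit image_uri to HuggingFaceModel instead of relying on
# the version lookup, using the -v1.0 tag from the ECR listing above.
pinned_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
)
print(pinned_uri)
```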