aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

EETQ not available when using TGI via get_huggingface_llm_image_uri #4194

Open TRT-BradleyB opened 1 year ago

TRT-BradleyB commented 1 year ago

Describe the bug

Related to this issue: https://github.com/aws/deep-learning-containers/issues/3377

There are two versions of the TGI 1.1.0 image. One has EETQ pre-installed: https://github.com/NetEase-FuXi/EETQ

The two tags end in py39-cu118-ubuntu20.04 and py39-cu118-ubuntu20.04-v1.0.

In the JSON config, only the one without EETQ is specified.

https://github.com/aws/sagemaker-python-sdk/blob/bfc63d2bb91e33345651e3e00598772b7fb9f971/src/sagemaker/image_uri_config/huggingface-llm.json#L220

Easy fix, but I'm not sure how you'd like to resolve this given the naming scheme deviates.
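
Until the config is updated, one possible workaround is to bypass the lookup and build the image URI by hand. The snippet below is a minimal sketch, not the SDK's own logic: `build_ecr_image_uri` is a hypothetical helper, the region is an assumption, and the registry ID and repository name are the ones reported in the image listing posted in this thread. It also assumes the standard `amazonaws.com` domain, which does not hold for every partition.

```python
def build_ecr_image_uri(registry_id: str, region: str, repository: str, tag: str) -> str:
    """Return an ECR URI of the form <registry>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the tag exists in the registry.
    """
    return f"{registry_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


image_uri = build_ecr_image_uri(
    registry_id="763104351884",  # registryId from the image listing in this thread
    region="us-east-1",  # assumption: replace with your own region
    repository="huggingface-pytorch-tgi-inference",
    tag="2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0",
)
print(image_uri)
```

The resulting string could then be passed as the `image_uri` argument when constructing the model, instead of the value returned by `get_huggingface_llm_image_uri`.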

Daan-Grashoff commented 1 year ago

They have multiple versions, but none are working with AWQ models:


```
{
    "imageDetails": [
        {
            "registryId": "763104351884",
            "repositoryName": "huggingface-pytorch-tgi-inference",
            "imageDigest": "sha256:2739b630b95d8a95e6b4665e66d8243dd43b99c4fdb865feff13aab9c1da06eb",
            "imageTags": [
                "2.0.1-gpu-py39-cu118-ubuntu20.04",
                "2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04",
                "2.0-gpu-py39-cu118-ubuntu20.04-v1",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0-2023-10-02-14-29-28",
                "2.0-tgi1.1-gpu-py39-cu118-ubuntu20.04-v1",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
            ],
            "imageSizeInBytes": 4576429231,
            "imagePushedAt": "2023-10-02T16:39:34+02:00",
            "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "artifactMediaType": "application/vnd.docker.container.image.v1+json",
            "lastRecordedPullTime": "2023-10-16T15:46:30.296000+02:00"
        }
    ]
}
```

Igosuki commented 12 months ago

Why is this still not solved? EETQ slashes inference time by a factor of 2...

amzn-choeric commented 10 months ago

I might be missing something obvious, but the two tags you listed for 1.1.0 should be pointing to the same image. Please use the latest version, which should be 1.3.3 as of this writing.
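
The claim that both 1.1.0 tags point to the same image can be checked against the image listing posted earlier in this thread, since tags that share an `imageDigest` resolve to the same image. The following is a small sketch of that check. The helper names `digest_by_tag` and `same_image` are made up for illustration, and the JSON is a trimmed copy of the listing above (which looks like output from `aws ecr describe-images`, though the thread does not say so explicitly).

```python
import json

# Trimmed copy of the image listing posted in this thread.
IMAGE_LISTING = """
{
    "imageDetails": [
        {
            "imageDigest": "sha256:2739b630b95d8a95e6b4665e66d8243dd43b99c4fdb865feff13aab9c1da06eb",
            "imageTags": [
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
                "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0"
            ]
        }
    ]
}
"""


def digest_by_tag(listing_json: str) -> dict:
    """Map every image tag in the listing to the digest it points at."""
    details = json.loads(listing_json)["imageDetails"]
    return {
        tag: detail["imageDigest"]
        for detail in details
        for tag in detail.get("imageTags", [])
    }


def same_image(tag_to_digest: dict, *tags: str) -> bool:
    """Return True if all given tags resolve to a single digest."""
    return len({tag_to_digest[tag] for tag in tags}) == 1


tags = digest_by_tag(IMAGE_LISTING)
both_match = same_image(
    tags,
    "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
    "2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04-v1.0",
)
print(both_match)
```

For the listing in this thread, both tags sit under one digest, which is consistent with the comment above. The check says nothing about whether that image actually ships EETQ.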

knikure commented 10 months ago

@TRT-BradleyB can you try with the latest TGI image?