AI-Hypercomputer / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0

Add model warmup flag into cli #197

Closed vivianrwu closed 3 weeks ago

vivianrwu commented 3 weeks ago

Adds the --enable_model_warmup flag to the CLI, following https://github.com/AI-Hypercomputer/jetstream-pytorch/pull/187

args:
        - --model_id=google/gemma-7b-it
        - --override_batch_size=32
        - --enable_model_warmup=True

Server log with warmup enabled:

I1106 18:30:12.872196 138308608845376 warmup_utils.py:108] ---------Prefill engine 0 compiled for prefill length 512.---------
2024-11-06 18:30:12,872 - root - INFO - ---------Prefill engine 0 compiled for prefill length 512.---------
2024-11-06 18:30:13,011 - root - INFO - ---------Prefill engine 0 compiled for prefill length 64.---------
I1106 18:30:13.011122 138310185887296 warmup_utils.py:108] ---------Prefill engine 0 compiled for prefill length 64.---------
I1106 18:30:16.064429 138308617238080 warmup_utils.py:108] ---------Prefill engine 0 compiled for prefill length 256.---------
2024-11-06 18:30:16,064 - root - INFO - ---------Prefill engine 0 compiled for prefill length 256.---------
...
2024-11-06 18:30:32,060 - root - INFO - ---------Generate params 0 loaded.---------
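
To check which prefill lengths were compiled during warmup, the log lines above can be scanned with a small script. This is a minimal sketch that assumes the message format stays exactly as shown in this log; the function name `compiled_prefill_lengths` and the `SAMPLE_LOG` constant are hypothetical helpers, not part of jetstream-pytorch.

```python
import re

# Pattern for the warmup message as it appears in the log above.
# Assumption: the wording "Prefill engine N compiled for prefill length M." is stable.
PREFILL_RE = re.compile(r"Prefill engine (\d+) compiled for prefill length (\d+)\.")


def compiled_prefill_lengths(log_text: str) -> dict:
    """Return {engine_index: sorted unique prefill lengths} found in log_text."""
    found = {}
    for match in PREFILL_RE.finditer(log_text):
        engine, length = int(match.group(1)), int(match.group(2))
        # Each message is logged twice (absl and root logger), so use a set.
        found.setdefault(engine, set()).add(length)
    return {engine: sorted(lengths) for engine, lengths in found.items()}


# Three lines copied from the log in this PR description.
SAMPLE_LOG = """\
I1106 18:30:12.872196 138308608845376 warmup_utils.py:108] ---------Prefill engine 0 compiled for prefill length 512.---------
2024-11-06 18:30:13,011 - root - INFO - ---------Prefill engine 0 compiled for prefill length 64.---------
2024-11-06 18:30:16,064 - root - INFO - ---------Prefill engine 0 compiled for prefill length 256.---------
"""

print(compiled_prefill_lengths(SAMPLE_LOG))  # {0: [64, 256, 512]}
```

Deduplicating with a set matters here because every warmup message shows up once per logger.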

Example request:

curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{
    "prompt": "What are the top 5 programming languages",
    "max_tokens": 200
}'

{
    "response": " for for data science in 2023?\n\n1. Python\n2. R\n3. SQL\n4. Java\n5. Scala\n\n**Note:** The order is based on popularity and demand in the data science industry in 2023."
}
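
The same request can be sent from Python using only the standard library. This is a rough sketch mirroring the curl call above: the endpoint path, the `prompt` and `max_tokens` fields, and the `response` key are taken from this PR description, while the helper names `build_generate_request` and `generate`, the default base URL, and the timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens, base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl call in this PR description."""
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens, base_url="http://localhost:8000", timeout=60.0):
    """Send the request and return the "response" field of the JSON reply.

    Assumption: the server replies with a JSON object containing a "response"
    key, as in the sample output shown above.
    """
    request = build_generate_request(prompt, max_tokens, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

With the server running, `generate("What are the top 5 programming languages", 200)` would issue the same call as the curl example. Splitting request construction from sending keeps the payload easy to inspect without a live server.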