amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

Inference speed worse on AMD CPU than on Intel CPU #83

Closed CrazyChildren closed 2 weeks ago

CrazyChildren commented 1 month ago

I tested Chronos with the same code on an Intel Core CPU (Mac Pro), a Linux server with an Intel CPU, and a Linux server with an AMD CPU. Inference time on the AMD CPU seems to be ~30x worse.

On the Intel CPU, a prediction costs approximately 0.7s with batch_num = 1, predict_len = 1, context_len = 70. On the AMD CPU, it takes about 30s.

I don't know whether this is specific to my setup, but I found a report saying that turning on AMP on AMD CPUs (autocasting to bfloat16) causes degraded performance: Bfloat16 CPU inference speed is too slow on AMD cpu
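
A quick way to test the bf16 hypothesis independently of Chronos is to time matmuls in both dtypes on the affected machine. This is a minimal sketch; it assumes the slowdown comes from bf16 kernels (which can be emulated slowly on CPUs without native bf16 support) rather than from the model itself:

import time
import torch

x = torch.randn(1024, 1024)
for dtype in (torch.float32, torch.bfloat16):
    xd = x.to(dtype)
    xd @ xd  # warm-up
    start = time.perf_counter()
    for _ in range(10):
        xd @ xd
    print(f"{dtype}: {time.perf_counter() - start:.3f}s for 10 matmuls")

If the bfloat16 loop is dramatically slower than the float32 one on the AMD box but not on the Intel one, that points at the dtype rather than at Chronos.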

I'm quite a newbie with torch, so if someone finds a solution, please post it here. Thanks!

abdulfatir commented 1 month ago

@CrazyChildren one quick check to verify whether this is indeed due to bf16 (which is the likely cause) is to load the model in fp32. Here's the relevant code:

import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",  # this issue concerns CPU inference; use "cuda" for GPU or "mps" for Apple Silicon
    torch_dtype=torch.float32,  # force fp32 instead of the default bf16
)
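
If fp32 restores Intel-like latency, bf16 is the culprit. For completeness, a minimal timing sketch matching the reported settings (the random context here is a stand-in for real data; pipeline.predict is the standard Chronos API):

import time

context = torch.randn(70)  # dummy 1-D series matching the reported context_len = 70
start = time.perf_counter()
forecast = pipeline.predict(context, prediction_length=1)  # predict_len = 1, single series
print(f"fp32 CPU inference took {time.perf_counter() - start:.2f}s")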