
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

SpecInfer generates '<pad>' #1301

Open dutsc opened 5 months ago

dutsc commented 5 months ago

My machine configuration is 4*3090, and my example prompt is: "please introduce Kobe Bryant, who played basketball in NBA". I use three SSMs, all of which are opt-125M. Only when the LLM is opt-13b does the generated text look normal, as shown below:

[screenshot: opt-13b output]

When I use smaller LLMs (opt-6.7b, opt-1.3b), the generated text is all '<pad>'.

[screenshots: opt-6.7b and opt-1.3b outputs]

Why is that?

My script is as follows (run from the directory /workspace/FlexFlow/build/). The prompt.json contains "please introduce Kobe Bryant, who played basketball in NBA".

./inference/spec_infer/spec_infer \
    -ll:gpu 4 \
    -ll:fsize 22000 \
    -ll:zsize 30000 \
    -llm-model /models/opt-13b/ \
    -ssm-model /models/opt-125m/ \
    -ssm-model /models/opt-125m/ \
    -ssm-model /models/opt-125m/ \
    -prompt /workspace/FlexFlow/prompts/prompt.json \
    -tensor-parallelism-degree 4 \
    --fusion > ../sclog/spec_infer.log
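
For reference, FlexFlow Serve's example prompt files appear to be JSON arrays of prompt strings, so a prompt.json like the one above could be created as follows (a minimal sketch; the JSON-array format is assumed from the FlexFlow example prompts, and the path matches the command above):

    # Write a one-element JSON array containing the prompt (assumed format)
    echo '["please introduce Kobe Bryant, who played basketball in NBA"]' > /workspace/FlexFlow/prompts/prompt.json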

Thank you very much for your valuable time.

xinhaoc commented 4 months ago

@dutsc Hi! We have shown in the latest version of our paper that using a single SSM achieves the best performance, and there is now an assertion here to make sure only one SSM is registered. Please make sure you are using the newest code, and let me know if you still get incorrect output.
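
For example, the command from the original report reduced to a single SSM would look like this (a sketch based on the command above; the flags are unchanged, only the duplicate -ssm-model entries are removed):

    # Same invocation as before, but registering opt-125m as the only SSM
    ./inference/spec_infer/spec_infer \
        -ll:gpu 4 \
        -ll:fsize 22000 \
        -ll:zsize 30000 \
        -llm-model /models/opt-13b/ \
        -ssm-model /models/opt-125m/ \
        -prompt /workspace/FlexFlow/prompts/prompt.json \
        -tensor-parallelism-degree 4 \
        --fusion > ../sclog/spec_infer.log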