Search before asking
[X] I had searched in the issues and found no similar feature requirement.
Description
Hi, I deployed dbgpt on my own computer and loaded the local model Qwen2 0.5B. Inference inside dbgpt is very fast: I ask questions and it answers them almost immediately. However, when I run the same model myself in PyCharm with code like the following:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Qwen/Qwen2-0.5B")
pipe(messages)
the inference is very slow, even though I'm sure CUDA is being used to accelerate it. I don't understand why this is the case, and I'd like to achieve the same fast inference locally as well.
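For comparison, a minimal sketch of the same call with the pipeline pinned to the GPU explicitly (device and torch_dtype are standard transformers pipeline arguments; a single CUDA device 0 with float16 support is an assumption here):

import torch
from transformers import pipeline

# Explicitly place the model on the first CUDA device and use half precision,
# so generation does not silently fall back to CPU / float32.
pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2-0.5B",
    device=0,                   # first CUDA GPU (assumption: single-GPU machine)
    torch_dtype=torch.float16,  # half precision for faster inference
)

messages = [{"role": "user", "content": "Who are you?"}]
print(pipe(messages, max_new_tokens=64))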
Use case
No response
Related issues
No response
Feature Priority
None
Are you willing to submit PR?