Search before asking
[X] I had searched in the issues and found no similar feature requirement.
Description
Hi, I deployed dbgpt on my own computer and loaded the local model Qwen2 0.5B. Inference inside dbgpt is very fast: I ask questions and it answers them almost immediately. However, when I run the same model myself in PyCharm with code like the following:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Qwen/Qwen2-0.5B")
pipe(messages)
the inference is very slow, even though I'm sure CUDA is being used to accelerate it. I don't understand why this is the case, and I'd like to achieve the same fast inference locally as well.
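For comparison, a minimal sketch of the same call with the pipeline pinned to the GPU explicitly (device and torch_dtype are standard transformers pipeline arguments; a single CUDA device 0 with float16 support is an assumption here):

import torch
from transformers import pipeline

# Explicitly place the model on the first CUDA device and use half precision,
# so generation does not silently fall back to CPU / float32.
pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2-0.5B",
    device=0,                   # first CUDA GPU (assumption: single-GPU machine)
    torch_dtype=torch.float16,  # half precision for faster inference
)

messages = [{"role": "user", "content": "Who are you?"}]
print(pipe(messages, max_new_tokens=64))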
Use case
No response
Related issues
No response
Feature Priority
None
Are you willing to submit PR?