Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Process requests from two GUI windows at the same time. #236


intc-hharshtk commented 7 months ago

When I open two browser windows and enter a prompt in each, the output in the second window is only generated after the output in the first window has completed. Is there any switch that I am missing?

Command: `./llava-v1.5-7b-q4.llamafile -ngl 9999 --server --port 8080 --parallel 5 -t 10`
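To see whether the server actually interleaves requests, a small concurrent client can help. This is a sketch, not llamafile's own tooling: it assumes the llama.cpp-style `/completion` endpoint that the llamafile server exposes on the port above, and it returns an error string instead of raising when no server is reachable. If parallel slots work, both requests should finish in roughly the time of one.

```python
import concurrent.futures
import json
import urllib.request

def complete(prompt: str, base: str = "http://127.0.0.1:8080") -> str:
    """POST a prompt to the llama.cpp-style /completion endpoint."""
    req = urllib.request.Request(
        base + "/completion",
        data=json.dumps({"prompt": prompt, "n_predict": 32}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["content"]
    except OSError as exc:  # connection refused, timeout, HTTP error, ...
        return f"error: {exc}"

# Fire two prompts at once; with working --parallel slots they should
# be generated concurrently rather than strictly one after the other.
prompts = ["Say hello.", "Count to three."]
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(complete, prompts))

for prompt, result in zip(prompts, results):
    print(prompt, "->", result)
```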

jart commented 7 months ago

It looks like the parallel flag regressed in a recent upstream upgrade. I'll leave this open to track progress. It'll likely be fixed on the next sync if not sooner.

realcarlos commented 5 months ago

Yes, same issue. `-np` doesn't work, and I wonder whether there is any benchmark for batch inference.

vivekjainmaiet commented 4 months ago

Is it fixed? Which older build has parallel processing working that we can use until this issue is fixed?

Phate334 commented 4 months ago

Found a bug related to `-np`. When `-np` is set too large, or the output grows long enough, the bug is triggered and meaningless content is output repeatedly until the `max_tokens` limit is reached.


`./llamafile --server -m model.gguf -cb -np 20 -c 4096 -ngl 999`

I have tried multiple models and the same problem occurs.
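One likely contributor to the repetition, assuming the upstream llama.cpp server behavior that llamafile embeds: with continuous batching, the total context `-c` is divided evenly across the `-np` slots, so `-np 20 -c 4096` leaves each slot only about 204 tokens. When a generation overruns its slot's share, context shifting can degenerate into repeated output. A quick sketch of the arithmetic:

```python
def per_slot_ctx(n_ctx: int, n_parallel: int) -> int:
    """Context tokens each slot gets when -c is split evenly across -np slots
    (assumption: upstream llama.cpp server behavior)."""
    return n_ctx // n_parallel

print(per_slot_ctx(4096, 20))  # -np 20 -c 4096: only 204 tokens per slot
print(per_slot_ctx(4096, 4))   # -np 4 leaves a more usable 1024 per slot
```

If this is what is happening, raising `-c` in proportion to `-np` (or lowering `-np`) should make the repetition disappear.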