-
## Describe the bug
Have a look :-)
https://github.com/user-attachments/assets/321dbb21-2403-4330-9ce1-091902298888
## Latest commit or version
0.22
MBP M3 Max
-
https://huggingface.co/smallcloudai/Refact-1_6B-fim - via https://news.ycombinator.com/item?id=37381862
-
Using the API server and submitting multiple prompts to take advantage of the speed benefit returns the following error:
"multiple prompts in a batch is not currently supported"
What's the point of …
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Where…
-
- [ ] [codefuse-chatbot/README_en.md at main · codefuse-ai/codefuse-chatbot](https://github.com/codefuse-ai/codefuse-chatbot/blob/main/README_en.md?plain=1)
-
### Your current environment
Docker latest 0.5.4
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=10.…
```
-
### What happened + What you expected to happen
I am trying to load a large quantized model with vLLM. It starts loading the model, but sometimes it stops partway through and return…
-
I have fine-tuned the Qwen2-vl 7B model, and I am trying to perform inference but I can't figure out how to do it. The inference command used during fine-tuning is as follows:
```
NFRAMES=24 MAX_PIX…
```
-
### Feature request
I have downloaded the model, so I want to run it using the local copy; the sample is:
docker run --gpus all --shm-size 1g -p 8080:80 -v /data/model/:/data/ \
ghcr.io/predibase/lora…
-
**Is your feature request related to a problem? Please describe.**
A nice property of the `json.RawMessage` design is that it's fairly trivial to safely inspect the broad kind of JSON data with:
…