-
Streamed responses [don't include usage info in the response](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb). You'd have to calculate this yourself via [tiktoken]…
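For reference, a minimal sketch of that workaround: accumulate the streamed text client-side and count tokens with tiktoken. The model name and texts below are placeholders, and the small per-message overhead of chat-formatted prompts is not accounted for here.

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Approximate token count for `text` using the model's encoding."""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # reasonable fallback
    return len(enc.encode(text))

# Pretend these pieces were accumulated from the streamed delta chunks.
streamed_pieces = ["Hello", ",", " world", "!"]
completion_text = "".join(streamed_pieces)

print(count_tokens("Say hello."))     # approximate prompt tokens
print(count_tokens(completion_text))  # approximate completion tokens
```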
-
**Describe the bug**
Can't train with multiple VMs; TPU v4-32.
It stops after loading the model and won't even load the data.
I've been trying for two days; maybe my setup is wrong.
Really want to know w…
-
I noticed that currently only a few model series, including **Qwen, ChatGLM, and GPT**, support **IFB** (in-flight batching). The lack of support for other models has severely impacted the practicality of the TRT-LLM …
-
Have everything running on Python 3.10 under Ubuntu 22.04 with 2x 24 GB GPUs.
Tested original and revised versions of `mt_bench.jsonl`, and the output is good with a 70B 4-bit GPTQ model.
Trying to incr…
-
For example: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/generate.py
The current inference output is generated all at once.
However, t…
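A minimal sketch of streaming the output instead, using the stock `transformers` `TextStreamer`. The model name is a placeholder, and plain Hugging Face classes are used here; the ipex-llm example linked above loads the model through its own `AutoModelForCausalLM`, which is assumed to accept the same `generate` kwargs.

```python
# Stream generated tokens to stdout as they are produced, rather than
# returning the full sequence at once. Model name is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # placeholder; swap in your model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is AI?", return_tensors="pt")
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() pushes decoded text into the streamer token by token.
model.generate(**inputs, streamer=streamer, max_new_tokens=64)
```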
-
### Feature request
add option to stream output from pipeline
### Motivation
using `tokenizer.apply_chat_template`, then other steps, then `model.generate` is pretty repetitive, and I think it's time …
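For reference, a sketch of the manual streaming route this request wants `pipeline` to wrap: chat template, `generate` on a background thread, then iterating text chunks from a `TextIteratorStreamer`. The model name is a placeholder.

```python
# The repetitive boilerplate: template -> threaded generate -> iterate chunks.
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
thread = Thread(
    target=model.generate,
    kwargs=dict(input_ids=input_ids, streamer=streamer, max_new_tokens=64),
)
thread.start()

for chunk in streamer:  # yields decoded text while generate() runs
    print(chunk, end="", flush=True)
thread.join()
```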
-
Poking around, I see that Riva says it is only supported up to 5.1, but there are examples in these containers that use it, and these containers all work with DP6, so I've been trying, to no avail, including…
-
### The model to consider.
https://huggingface.co/THUDM/glm-4-9b-chat
### The closest model vllm already supports.
chatglm
### What's your difficulty of supporting the model you want?
_No respons…
-
When I tested Qwen2-7B on this library, it reported some errors.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from intel_npu_acceleration_library import NPUModelForCausalL…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…