-
When I tried to serve LLaMA on a `v3_8` TPU as suggested in the example script, I encountered some errors.
**Environment**
* TPU: `v3-8`
* Software: `tpu-vm-base`
**Command**
```
$ git clone https:…
-
## Describe the bug
Have a look :-)
https://github.com/user-attachments/assets/321dbb21-2403-4330-9ce1-091902298888
## Latest commit or version
0.22
MBP M3 Max
-
**Is your feature request related to a problem? Please describe.**
Currently, when the tokenized string is shorter than max_length, the output is padded with 0s. So if `max(tokenized string lengths)` <…
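The behavior being requested appears to be dynamic padding (pad only to the longest sequence in the batch rather than to a fixed `max_length`). A minimal sketch in plain Python, assuming pad id 0 as in the report:

```python
# Sketch of fixed vs. dynamic padding; pad_id=0 mirrors "padded with 0s".

def pad_batch(token_id_lists, max_length, pad_id=0):
    """Pad every sequence in the batch out to max_length with pad_id."""
    return [ids + [pad_id] * (max_length - len(ids)) for ids in token_id_lists]

def pad_to_longest(token_id_lists, pad_id=0):
    """Dynamic padding: pad only to the longest sequence in the batch."""
    longest = max(len(ids) for ids in token_id_lists)
    return pad_batch(token_id_lists, longest, pad_id)

batch = [[5, 7, 9], [3, 4]]
print(pad_batch(batch, 6))    # -> [[5, 7, 9, 0, 0, 0], [3, 4, 0, 0, 0, 0]]
print(pad_to_longest(batch))  # -> [[5, 7, 9], [3, 4, 0]]
```

When `max(tokenized string lengths)` is well below `max_length`, the dynamic variant avoids computing over rows of padding.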
-
Is it possible to stream each token of the output as soon as it is generated by the model? I assume this depends on the Hugging Face Transformers classes and methods used. Is there a solution for this?
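The pattern being asked for is essentially a generator that yields one token per decode step instead of returning the full output. A minimal sketch, with `fake_generate` as a hypothetical stand-in for a real model's decode loop:

```python
# Sketch of token-by-token streaming via a Python generator.
# fake_generate is a placeholder "model"; a real decode loop would run one
# forward pass per step and yield the sampled token id.

from typing import Iterator, List

def fake_generate(prompt_ids: List[int], steps: int = 3) -> Iterator[int]:
    """Yield one new token id per decode step instead of the full output."""
    token = sum(prompt_ids)
    for _ in range(steps):
        token = (token * 31 + 7) % 1000  # placeholder for a model step
        yield token  # caller sees each token as soon as it exists

for tok in fake_generate([1, 2, 3]):
    print(tok)
```

In Hugging Face Transformers this role is played by the `streamer` argument of `generate` (e.g. `TextIteratorStreamer`, typically with `generate` running in a background thread while the main thread iterates over the streamer).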
-
How do I add an EOS token?
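At the token-id level this usually amounts to appending the EOS id to the encoded sequence. A minimal sketch, assuming `eos_id=2` (a common but not universal choice; with Hugging Face tokenizers the real value is `tokenizer.eos_token_id`):

```python
# Sketch of appending an EOS token id to an encoded sequence.
# eos_id=2 is an assumption for illustration; use your tokenizer's value.

def add_eos(token_ids, eos_id=2):
    """Append the EOS id, but only if it is not already the last token."""
    if token_ids and token_ids[-1] == eos_id:
        return token_ids
    return token_ids + [eos_id]

print(add_eos([10, 11, 12]))  # -> [10, 11, 12, 2]
print(add_eos([10, 2]))       # -> [10, 2] (already terminated)
```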
-
## Brief Overview
Downloading, saving, and preprocessing large datasets from the `datasets` library can often result in [performance bottlenecks](https://github.com/huggingface/datasets/issues/3735).…
-
I deployed the converted StarCoder model to Triton with a world size of 2 and enabled streaming inference with `streaming=True`. However, I encountered an issue where the rank 1 model is unable to retri…
-
Please help me fix the problem in this code:
import torch
import uvicorn
import gc
import asyncio
import argparse
import io
from fastapi import FastAPI, WebSocket, Depends
from fastapi.responses …
-
### System Info
CPU: x86_64; memory: 1024 GB; GPU: 8× A6000 (48 GB each); TensorRT-LLM version: 0.9.0.DEV20240226; NVIDIA driver version: 535.171.04; CUDA version: 12.2; OS: Ubuntu 22.04
### Who can hel…
-
Hi, I am interested in evaluating OpenChat (https://github.com/evalplus/evalplus/issues/60, https://github.com/evalplus/evalplus/issues/61) and want to understand what could be a minimal and self-cont…