streaming-tokenizer Search Results

1000+ results for streaming-tokenizer

Best match


NVIDIA/TensorRT-LLM #2022

Different output with transformers lib and tensorrt llm when…

### System Info A100 ### Who can help? @juney-nvidia @ncomly-nvidia @kaiyux @byshiue ### Information - [ ] The official example scripts - [X] My own modified scripts ### Tasks - [ ] An offic…

Alireza3242 updated 1 week ago
2
ohler55/ojg #179

Extract data out of large JSON

I am building a tool which would extract data from a potentially large JSON. If data is ndjson, then it is easy to read it line by line and extract data from each separate object. But if data is in a …

mitar updated 3 hours ago
4
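The snippet above notes that NDJSON is easy to process line by line, one object at a time. Here is a minimal sketch of that approach in Python. It uses only the standard `json` module, not the ojg library the issue is filed against. `extract_from_ndjson` and its `key` parameter are hypothetical names chosen for illustration.

```python
import io
import json
from typing import Any, Iterator, TextIO


def extract_from_ndjson(stream: TextIO, key: str) -> Iterator[Any]:
    """Yield obj[key] for every JSON object in an NDJSON stream that has it.

    Hypothetical helper: each non-blank line is parsed on its own, so memory
    use is bounded by the largest single line, not by the whole input.
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
        if isinstance(obj, dict) and key in obj:
            yield obj[key]


# Usage: an in-memory stream stands in for a large file opened with open().
sample = io.StringIO('{"id": 1, "name": "a"}\n\n{"id": 2}\n{"id": 3, "name": "c"}\n')
print(list(extract_from_ndjson(sample, "name")))  # ['a', 'c']
```

This only covers the easy NDJSON case. A single large JSON document that is not line-delimited needs an incremental (streaming) parser instead.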
facebookresearch/seamless_communication #490

How to set the path for manually downloaded seamless_streaming_unity weights

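In the seamless_streaming_unity.yaml config file, I changed the char_tokenizer: and checkpoint: parameters to point to the weights I had already downloaded. Why does inference still try to download the weights?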

sunyclj updated 1 day ago
4
microsoft/autogen #3264

[Issue]: how to deploy the local model correctly and run the…

### Describe the issue I use the [Local-LLMs/](https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/) to deploy my local model but the result by llm is weird ### Steps to reproduce ## lu…

lambda7xx updated 2 days ago
5
xhluca/bm25s #31

On-the-fly stemming

Right now, stemming is done after the strings are split and converted to IDs: https://github.com/xhluca/bm25s/blob/73c7dea9ea7f88a23a7fa9a94e9a7bca48669f1c/bm25s/tokenization.py#L152-L177 Howeve…

xhluca updated 2 weeks ago
1
huggingface/tokenizers #1572

BPE Split pretokenization rule is not reflected in the vocab…

Training a BPE tokenizer from scratch, I am using Split pretokenization. In the below example, I split on each digit so that numbers are represented by the sequences of digits they are made of. ```…

meliksahturker updated 1 week ago
2
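The Split rule described above gives one piece per digit, so numbers become sequences of single digits. It can be illustrated without the library. This is a minimal sketch that assumes the intended behavior is that each digit becomes an isolated piece while runs of non-digit characters stay together. `split_digits` is a hypothetical helper built on Python's `re` module. It is not the `tokenizers` Split pre-tokenizer itself.

```python
import re

# Hypothetical illustration: match either a single digit or a maximal run of
# non-digit characters, so every digit ends up as its own piece.
_DIGIT_OR_RUN = re.compile(r"\d|\D+")


def split_digits(text: str) -> list[str]:
    """Return pre-tokenization pieces with each digit isolated."""
    return _DIGIT_OR_RUN.findall(text)


print(split_digits("in 2024"))  # ['in ', '2', '0', '2', '4']
```

Because the pieces concatenate back to the input, no characters are lost by the split. Merges learned afterwards can then only combine characters within a piece.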
guardrails-ai/guardrails #829

[feat] Distribute NLTK tokenizers used in the core package

**Description** [Add a description of the feature] Since we now require `nltk` and the `punkt` tokenizer during the validation loop for chunking during streaming, we should either download and dist…

CalebCourier updated 2 weeks ago
1
xenova/transformers.js #853

Result is wrong when decoding tokens one by one

### System Info Node.js 22.4.0 @xenova/transformers 2.17.2 ### Environment/Platform - [ ] Website/web-app - [ ] Browser extension - [X] Server-side (e.g., Node.js, Deno, Bun) - [ ] Desktop app (e.…

zcbenz updated 2 weeks ago
1
mistralai/mistral-inference #207

[BUG: config.json in mamba-codestral-7B-v0.1 is error

### Python -VV ```shell Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] ``` ### Pip Freeze ```shell accelerate==0.33.0 addict==2.4.0 annotated-types==0.7.0 apex @ file:///data2/apex …

Fly-Pluche updated 1 day ago
1
triton-inference-server/tensorrtllm_backend #413

the result use inflight_batcher_llm_client to send multiple …

case1: use tensorrtllm python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \ --max_output_len 2048 \ …

stifles updated 2 months ago
3
