-
In the seamless_streaming_unity.yaml config file, I changed the `char_tokenizer:` and `checkpoint:` parameters to point to the paths of the weights I had already downloaded. Why does inference still download the weights when I run it?
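For reference, the edit in question looks roughly like the fragment below. This is only a sketch: the filenames are placeholders for whatever local copies of the weights were downloaded, and only the two keys named above come from the question itself.

```yaml
# Sketch of the edited fields in seamless_streaming_unity.yaml
# (paths below are placeholders, not the actual asset names)
char_tokenizer: "file:///local/path/to/char_tokenizer.model"
checkpoint: "file:///local/path/to/seamless_streaming_unity_checkpoint.pt"
```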
-
## 🐛 Bug
Hello, I'm running into an issue where my batch size begins to vary halfway through an epoch.
### To Reproduce
I logged when it deviated from 64. It happens in all epochs, and when trai…
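One common source of batch-size drift is the final partial batch when the dataset size is not divisible by the batch size (in PyTorch, `drop_last=False` is the default). That only explains a deviation at the very end of an epoch, though; a change mid-epoch points more toward length-based or dynamic batching. A minimal stdlib sketch of the end-of-epoch arithmetic, with assumed sizes:

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the sequence of batch sizes a simple loader would produce."""
    full, rem = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rem and not drop_last:
        sizes.append(rem)  # the final, smaller batch
    return sizes

# With 1000 samples and batch size 64, the last batch shrinks to 40:
print(batch_sizes(1000, 64))                   # 15 batches of 64, then one of 40
print(batch_sizes(1000, 64, drop_last=True))   # 15 batches of 64 only
```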
-
Case 1: using TensorRT-LLM
python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \
--max_output_len 2048 \
…
-
Using the microsoft/Phi-3-medium-128k-instruct model, I received incorrect responses for multi-byte characters (commonly seen in Japanese or Chinese), as shown below:
```
mlx_lm.generate --model mic…
```
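Garbled multi-byte output is often caused by decoding each generated token's bytes independently: a character like "日" spans three UTF-8 bytes, and a token boundary can fall in the middle of it. This is a general mechanism, not a claim about mlx_lm internals; a stdlib sketch of the failure and the incremental-decoder fix:

```python
import codecs

data = "日本語".encode("utf-8")   # 9 bytes, 3 bytes per character
chunks = [data[:4], data[4:]]     # split mid-character, as a token stream might be

# Naive per-chunk decoding corrupts the split character:
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# An incremental decoder buffers incomplete sequences across chunks:
dec = codecs.getincrementaldecoder("utf-8")()
correct = "".join(dec.decode(c) for c in chunks) + dec.decode(b"", final=True)

print(correct)              # 日本語
print("\ufffd" in naive)    # True: replacement characters appear in the naive version
```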
-
**LocalAI version:** 2.16.0
**Environment, CPU architecture, OS, and Version:**
Mac Studio M2 Ultra
**Describe the bug**
using backend transformers for glm4, trust_remote_code: true not c…
-
I encountered a `TypeError` when using streaming datasets: `num_proc` is not a valid argument to `IterableDataset.map()`.
Error logs:
```
------------------------------------------------------------------…
```
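The mismatch makes sense once you note that a streaming dataset's `map` is lazy and single-process, so a `num_proc` argument has nothing to parallelize; the kwarg belongs to the non-streaming `Dataset.map()`. A stdlib sketch of the lazy behavior (the names here are illustrative, not the `datasets` API):

```python
def streaming_map(fn, iterable):
    """Lazy, single-process map, roughly how a streaming dataset transforms rows."""
    for item in iterable:
        yield fn(item)   # each row is transformed only when consumed

rows = streaming_map(lambda x: x * 2, [1, 2, 3])
print(list(rows))  # [2, 4, 6]
```

The practical workaround is to pass `num_proc` only when the dataset is not streaming, i.e. call plain `ds.map(fn)` on an `IterableDataset`.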
-
I am relatively new to running inference on my own. Previously, I used ollama, but recently I decided to try out mlx since I have an M3 with sufficient unified memory and I was curious about how it co…
-
Hi
So I was training a new tokenizer from the Llama tokenizer (meta-llama/Llama-2-7b-hf) on a medium-sized corpus (Fineweb-10BT sample: 15 million documents with an average length of 2,300 characters). A…
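For a corpus of that size, the usual pattern is to feed `train_new_from_iterator` batches of documents rather than one giant list. A hedged stdlib sketch of such a batch iterator; the `train_new_from_iterator` call shown in the comment is the standard `transformers` fast-tokenizer API, but the variable names around it are assumptions:

```python
def batch_iterator(corpus, batch_size=1000):
    """Yield successive slices so the full corpus is never materialized twice."""
    for i in range(0, len(corpus), batch_size):
        yield corpus[i : i + batch_size]

# Usage with transformers (not executed here; old_tok/docs are hypothetical):
# new_tok = old_tok.train_new_from_iterator(batch_iterator(docs), vocab_size=32000)

print(sum(len(b) for b in batch_iterator(list(range(2500)), 1000)))  # 2500
```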
-
The example (https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#continue-pretraining-an-llm) works fine on my machine, but as soon as I replace it with custom text files that each just contain one …
-
When `genai-perf` is installed via `pip` from GitHub (as documented), on its first run it tries to download several files from Hugging Face, like this:
```
$ docker run --rm -it --name test -u 0 gpu-tr…
```