-
The Python `mlx_lm` implementation generates at ~101 tokens per second for `mlx-community/Phi-3-mini-4k-instruct-4bit`, whereas the Swift code here generates at ~60 tokens per second.
Here is my py…
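To make the two numbers comparable, it helps to time both implementations the same way. A minimal, library-agnostic timing harness (the `generate_tokens` callable is a hypothetical stand-in for a call into either the Python or Swift generation loop; it just has to return the number of tokens produced):

```python
import time

def measure_tokens_per_second(generate_tokens, warmup=1, runs=3):
    """Time a generation callable and return average tokens/sec.

    `generate_tokens` is a hypothetical hook that runs one full
    generation and returns the token count. Warmup runs are excluded
    so one-time setup cost does not skew the comparison.
    """
    for _ in range(warmup):
        generate_tokens()  # warm caches before timing
    total_tokens, total_time = 0, 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate_tokens()
        total_time += time.perf_counter() - start
    return total_tokens / total_time
```

Using identical prompts, max-token limits, and warmup on both sides rules out measurement artifacts before attributing the gap to the implementations themselves.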
-
### Feature request
From what I understand, a streaming dataset currently pulls and processes the data only as it is requested.
This can introduce significant latency delays when data is loaded i…
-
The implementation of stop_criteria in `mlx_lm.server` is inherently flawed. Stop sequences are only matched when the newest generated tokens perfectly match a stop sequence. However, it does not stop if…
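A more robust approach is to scan the full accumulated text rather than only the newest tokens, and additionally to report when the text ends in a partial stop sequence so the server can withhold those characters until the match resolves. A sketch (not the `mlx_lm` code, just the matching logic):

```python
def check_stop(generated_text, stop_sequences):
    """Return (safe_length, stopped).

    stopped=True means a stop sequence appears in the text and
    safe_length is where the output should be trimmed. Otherwise,
    safe_length is how much text can be emitted now; any trailing
    characters that form a prefix of a stop sequence are held back,
    so a stop sequence split across decoding steps is still caught.
    """
    for stop in stop_sequences:
        idx = generated_text.find(stop)
        if idx != -1:
            return idx, True
    for stop in stop_sequences:
        # longest suffix that could still grow into this stop sequence
        for k in range(len(stop) - 1, 0, -1):
            if generated_text.endswith(stop[:k]):
                return len(generated_text) - k, False
    return len(generated_text), False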
-
When running `dataset.map` with `num_proc=16`, I am unable to tokenize a ~45GB dataset on a machine with >200GB VRAM. The dataset consists of ~30000 rows, each containing a string of 120-180k characters.
The m…
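With rows this long, each worker materializes the full tokenized output of a 120-180k-character string at once, which multiplies badly across 16 processes. One mitigation (a sketch under the assumption that splitting rows is acceptable for the downstream task) is to pre-split long rows into bounded chunks with a batched map, so no single row dominates memory:

```python
def split_long_rows(batch, max_chars=20_000):
    """Split each long string into pieces of at most max_chars, so a
    later tokenization pass never holds one giant row's token ids in
    memory at once. `batch` follows the batched-map convention of a
    dict of column lists, here assuming a "text" column.
    """
    out = []
    for text in batch["text"]:
        out.extend(text[i:i + max_chars] for i in range(0, len(text), max_chars))
    return {"text": out}
```

Lowering `writer_batch_size` on the `map` call and reducing `num_proc` can also cap peak memory, at the cost of throughput.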
-
Hello,
I am pretraining TinyLlama on Lightning AI Studio on my custom dataset. I am using `prepare_starcoder.py` to convert the parquet files, because my data is one folder of parquet files. After …
-
### System Info
hi,
I am unable to stream the final answer from an LLM chain to the Chainlit UI.
langchain==0.0.218
Python 3.9.16
here are the details:
https://github.com/Chainlit/chainlit/issues/3…
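In LangChain of that era, streaming is delivered through a callback handler whose `on_llm_new_token` method fires per token. The dependency-free toy below illustrates the pattern only; the class name and the `forward` target are hypothetical stand-ins (in a real app, `forward` would be something like a Chainlit message's token-streaming call):

```python
class TokenCollector:
    """Minimal stand-in for a streaming callback handler: the LLM
    invokes on_llm_new_token per token, and the forwarding logic
    (here just appending to a list) runs immediately per token
    instead of waiting for the complete final answer."""

    def __init__(self, forward):
        self.forward = forward  # per-token sink, e.g. a UI update call

    def on_llm_new_token(self, token, **kwargs):
        self.forward(token)

chunks = []
collector = TokenCollector(chunks.append)
for tok in ["Hel", "lo", "!"]:
    collector.on_llm_new_token(tok)
```

The usual failure mode is that the handler is attached to the chain but not to the underlying LLM (or streaming is not enabled on the LLM), so no tokens ever reach the callback.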
-
> Thanks - I need to upgrade this plugin to the latest Replicate library version and make a bunch of changes.
_Originally posted by @simonw in https://github.com/simonw/llm-replicate/issues/24#issu…
-
I would like to suggest a potential enhancement that could improve the monitoring of user activity.
Currently, the system saves each conversation in Azure Cosmos DB. This is a great feature, bu…
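Given the stored conversations, a simple aggregation can already surface activity metrics. The sketch below assumes a hypothetical record shape (`user_id`, ISO `timestamp`, `messages`); the real Cosmos DB document schema would need to be mapped onto it:

```python
from collections import Counter
from datetime import datetime

def activity_by_user_day(conversations):
    """Aggregate saved conversation records into message counts per
    (user, day). The field names used here are hypothetical; adapt
    them to the actual Cosmos DB documents."""
    counts = Counter()
    for conv in conversations:
        day = datetime.fromisoformat(conv["timestamp"]).date().isoformat()
        counts[(conv["user_id"], day)] += len(conv["messages"])
    return dict(counts)
```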
-
## 🐛 Bug
The `vocab_size` in the config file is 50272, but `len(tokenizer)` is 50265; they do not match each other.
### To Reproduce
Steps to reproduce the behavior (**always include the command y…
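For context, this kind of mismatch is often intentional rather than a bug: the embedding matrix is commonly padded up to a round multiple (e.g. of 8) for GPU efficiency, and the extra rows are simply never produced by the tokenizer. Whether that is the explanation for this particular checkpoint is an assumption, but the arithmetic fits:

```python
def padded_vocab_size(tokenizer_len, multiple=8):
    """Round the tokenizer vocab size up to the next multiple.
    Models often pad the embedding matrix this way for hardware
    efficiency; token ids >= tokenizer_len map to unused rows."""
    return -(-tokenizer_len // multiple) * multiple
```

Here 50265 rounded up to a multiple of 8 is exactly 50272, matching the config value.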
-
As part of e2e training, we encountered wild spikes in the loss curve:
After additional hyperparameter tuning and further investigation, the root cause is that we are reading the dataset sequentially, so to …
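Sequential reads hand the model long runs of correlated examples, which is a classic cause of loss spikes. When a full global shuffle is impractical, a shuffle buffer approximates it over a stream. A minimal sketch (the buffer size is a memory/randomness trade-off, not a prescribed value):

```python
import random

def shuffle_buffer(iterable, buffer_size=10_000, seed=0):
    """Approximately shuffle a sequentially-read stream: keep up to
    buffer_size items in memory and emit a randomly chosen one as
    each new item arrives, breaking up long runs of correlated data."""
    rng = random.Random(seed)
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= buffer_size:
            j = rng.randrange(len(buf))
            buf[j], buf[-1] = buf[-1], buf[j]  # swap a random item to the end
            yield buf.pop()
    rng.shuffle(buf)  # drain the remainder in random order
    yield from buf
```

Interleaving shards (reading several files round-robin) composes well with this, since the buffer then mixes examples from different parts of the dataset.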