-
The Python `mlx_lm` implementation generates at ~101 tokens per second for `mlx-community/Phi-3-mini-4k-instruct-4bit`, whereas the Swift code here generates at ~60 tokens per second.
Here is my py…
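To make the two numbers comparable, it helps to time both implementations the same way. A minimal, library-agnostic timing harness (the `generate_tokens` callable is a hypothetical stand-in for a call into either the Python or Swift generation loop; it just has to return the number of tokens produced):

```python
import time

def measure_tokens_per_second(generate_tokens, warmup=1, runs=3):
    """Time a generation callable and return average tokens/sec.

    `generate_tokens` is a hypothetical hook that runs one full
    generation and returns the token count. Warmup runs are excluded
    so one-time setup cost does not skew the comparison.
    """
    for _ in range(warmup):
        generate_tokens()  # warm caches before timing
    total_tokens, total_time = 0, 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate_tokens()
        total_time += time.perf_counter() - start
    return total_tokens / total_time
```

Using identical prompts, max-token limits, and warmup on both sides rules out measurement artifacts before attributing the gap to the implementations themselves.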
-
### Feature request
From what I understand, a streaming dataset currently pulls and processes the data only as it is requested.
This can introduce significant latency delays when data is loaded i…
-
The implementation of stop_criteria in `mlx_lm.server` is inherently flawed. Stop sequences are only matched when the newest generated tokens perfectly match a stop sequence. However, it does not stop if…
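A more robust approach is to scan the full accumulated text rather than only the newest tokens, and additionally to report when the text ends in a partial stop sequence so the server can withhold those characters until the match resolves. A sketch (not the `mlx_lm` code, just the matching logic):

```python
def check_stop(generated_text, stop_sequences):
    """Return (safe_length, stopped).

    stopped=True means a stop sequence appears in the text and
    safe_length is where the output should be trimmed. Otherwise,
    safe_length is how much text can be emitted now; any trailing
    characters that form a prefix of a stop sequence are held back,
    so a stop sequence split across decoding steps is still caught.
    """
    for stop in stop_sequences:
        idx = generated_text.find(stop)
        if idx != -1:
            return idx, True
    for stop in stop_sequences:
        # longest suffix that could still grow into this stop sequence
        for k in range(len(stop) - 1, 0, -1):
            if generated_text.endswith(stop[:k]):
                return len(generated_text) - k, False
    return len(generated_text), False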
-
When running `dataset.map` with `num_proc=16`, I am unable to tokenize a ~45GB dataset on a machine with >200GB VRAM. The dataset consists of ~30000 rows, each containing a string of 120-180k characters.
The m…
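With rows this long, each worker materializes the full tokenized output of a 120-180k-character string at once, which multiplies badly across 16 processes. One mitigation (a sketch under the assumption that splitting rows is acceptable for the downstream task) is to pre-split long rows into bounded chunks with a batched map, so no single row dominates memory:

```python
def split_long_rows(batch, max_chars=20_000):
    """Split each long string into pieces of at most max_chars, so a
    later tokenization pass never holds one giant row's token ids in
    memory at once. `batch` follows the batched-map convention of a
    dict of column lists, here assuming a "text" column.
    """
    out = []
    for text in batch["text"]:
        out.extend(text[i:i + max_chars] for i in range(0, len(text), max_chars))
    return {"text": out}
```

Lowering `writer_batch_size` on the `map` call and reducing `num_proc` can also cap peak memory, at the cost of throughput.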
-
Hello,
I am pretraining TinyLlama on Lightning AI Studio on my custom dataset. I am using `prepare_starcoder.py` to convert the parquet files, because my data is one folder of parquet files. After …
-
### System Info
hi,
I am unable to stream the final answer from an LLM chain to the Chainlit UI.
langchain==0.0.218
Python 3.9.16
here are the details:
https://github.com/Chainlit/chainlit/issues/3…
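In LangChain of that era, streaming is delivered through a callback handler whose `on_llm_new_token` method fires per token. The dependency-free toy below illustrates the pattern only; the class name and the `forward` target are hypothetical stand-ins (in a real app, `forward` would be something like a Chainlit message's token-streaming call):

```python
class TokenCollector:
    """Minimal stand-in for a streaming callback handler: the LLM
    invokes on_llm_new_token per token, and the forwarding logic
    (here just appending to a list) runs immediately per token
    instead of waiting for the complete final answer."""

    def __init__(self, forward):
        self.forward = forward  # per-token sink, e.g. a UI update call

    def on_llm_new_token(self, token, **kwargs):
        self.forward(token)

chunks = []
collector = TokenCollector(chunks.append)
for tok in ["Hel", "lo", "!"]:
    collector.on_llm_new_token(tok)
```

The usual failure mode is that the handler is attached to the chain but not to the underlying LLM (or streaming is not enabled on the LLM), so no tokens ever reach the callback.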
-
> Thanks - I need to upgrade this plugin to the latest Replicate library version and make a bunch of changes.
_Originally posted by @simonw in https://github.com/simonw/llm-replicate/issues/24#issu…
-
I would like to suggest a potential enhancement that could improve the monitoring of user activity.
Currently, the system saves each conversation in Azure Cosmos DB. This is a great feature, bu…
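Given the stored conversations, a simple aggregation can already surface activity metrics. The sketch below assumes a hypothetical record shape (`user_id`, ISO `timestamp`, `messages`); the real Cosmos DB document schema would need to be mapped onto it:

```python
from collections import Counter
from datetime import datetime

def activity_by_user_day(conversations):
    """Aggregate saved conversation records into message counts per
    (user, day). The field names used here are hypothetical; adapt
    them to the actual Cosmos DB documents."""
    counts = Counter()
    for conv in conversations:
        day = datetime.fromisoformat(conv["timestamp"]).date().isoformat()
        counts[(conv["user_id"], day)] += len(conv["messages"])
    return dict(counts)
```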
-
## 🐛 Bug
The `vocab_size` in the config file is 50272, but `len(tokenizer)` is 50265; they do not match each other.
### To Reproduce
Steps to reproduce the behavior (**always include the command y…
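For context, this kind of mismatch is often intentional rather than a bug: the embedding matrix is commonly padded up to a round multiple (e.g. of 8) for GPU efficiency, and the extra rows are simply never produced by the tokenizer. Whether that is the explanation for this particular checkpoint is an assumption, but the arithmetic fits:

```python
def padded_vocab_size(tokenizer_len, multiple=8):
    """Round the tokenizer vocab size up to the next multiple.
    Models often pad the embedding matrix this way for hardware
    efficiency; token ids >= tokenizer_len map to unused rows."""
    return -(-tokenizer_len // multiple) * multiple
```

Here 50265 rounded up to a multiple of 8 is exactly 50272, matching the config value.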
-
As part of e2e training, we encountered wild spikes in the loss curve:
After additional hyperparameter tuning and further investigation, the root cause is that we are reading the dataset sequentially, so to …
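Sequential reads hand the model long runs of correlated examples, which is a classic cause of loss spikes. When a full global shuffle is impractical, a shuffle buffer approximates it over a stream. A minimal sketch (the buffer size is a memory/randomness trade-off, not a prescribed value):

```python
import random

def shuffle_buffer(iterable, buffer_size=10_000, seed=0):
    """Approximately shuffle a sequentially-read stream: keep up to
    buffer_size items in memory and emit a randomly chosen one as
    each new item arrives, breaking up long runs of correlated data."""
    rng = random.Random(seed)
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= buffer_size:
            j = rng.randrange(len(buf))
            buf[j], buf[-1] = buf[-1], buf[j]  # swap a random item to the end
            yield buf.pop()
    rng.shuffle(buf)  # drain the remainder in random order
    yield from buf
```

Interleaving shards (reading several files round-robin) composes well with this, since the buffer then mixes examples from different parts of the dataset.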