-
I updated Ollama from 0.1.16 to 0.1.18 and encountered the issue.
I am using Python with Ollama and LangChain to run LLM models on a Linux server (4 x A100 GPUs).
There are 5,000 prompts to ask and get…
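For a workload like this (thousands of independent prompts against one server), a common pattern is bounded concurrency. A minimal, self-contained sketch — the real model call (e.g. a LangChain `Ollama(...).invoke(prompt)`) is stubbed out here as a hypothetical `ask` function:

```python
from concurrent.futures import ThreadPoolExecutor

def ask(prompt: str) -> str:
    # Placeholder for the real LLM call (e.g. LangChain + Ollama);
    # stubbed so the sketch is self-contained and runnable.
    return f"answer to: {prompt}"

def ask_all(prompts, max_workers=8):
    # A bounded thread pool avoids flooding the inference server
    # with all 5,000 requests at once; results come back in order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask, prompts))

answers = ask_all([f"prompt {i}" for i in range(10)])
```

The `max_workers` value is a tuning knob: too low underutilizes the GPUs, too high can trigger server-side queuing or timeouts.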
-
Idea by @nyh
Scylla already has a cardinality estimator (see http://www.datastax.com/dev/blog/improving-compaction-in-cassandra-with-cardinality-estimation) which estimates how many partitions the…
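The estimator described in the linked post is HyperLogLog-based: hash each partition key, use a few bits to pick a register, and keep the maximum run of trailing zero bits seen per register. A minimal, framework-free sketch of that idea (illustrative only, not Scylla's actual implementation):

```python
import hashlib

def _hash(x) -> int:
    # Deterministic 160-bit hash of the item (stand-in for the key hash).
    return int(hashlib.sha1(str(x).encode()).hexdigest(), 16)

class HyperLogLog:
    def __init__(self, b: int = 10):
        self.b = b              # register-index bits
        self.m = 1 << b         # number of registers
        self.registers = [0] * self.m

    def add(self, item) -> None:
        h = _hash(item)
        idx = h & (self.m - 1)  # low b bits choose a register
        w = h >> self.b         # remaining bits feed the rank
        # rank = 1 + number of trailing zero bits in w
        rank = 1
        while w & 1 == 0 and rank < 160 - self.b:
            rank += 1
            w >>= 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        # Standard HLL estimator (no small/large-range corrections here).
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / z
```

With `m = 1024` registers the relative error is roughly `1.04 / sqrt(m)`, i.e. a few percent, at a fixed memory cost of one small integer per register — the property that makes it attractive for per-SSTable partition counts.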
-
Hello, we tested the command you provided, `CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path openbmb/MiniCPM-V-2_6`, together with the video test code (below). We found that the results on video appear to depend only on the video's first frame. We tried video OCR extraction several times, and the output was always just the OCR result of the first frame. Could you…
-
Hi
I want to fine-tune "stt_en_fastconformer_hybrid_large_streaming_multi" on custom data.
In my dataset I have things like "Vitamin B12", "Code: c12r5", "hb1ac" etc
For these alphanumeric words:
…
-
I am having an issue where streaming the result from ExLlamaV2DynamicJobAsync causes the stream rate to drop by half; however, when the generation reaches its halfway point, suddenly all the re…
-
The HF documentation says that you can now export seq2seq models to ONNX with the `OnnxSeq2SeqConfigWithPast` class.
https://huggingface.co/docs/transformers/v4.23.1/en/main_classes/onnx#onnx-configurations
…
-
### 🐛 Describe the bug
We are facing issues with loss curves and reproducibility when using `torch.compile()` with our models. Attached below is a graph of train loss with runs with `torch.compile(…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
Out-of-memory error during tokenization. Tried streaming and am facing the same issue with **streaming: true** an…
-
## Describe the bug
I'm encountering an error while running the phi3v example using a local model. Here's my code:
```rust
use either::Either;
use indexmap::IndexMap;
use std::{path::PathBuf,…
-
I'm using `TextIteratorStreamer` for streaming output.
Since the LLM may repeat its output indefinitely, I would like the LLM to stop generating when it receives a cancellation request.
Is …
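One common pattern for this is a shared cancellation flag that the streaming consumer sets and the producer checks. A framework-agnostic sketch using a `threading.Event` (the token iteration here stands in for iterating a `TextIteratorStreamer`; names like `stream_tokens` are hypothetical):

```python
import threading

def stream_tokens(tokens, cancel: threading.Event):
    # Yield tokens until the consumer signals cancellation.
    for tok in tokens:
        if cancel.is_set():
            break
        yield tok

cancel = threading.Event()
received = []
for i, tok in enumerate(stream_tokens(["tok"] * 1000, cancel)):
    received.append(tok)
    if i == 4:          # e.g. the user clicked "stop"
        cancel.set()    # the generator breaks on its next check
```

With transformers, the same event can additionally back a custom `StoppingCriteria` passed to `generate()`, so the generation thread itself halts rather than only the consumer loop.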