-
### What is the issue?
I am experiencing slow model loading when using Ollama on my macOS system. Here are the specifications of my setup:
macOS Version: 14.5
Processor: M3 Max
Memory: 12…
-
- [ ] async
- [x] less wasteful LLM calls
I'm cooking on the Database stuff right now, and it's clear that there are a few things we can do to make the daily run much more efficient.
The searches…
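A minimal sketch of the "async" item in the checklist above, assuming a hypothetical `search_llm` coroutine standing in for one of the daily run's LLM-backed searches:

```python
import asyncio

async def search_llm(query: str) -> str:
    # Hypothetical stand-in for one LLM-backed search in the daily run.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"result for {query!r}"

async def daily_run(queries: list[str]) -> list[str]:
    # Issue the searches concurrently instead of one at a time, so total
    # wall time is roughly one call's latency, not len(queries) calls'.
    return await asyncio.gather(*(search_llm(q) for q in queries))

results = asyncio.run(daily_run(["a", "b", "c"]))
```

Deduplicating queries before the gather would also serve the "less wasteful LLM calls" item.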
-
### 🚀 The feature, motivation and pitch
There's a new DP sharding strategy that is more flexible and general; see https://arxiv.org/abs/2311.00257, AMSP: Reducing Communication Overhead o…
-
1. Pin `onnx == 1.16.1` in requirements.txt
2. The AUTO plugin should be disabled
3. Refactor all configurations to chatbot.config
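A minimal sketch of item 3, assuming a hypothetical `chatbot.config` module that centralizes settings currently scattered across the codebase (all field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChatbotConfig:
    # Illustrative fields only; the real settings would be collected
    # here from wherever they live today.
    onnx_version: str = "1.16.1"  # item 1: the pinned requirement
    auto_plugin: bool = False     # item 2: AUTO plugin disabled
    model_path: str = "models/chatbot.onnx"

# Single shared instance the rest of the code imports from chatbot.config.
CONFIG = ChatbotConfig()
```

A frozen dataclass keeps the configuration immutable at runtime, so every module reads the same values.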
-
-
Efficient prompt design to minimize hallucinations and fabricated data from LLMs:
1) How can we verify that the data is not fabricated?
2) Assess the performance of the prompt
https://help.openai.com/…
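One common pattern for reducing fabricated answers is to constrain the model to supplied context and give it an explicit "don't know" escape hatch. A minimal sketch, with a hypothetical prompt template:

```python
# Hypothetical template: restrict answers to the given context and provide
# an explicit refusal token, which discourages invented facts.
PROMPT_TEMPLATE = """Answer using ONLY the context below.
If the context does not contain the answer, reply exactly: UNKNOWN.

Context:
{context}

Question: {question}
Answer:"""

prompt = PROMPT_TEMPLATE.format(
    context="The report was published in 2023.",
    question="Who wrote the report?",
)
```

The `UNKNOWN` sentinel also makes point 2 easier: on a labeled test set you can count how often the model refuses versus answers incorrectly.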
-
- [ ] [LoRA Land: Fine-Tuned Open-Source LLMs that Outperform GPT-4 - Predibase - Predibase](https://predibase.com/blog/lora-land-fine-tuned-open-source-llms-that-outperform-gpt-4)
# LoRA Land: Fine…
-
# Title of the Talk: No Code SLM Finetuning with MonsterAPI
## Abstract of the Talk:
Dive into the world of no-code large language model (LLM) finetuning in this informative talk presented by Mons…
-
### 🚀 The feature, motivation and pitch
Hi all, I was wondering if it's possible to do precise model device placement. For example, I would like to place the vLLM model on GPU 1 and let GPU 0 do othe…
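A common workaround today (not a vLLM placement API) is to hide the other GPU from the process via `CUDA_VISIBLE_DEVICES`, so vLLM's only visible device is physical GPU 1 while other processes keep using GPU 0:

```python
import os

# Must be set before CUDA is initialized (i.e. before importing vLLM/torch):
# the process then sees physical GPU 1 as its device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# from vllm import LLM            # import only after setting the variable
# llm = LLM(model="facebook/opt-125m")
```

The same effect is available from the shell by exporting the variable before launching the server process.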
-
**What problem or use case are you trying to solve?**
For example: 'Hey AI, what is the least mature module in this codebase?', or 'Where is the documentation lacking? Fill in some gaps.'
SWE-Agen…