-
### Feature request
Hi, I'm the author of [zhuzilin/ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention).
I wonder if you are interested in integrating context parallel with [zh…
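For readers unfamiliar with the idea, the core of ring-style context parallelism is that each rank keeps its own query block while key/value blocks rotate around the ring, and attention is accumulated blockwise with an online (running-max) softmax. Below is a minimal single-process NumPy sketch of that accumulation; the loop over `k_blocks`/`v_blocks` stands in for the ring rotation, and the function name is illustrative, not the library's actual API.

```python
import numpy as np

def ring_attention(q, k_blocks, v_blocks):
    """Blockwise attention with online softmax, as used in ring/flash attention.

    q: (n, d) queries held by one rank.
    k_blocks, v_blocks: the KV shards that would arrive one hop at a time.
    """
    n, d = q.shape
    acc = np.zeros((n, d))          # unnormalized output accumulator
    denom = np.zeros(n)             # running softmax normalizer
    m = np.full(n, -np.inf)         # running row-wise max for stability
    for k, v in zip(k_blocks, v_blocks):   # one iteration per ring hop
        s = q @ k.T / np.sqrt(d)           # scores against this KV block
        m_new = np.maximum(m, s.max(axis=1))
        scale = np.exp(m - m_new)          # rescale previous partial sums
        p = np.exp(s - m_new[:, None])
        acc = acc * scale[:, None] + p @ v
        denom = denom * scale + p.sum(axis=1)
        m = m_new
    return acc / denom[:, None]
```

Because the accumulation is exact (not an approximation), the result matches full attention over the concatenated sequence, which is what makes the ring formulation attractive for long contexts.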
-
TensorRT-LLM has great potential for allowing people to run larger models efficiently with limited hardware resources. Unfortunately, the current quantization workflow requires significant computation…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.

###…
-
- [ ] async
- [x] less wasteful LLM calls
I'm cooking on the Database stuff right now, and it's clear that there are a few things we can do to make the daily run much more efficient.
The searches…
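As a rough illustration of the "async + less wasteful LLM calls" direction, here is a minimal asyncio sketch that bounds concurrency with a semaphore and dedupes identical prompts by caching the in-flight task, so repeated searches don't trigger repeated calls. `fake_llm`, `Runner`, and the concurrency limit are illustrative stand-ins, not the project's actual code.

```python
import asyncio

calls = {"n": 0}  # count real LLM invocations (for demonstration)

async def fake_llm(prompt: str) -> str:
    # Stand-in for the real awaitable LLM call.
    calls["n"] += 1
    await asyncio.sleep(0.01)
    return f"answer:{prompt}"

class Runner:
    def __init__(self, concurrency: int = 8):
        self.sem = asyncio.Semaphore(concurrency)  # cap parallel calls
        self.cache = {}  # prompt -> task; dedupes in-flight and finished calls

    async def ask(self, prompt: str) -> str:
        if prompt in self.cache:
            return await self.cache[prompt]  # reuse existing result/task
        task = asyncio.ensure_future(self._call(prompt))
        self.cache[prompt] = task
        return await task

    async def _call(self, prompt: str) -> str:
        async with self.sem:
            return await fake_llm(prompt)

async def main():
    r = Runner()
    # "a" appears twice but only one real call is made for it.
    return await asyncio.gather(*(r.ask(p) for p in ["a", "b", "a"]))
```

Caching the task (rather than the finished result) is what collapses concurrent duplicate requests into a single call, which is usually where the biggest savings are in a daily batch run.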
-
If a GPU is available on the user's machine, using it instead of the CPU to process the GIF files would be a much more efficient and effective solution in terms of processing time.
…
-
Thank you so much for this project and your efforts to make GraphRAG accessible for the masses!
**Is your feature request related to a problem? Please describe.**
Systems with an appropriate GPU…
-
Certainly! Let's dive into a comprehensive brainstorm on how your code and project can evolve to achieve your goals. We'll explore various ideas, metrics, and improvements that could help you optimize…
-
### 🚀 The feature, motivation and pitch
There's a new DP sharding strategy that is more flexible and general; see more details at https://arxiv.org/abs/2311.00257 AMSP: Reducing Communication Overhead o…
-
### Feature request
Extract the spiking nature of the LLM and port that set of features over for training/inference.
https://github.com/ridgerchu/SpikeGPT
### Motivation
the benefits would r…