-
I want to run inference on a quantized LLAMA model (W8A16) on ARMv9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized.
Based on my understanding, I need to prepack the weights to redu…
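For orientation, here is a minimal PyTorch sketch of per-group W8 quantization (symmetric, with an assumed group size of 128; this is not oneDNN code): whatever prepacked layout is chosen has to carry one scale per group alongside the int8 weights.

```python
# Minimal sketch of per-group symmetric int8 (W8) weight quantization.
# group_size=128 is an assumed hyperparameter, not from the question.
import torch

def quantize_per_group(w: torch.Tensor, group_size: int = 128):
    """One scale per (output channel, group of input channels)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scales = (groups.abs().amax(dim=-1, keepdim=True) / 127.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(groups / scales), -128, 127).to(torch.int8)
    return q.reshape(out_features, in_features), scales.squeeze(-1)

def dequantize_per_group(q: torch.Tensor, scales: torch.Tensor, group_size: int = 128):
    out_features, in_features = q.shape
    groups = q.reshape(out_features, in_features // group_size, group_size).float()
    return (groups * scales.unsqueeze(-1)).reshape(out_features, in_features)

w = torch.randn(4096, 4096)
q, s = quantize_per_group(w)
print((w - dequantize_per_group(q, s)).abs().max())  # small quantization error
```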
-
A while ago I was assigned to a new research paper on Stable Diffusion and LLMs, so I haven't had much time to work on new features, but the model has now reached the training stage, so I come …
-
### 🚀 The feature, motivation and pitch
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. We would like to use `torch.compi…
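For readers unfamiliar with it, a toy sketch of basic `torch.compile` usage follows (this is not vLLM's actual integration; `TinyMLP` is made up for illustration):

```python
# Toy illustration of torch.compile: compile a small module and run it.
import torch

class TinyMLP(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.fc1 = torch.nn.Linear(dim, dim)
        self.fc2 = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP()
compiled = torch.compile(model)      # default Inductor backend
out = compiled(torch.randn(8, 64))   # first call triggers compilation
print(out.shape)                     # torch.Size([8, 64])
```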
-
- Functorch = memory blowup due to `vmap`
- Asdl/asdfghjkl = can't backprop through the Jacobians => can't be used for continuous BO
- BackPACK = requires writing an inflexible extension
We need a Jacobian …
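For context, here is a minimal sketch of one possible workaround under these constraints (a toy function, not the project's code): `torch.autograd.functional.jacobian` with `create_graph=True` yields a Jacobian that gradients can flow through, and its default non-vectorized mode loops over outputs instead of `vmap`-ing, trading speed for memory.

```python
# Differentiable Jacobian via torch.autograd.functional.jacobian.
# Toy R^5 -> R^5 map; create_graph=True keeps J in the autograd graph.
import torch
from torch.autograd.functional import jacobian

A = torch.randn(5, 5)

def f(x):
    return torch.tanh(A @ x)

x = torch.randn(5, requires_grad=True)
J = jacobian(f, x, create_graph=True)  # 5x5 Jacobian, differentiable
loss = J.pow(2).sum()                  # scalar function of the Jacobian
loss.backward()                        # backprop *through* the Jacobian
print(x.grad.shape)                    # torch.Size([5])
```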
-
**What problem or use case are you trying to solve?**
We are trying to reduce the costs associated with using Large Language Models (LLMs) in the OpenDevin project. This involves optimizing the usa…
-
During training, I found that if I set the batch size > 1, the loss sometimes becomes NaN:
```
tensor(nan, device='cuda:0', grad_fn=)
```
and some logits are also NaN:
```
(Pdb) p llm_out.logits…
```
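When NaNs appear only at batch size > 1, padding and attention-mask handling are a common first suspect. Below is a generic, hedged debugging sketch (it assumes an HF-style model whose output has `.loss` and `.logits`; the names are illustrative, not this repo's code):

```python
# Generic NaN-debugging sketch; `model`, `batch`, `optimizer` are placeholders.
import torch

torch.autograd.set_detect_anomaly(True)  # slow, but names the op that produced the NaN

def training_step(model, batch, optimizer):
    out = model(**batch)
    if torch.isnan(out.logits).any():
        # logits: (batch, seq_len, vocab) -> find which batch rows went bad
        bad_rows = torch.isnan(out.logits).flatten(1).any(dim=1)
        raise RuntimeError(f"NaN logits in rows {bad_rows.nonzero().flatten().tolist()}")
    optimizer.zero_grad()
    out.loss.backward()
    # clip in case the instability is just exploding gradients at batch > 1
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return out.loss.detach()
```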
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
## Better integration of LLM kernel and OS kernel
- [ ] Translate the current implementation into a more efficient one (while remaining cross-platform)
- [ ] Multi-thread/Multi-proce…
-
# URL
- https://arxiv.org/abs/2402.13598
# Affiliations
- Lin Ning, N/A
- Luyang Liu, N/A
- Jiaxing Wu, N/A
- Neo Wu, N/A
- Devora Berlowitz, N/A
- Sushant Prakash, N/A
- Bradley Green, …
-
This document outlines the long-term features in the AIOS roadmap for Q3 2024. Feel free to discuss any of the following topics, and add any other topics you'd like to talk about in this issue.
## …