-
[DeepSpeed](https://github.com/microsoft/DeepSpeed) is an excellent framework for training LLMs on a large scale, while the mpi-operator is the ideal tool to facilitate this within the Kubernetes ecos…
-
**Motivation:**
Currently, when using the Transformers library in combination with DeepSpeed to train large language models (LLMs), checkpoints (e.g. `bf16_zero_pp_rank_0_mp_rank_00_optim_stat…
-
With the proliferation of models and model variants, it becomes more important to track assessment dates and model versions.
So far we've been able to treat model families as one, because it rarely …
-
### Ticket Contents
## Description
Many people have musical ideas but struggle to articulate them. Generative AI shows promise in helping people find a way to transcribe their musical ideas. The goal …
-
Hello,
I've been trying to run qwen2 0.5B and tinyclip using the repository, but I'm running into CUDA OOM issues on the dense2dense distillation step. I'm running on 4 80GB A100s, and I was wondering if I …
-
The inspiration for this is:
https://github.com/irthomasthomas/undecidability/issues/934
https://github.com/AnswerDotAI/rerankers
-
I am currently fine-tuning an LLM (LLaMA) and would like to retrieve the gradients of each weight (parameter) after every gradient update. However, I notice that weights are (auto-)wrapped into stuff l…
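For an unwrapped model, per-parameter gradients can be snapshotted between `backward()` and `optimizer.step()`. This is a minimal sketch of that pattern, not the original poster's setup: the tiny model, data, and `grad_log` name are stand-ins, and note that under sharding wrappers such as FSDP the parameters are flattened, so the same loop would see the wrapper's flat parameters rather than the original named weights.

```python
# Hedged sketch: logging per-parameter gradients after every update.
# The model and data below are placeholders for the actual LLaMA
# fine-tuning setup described in the issue.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(4, 8)
y = torch.randn(4, 1)

grad_log = []  # one {param_name: grad tensor} dict per update
for step in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Snapshot gradients before the optimizer consumes them.
    grads = {name: p.grad.detach().clone()
             for name, p in model.named_parameters()
             if p.grad is not None}
    grad_log.append(grads)
    optimizer.step()

print(len(grad_log), sorted(grad_log[0]))
```

With a wrapped model, the gradients would have to be read through the wrapper's own API instead of `named_parameters()` directly.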
-
I deployed Qwen2.5-14B-Instruct on my local server and started the LLM correctly using vLLM.
But when I executed the sample code,
```
from paperqa import Settings, ask
local_llm_config = dict(
…
-
Hello,
A tensor assertion error is raised when you try to train the model. It starts with the following:
```bash
0%| | 0/10 [00:00
-
# URL
- https://arxiv.org/abs/2306.09782
# Affiliations
- Kai Lv, N/A
- Yuqing Yang, N/A
- Tengxiao Liu, N/A
- Qinghui Gao, N/A
- Qipeng Guo, N/A
- Xipeng Qiu, N/A
# Abstract
- Large Lan…