-
**Describe the bug**
A misleading error is shown when deploying models in small MIG partitions.
**To Reproduce**
- Deploy TGIS in OpenShift AI.
- Enable MIG (1g.5gb partitions).
- Deploy grani…
-
It is currently impossible to use `flash_attention` inside a function that uses gradient checkpointing.
Minimal example to reproduce:
```py
b = 3
lq = 16
lkv = 17
h = 5
d = 19
q = jax.random.…
-
I'm trying to train Mamba2 130M from scratch.
```
config = Mamba2Config(
vocab_size=len(tokenizer.vocab),
n_positions=10,
n_embd=768,
…
-
### System details
RStudio Edition : Server
RStudio Version : 1.4.1106
OS Version : unknown
R Version : unknown
### Steps to reproduce the problem
From customer fo…
-
Thank you for publishing the code. However, it seems to be incomplete. For example, there is no code/guidelines regarding encoding queries & documents, no hyperparameters (such as chunk size) are prov…
-
```
[rank0]: Traceback (most recent call last):
[rank0]: File "Pai-Megatron-Patch-0925/toolkits/model_checkpoints_convertor/qwen/hf2mcore_qwen2_dense_and_moe_gqa.py", line 924, in
[rank0]: m…
-
Hi, I am using VQGAN on the MSCOCO training dataset (I also tried adding Visual Genome to construct a 1-million dataset), but got bad results. The pixels look weird.
Here are my settings, …
-
When training a 13B LLaMA model on two A800 80G machines, throughput only reaches 840 tokens/sec/GPU, and I don't know why. The detailed configuration is as follows:
--tensor-model-parallel-size 4 \
--pipeline-model-parallel-size 1 \
--sequence-parallel \
--distributed-timeout…
-
Hello, I'm trying ipex-llm 2.2.0b20240927 with PyTorch IPEX 2.3.110+xpu, and it failed with the following error:
```
ERROR azarrot.backends.common:common.py:323 An error occurred when generating te…
-
I saw your post on Twitter about your new method for attention approximation, and I think it is a cool idea! Could you clarify a few things?
**Approximation Method:** Is your method genuinely appr…