-
### 🐛 Describe the bug
# the bug description
When I use a tensor of shape (batch_size, seq_len, embedding_size) and put batch_size and seq_len in dynamic_axes to generate the ONNX model, I c…
-
Hello Facebook Research Team,
I am exploring the DiT as implemented in your repository and came across the weight initialization strategy for the FinalLayer, particularly observed in [this section …
-
## 🐞Describing the bug
This is a follow up on https://github.com/apple/coremltools/issues/2275. Sorry I couldn't find the reopen option in the original issue.
To clarify, the issue didn't happen wit…
-
### System Info
- `transformers` version: 4.32.0.dev0
- Platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.35
- Python version: 3.11.4
- Huggingface_hub version: 0.16.4
- Safetensors version: 0…
-
File "......./attention-OCR-master/src/model/seq2seq.py", line 75, in
linear = rnn_cell._linear # pylint: disable=protected-access
-
I found that with the original training workflow, the loss is not declining. I am not sure whether this is because I am using a subset of the training set.
```
# File modified by authors of InstructDiffusion from …
-
Hey everyone, I'm trying to understand the IP adapter better. Maybe someone can help me:)
Paper:
https://arxiv.org/pdf/2308.06721.pdf
Would it be right to say:
1) An IP adapter model (e.g. i…
-
**Describe the bug**
I was trying to run SFT on the Mixtral-8x7B-instruct model with tensor parallel size=4 (sequence parallel=True) and LoRA (target modules=[all]).
It reports that the output …
-
NaN losses when training:
![image](https://github.com/user-attachments/assets/78126797-27e6-433c-91bb-cf8260302e6c)
Please take a look at this code:
```
!pip install jax[tpu]==0.4.28 -f https:…
-
Fine-tuning command: torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py --use-delta --model-config config/cpm-bee-10b.json --dataset ../tutorial…