-
(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-05 23:56:10,524 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-05 23:56:17.908409: I tensorflow/core/platform/…
-
I found that the hook function is not called when computing the FLOPs of a MultiheadAttention module with requires_grad=False, so the reported FLOPs are 0. With requires_grad=True there is no error.
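If the hook never fires, the 0 reported for the module reflects a missing measurement, not a real cost. As a rough cross-check, the cost of a self-attention forward pass can be estimated analytically. The sketch below is a back-of-the-envelope formula, not how any particular FLOPs counter works. It assumes self-attention (query, key and value share one sequence length and embedding size) and counts only the matrix multiplications, in multiply-accumulates (MACs). The function name `mha_forward_macs` is hypothetical.

```python
def mha_forward_macs(batch_size: int, seq_len: int, embed_dim: int) -> int:
    """Approximate multiply-accumulates for one self-attention forward pass.

    Assumption: only matmuls are counted; softmax, bias adds, scaling and
    dropout are ignored. The head count cancels out: h heads of size
    embed_dim // h together do the same matmul work as one head of size embed_dim.
    """
    L, E = seq_len, embed_dim
    qkv_proj = 3 * L * E * E   # Q, K and V input projections
    scores = L * L * E         # Q @ K^T, summed over all heads
    weighted = L * L * E       # attention weights @ V, summed over all heads
    out_proj = L * E * E       # final output projection
    return batch_size * (qkv_proj + scores + weighted + out_proj)


print(mha_forward_macs(batch_size=1, seq_len=2, embed_dim=4))
```

Conventions differ: many tools report FLOPs as roughly 2 × MACs, so compare against whatever unit the profiler uses.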
-
I am running the 124M model on a V100 GPU, and it takes about 6 seconds to execute gpt2.generate(..., length=50, ...) to return a single prediction. If I set nsamples=100, batch_size=100, it returns a…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
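I'm new to DPO training and would like to ask for some advice. I tried DPO training on llama-3-8b-instruct with Chinese and English DPO datasets I found on HF. After 4 epochs the loss had dropped to around 0.1, but when I tested the model, it had not improved at all. Instead it developed all sorts of problems: even when asked questions taken from the DPO training set, it repeats itself and gives nonsense answers.
Below is my training code. I'm not sure whether there is a bug somewhere.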
import torch
from transfor…
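To put a loss of about 0.1 in context: the standard per-pair DPO objective is loss = -log sigmoid(z), with z = beta * [(log pi(y_chosen) - log pi_ref(y_chosen)) - (log pi(y_rejected) - log pi_ref(y_rejected))]. The log-probabilities are summed over each response's tokens. The minimal sketch below computes the loss and inverts it to find the margin z that a given loss implies. beta = 0.1 is an assumed value, since the post does not state which beta was used. The function names are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from summed sequence log-probs.

    beta=0.1 is an assumed default, not taken from the original post.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch so exp() cannot overflow
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def margin_for_loss(loss: float) -> float:
    """Invert the per-pair loss: the scaled margin z that yields `loss`."""
    # loss = log(1 + exp(-z))  =>  z = -log(exp(loss) - 1)
    return -math.log(math.expm1(loss))


z = margin_for_loss(0.1)
print(f"z = {z:.3f}, implied log-ratio gap with beta=0.1: {z / 0.1:.1f} nats")
```

Under these assumptions, and treating the mean loss as if every pair had the same margin, a loss of 0.1 corresponds to z of about 2.25. That means the policy has widened its chosen-vs-rejected log-ratio gap by roughly 22.5 nats relative to the reference model. A gap that large on the training pairs is consistent with over-optimization after several epochs, though on its own it does not prove that this caused the repetitive answers.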
-
### This issue is created to track progress on refining `nn.MultiheadAttention` and `nn.Transformer`.
Since the release of both modules in PyTorch v1.2.0, we have received a lot of feedback from…
-
I use `train_with_template.py` with `mistralai/Mistral-7B-Instruct-v0.2`
```
torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train_with_template.py \
--model_name_or_path mistr…
```
-
We have been experimenting with different setups for the task of news source verification. Our first approach trained on sentence pairs from same and different source domains with cosine loss. For ver…
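The post does not say which cosine loss variant was used. One common formulation for sentence pairs regresses the cosine similarity of the two embeddings toward a label with mean squared error, which is what sentence-transformers' `CosineSimilarityLoss` does. The sketch below assumes label 1.0 for same-source pairs and 0.0 for different-source pairs. The toy 2-D vectors stand in for real encoder outputs, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (assumes neither is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_mse_loss(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                    labels: List[float]) -> float:
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Toy example: one same-source pair (label 1.0), one different-source pair (label 0.0)
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.0]
print(cosine_mse_loss(pairs, labels))
```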
-
### Issue confirmation: Search before asking
- [X] I have searched the [existing issues](https://github.com/PaddlePaddle/PaddleSeg/issues) (both open and closed) and found no similar bug. …
-
Hello! Thank you for your outstanding work!
However, we encountered an issue when attempting to apply the pruning method you proposed to DINO: IndexError: index 384 is out of bounds for dimension 0 …
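For reference, this error means an index used to select along dimension 0 is greater than or equal to that dimension's size. A dimension of size N only accepts indices 0 to N - 1, so index 384 is out of range for any size of 384 or less (the actual size is cut off in the message). One way to localize such a bug in a pruning pipeline is to check the kept indices against the target size before indexing. The helper `select_rows` below is hypothetical, and it uses plain Python lists in place of tensors.

```python
from typing import List, Sequence


def select_rows(rows: Sequence[List[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the rows listed in `keep`, failing early with a descriptive message.

    Negative indices are rejected as well, on the assumption that pruning
    index lists are non-negative positions.
    """
    size = len(rows)
    bad = [i for i in keep if not 0 <= i < size]
    if bad:
        raise IndexError(
            f"keep indices {bad} are out of bounds for dimension 0 with size {size}"
        )
    return [rows[i] for i in keep]


weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 3 x 2 "weight matrix"
print(select_rows(weights, [0, 2]))
```

One possible cause is indices computed for one layer's width being applied to a layer with a different width. The full traceback would show which tensor is involved.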