-
Thank you for contributing such excellent work.
I notice that bloomz-* models outperform bloom-* via instruction tuning. I want to build a new bloomz-* model on top of a bloom model (e.g. bloom-1b7 -> bloomz-1b7-…
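For anyone looking for a starting point, below is a minimal sketch of supervised instruction tuning with `transformers`. The dataset id `bigscience/xP3`, its `en` config, the `inputs`/`targets` column names, and all hyperparameters are assumptions for illustration, not a verified bloomz recipe:
```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bigscience/bloom-1b7"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Take a small slice of a prompted instruction dataset for illustration
# (xP3 is what the bloomz models were tuned on; config/column names assumed).
dataset = load_dataset("bigscience/xP3", "en", split="train[:1%]")

def tokenize(example):
    # Concatenate prompt and target so the causal-LM loss covers the answer.
    text = example["inputs"] + example["targets"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# For causal-LM fine-tuning, labels are just the input ids (shifted internally).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bloomz-1b7-custom",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```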
-
### System Info
- `transformers` version: 4.27.1
- Platform: Linux-4.18.0-240.el8.x86_64-x86_64-with-glibc2.2.5
- Python version: 3.8.12
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU…
-
### 🐛 Describe the bug
Using the default rm_static dataset:
train_data: 75000
pretrain_model: bloomz-1b1
batch_size: 8
max_epochs: 4
max_len: 256
machine: 2× V100 32GB
loss_fn: log_sig (see the loss sketch below)
after…
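For context, `log_sig` is typically the pairwise ranking loss used for reward-model training. A minimal sketch of that common formulation (not necessarily the exact implementation in this repo):
```python
import torch
import torch.nn.functional as F

def log_sig_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example: scalar rewards produced by the reward model for a batch of pairs.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.5, 0.7, -0.1])
print(log_sig_loss(chosen, rejected))
```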
-
### System Info
Hi,
We observe inconsistent results when running the 'bloom' model under different dtypes (float16, float32).
Is that a bug? A minimal repro sketch follows the environment info below.
Environment:
- `transformers` version: 4.29.1
- Platform: …
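A hedged sketch of how such a comparison can be reproduced (the checkpoint, prompt, and generation settings are placeholders, not the reporter's exact setup):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # placeholder; any bloom checkpoint
device = "cuda"  # fp16 matmuls generally need a GPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)

for dtype in (torch.float32, torch.float16):
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    # Small numerical drift between fp16 and fp32 is expected; large divergence
    # in the generated text is what this report is about.
    print(dtype, tokenizer.decode(out[0], skip_special_tokens=True))
```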
-
int4 error: RuntimeError: self and mat2 must have the same dtype
Training parameters:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
--model_name_or_path /models/bloomz-7b1-mt \
--do_train \
--dataset…
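This dtype mismatch often comes from mixing half-precision activations with full-precision weights somewhere in the quantized path. A hedged sketch of loading the model in 4-bit with a consistent compute dtype via `BitsAndBytesConfig` (whether `train_sft.py` exposes these exact options is an assumption):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Keep the 4-bit compute dtype consistent with the rest of the model;
# mixing float16 compute with float32 adapter weights is a common source of
# "self and mat2 must have the same dtype".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "/models/bloomz-7b1-mt",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("/models/bloomz-7b1-mt")
```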
-
A large part of building the assistant is teaching it to follow instructions. While training with RLHF seems like the main ingredient, there are already prepared supervised instruction-following datase…
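For example, pulling one of those prepared datasets takes a couple of lines with `datasets` (the dataset id below is just one illustration):
```python
from datasets import load_dataset

# databricks-dolly-15k is one example of a prepared instruction-following
# dataset; any similar dataset on the Hub can be swapped in.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
print(dataset[0]["instruction"], dataset[0]["response"])
```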
-
**Is your feature request related to a problem? Please describe.**
I often want to see the model size (the number of parameters). Often this is documented on the model card (or even in the model name), …
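In the meantime, a quick way to get this number locally (the checkpoint name is just an example):
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
# Count every trainable and non-trainable parameter tensor in the model.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```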
-
Hi, I tried to serve GPT-J with a Hugging Face repo id, and it works as follows:
```
-
Training parameters:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path ./Bloom/ \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_pt_checkpoint…