-
Perform abstractive summarisation using baseline models (encoder-decoder and decoder-only) from Hugging Face, then note the learnings and behaviour.
Different techniques (a zero-shot sketch follows the list):
1. Zero-shot learning…
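As a starting point, a minimal zero-shot sketch with Hugging Face pipelines; the model names and sample input are illustrative, not from the notes:

```python
# Zero-shot summarisation with two baseline styles; model names and the
# input text are illustrative placeholders.
from transformers import pipeline

ARTICLE = ("The committee met on Tuesday to review the quarterly results, "
           "which showed steady growth across all regions despite supply "
           "chain delays and rising costs.")

# Encoder-decoder baseline: BART fine-tuned for summarisation.
enc_dec = pipeline("summarization", model="facebook/bart-large-cnn")
print(enc_dec(ARTICLE, max_length=40, min_length=5)[0]["summary_text"])

# Decoder-only baseline: prompt a causal LM and read off its continuation.
dec_only = pipeline("text-generation", model="gpt2")
out = dec_only(ARTICLE + "\nTL;DR:", max_new_tokens=40,
               return_full_text=False)
print(out[0]["generated_text"])
```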
-
@OH-ThatGuy @gylertaydos Part of your milestone for Monday:
Let's look up some resources:
* passages of dragon (with or without English translation)
* resources on language models that train for la…
-
Hello authors,
Your experiment results on harmfulness classification (`https://github.com/andyzoujm/representation-engineering/blob/main/examples/harmless_harmful/harmless_llama2.ipynb`) show that Lla…
-
I ran this script
`deepspeed --num_gpus 1 bloom-inference-scripts/bloom-ds-inference.py --name bigscience/bloomz-7b1 --batch_size 8`
and it gets stuck, just like in the picture.
Log:
```
(ba…
```
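For context, a simplified sketch of the kernel-injection init that `bloom-ds-inference.py` performs (not the script's exact loading path; if the process stalls at startup, it is worth bisecting between model loading and this call):

```python
# Simplified sketch of the core of bloom-ds-inference.py; the real script's
# checkpoint loading differs, so treat this only as an orientation aid.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1", torch_dtype=torch.float16
)
# Kernel injection replaces the transformer blocks with fused CUDA kernels.
model = deepspeed.init_inference(
    model, mp_size=1, dtype=torch.float16, replace_with_kernel_inject=True
)
```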
-
What is the `fsdp_transformer_layer_cls_to_wrap` for BLOOM?
When I tried to fine-tune bloomz-7b1, training got stuck at 0%. As you said in the readme, it's most likely because I don't set the r…
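For reference, the decoder layer class in the Hugging Face implementation of BLOOM is `BloomBlock` (see `modeling_bloom.py`), so a plausible setting looks like this sketch (the other arguments are placeholders):

```python
# Hedged sketch: point FSDP's auto-wrap at BLOOM's decoder layer class.
# Only fsdp_transformer_layer_cls_to_wrap is the point here; the rest
# are placeholder values.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="BloomBlock",
)
```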
-
### 🐛 Describe the bug
GPU memory is unstable during training and keeps growing slowly.
Training bloomz-1b1 on a Tesla V100 eventually exceeds the 32 GB of GPU memory: it runs at first, but fails after running for a while.
### Environment
_No response_
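One way to confirm the growth is to log allocator statistics every step; a minimal sketch (the surrounding training loop is hypothetical):

```python
# Hedged diagnostic sketch: print CUDA memory per step to confirm the
# slow growth; call it after optimizer.step() in the (hypothetical) loop.
import torch

def log_cuda_memory(step: int) -> None:
    alloc = torch.cuda.memory_allocated() / 2**30     # live tensors, GiB
    reserved = torch.cuda.memory_reserved() / 2**30   # allocator cache, GiB
    print(f"step {step}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")
```

If `allocated` climbs steadily, something (for example a loss tensor kept with its autograd graph) is likely being retained across steps.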
-
Just a curious question, I suppose!
GPTQ 4-bit - https://github.com/qwopqwop200/GPTQ-for-LLaMa
Suppose someone eventually fine-tunes the 175B OPT model, with LoRAs or regular fine-tuning. Or perhaps the BLOO…
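For the LoRA case, the adapters would first have to be merged back into a dense checkpoint, which GPTQ could then quantise as an ordinary post-training step; a hedged sketch (the model id and paths are hypothetical placeholders):

```python
# Hedged sketch: fold LoRA adapters into the base weights so a standard
# post-training quantizer such as GPTQ sees one dense checkpoint.
# Model id and adapter path are hypothetical placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-175b")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()           # apply the low-rank deltas
merged.save_pretrained("opt-175b-merged")   # run GPTQ on this checkpoint
```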
-
I tried to start a large version of the model using Docker:
`docker run -p 10249:80 -e RUST_BACKTRACE=full -e FLASH_ATTENTION=1 -e CUDA_VISIBLE_DEVICES=4,7 --privileged --security-opt="seccomp=unconf…
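If the container does come up, a quick sanity check is to hit the server's `/generate` endpoint through the mapped port; a minimal sketch (the prompt and parameters are placeholders):

```python
# Hedged sketch: query the text-generation-inference server once it is up.
# Port 10249 matches the -p 10249:80 mapping in the command above.
import requests

resp = requests.post(
    "http://localhost:10249/generate",
    json={"inputs": "What is deep learning?",
          "parameters": {"max_new_tokens": 50}},
)
print(resp.json())
```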
-
Can MeZO be used on NLG tasks? I integrated the `_inner_training_loop` part of the code and the methods it relies on into the NLG training code, and performed fine-tuning on BLOOM (bloomz-…
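In principle the estimator is task-agnostic: MeZO only needs a scalar loss from two forward passes, so nothing in it is classification-specific. A minimal sketch of the core step, following the paper's seed-resampling trick (the `model` and `loss_fn` names are hypothetical placeholders):

```python
# Hedged sketch of MeZO's two-forward-pass update; model/loss_fn are
# hypothetical. Regenerating z from a stored seed avoids keeping it in memory.
import torch

def mezo_step(model, loss_fn, lr=1e-6, eps=1e-3):
    seed = torch.randint(0, 2**31, (1,)).item()

    def perturb(scale):
        # Replay the same z per parameter from the shared seed.
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen).to(p.device)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1)
        loss_plus = loss_fn(model)    # f(theta + eps * z)
        perturb(-2)
        loss_minus = loss_fn(model)   # f(theta - eps * z)
        perturb(+1)                   # restore the original theta
        grad = (loss_plus - loss_minus) / (2 * eps)  # projected gradient
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen).to(p.device)
            p.data.add_(-lr * grad * z)  # SGD step along z
```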
-
Hello, your work is great :+1:
I wrapped your binary in my bot/API project https://github.com/laurentperez/ava#what-models-or-apis-does-it-support-
I'm mostly interested in code (Python) gen…