-
The TinyLlama project aims to pretrain a 1.1B-parameter Llama model on 3T tokens, which should make it an ideal draft model for speculative inference.
https://github.com/jzhang38/TinyLlama
https://huggingfac…
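For context, transformers exposes this pattern as assisted generation. A minimal sketch, assuming a recent transformers release that supports the `assistant_model` argument; both checkpoint names are placeholders, and the draft must share the target's tokenizer, which holds within the Llama family:
``` python
# Minimal sketch of speculative ("assisted") decoding in transformers.
# Checkpoint names are placeholders, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-7b-hf"            # placeholder target
draft_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # placeholder draft

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(target_name, torch_dtype=torch.float16)
draft = AutoModelForCausalLM.from_pretrained(draft_name, torch_dtype=torch.float16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# The draft proposes several tokens per step; the target verifies them in a
# single forward pass, preserving the target's output distribution.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```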
-
### 🐛 Describe the bug
``` python
%env PYTORCH_ENABLE_MPS_FALLBACK=1
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
assert torch.backends.mps.is_available()
```
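The repro is cut off above; a hypothetical continuation under the same imports might look like the following, with a deliberately small placeholder checkpoint:
``` python
# Hypothetical continuation of the truncated repro; "gpt2" is a placeholder.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("mps")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("mps")
# With PYTORCH_ENABLE_MPS_FALLBACK=1, ops unsupported on MPS fall back to CPU.
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```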
-
## Background
Replicate and visualize the results of https://arxiv.org/abs/2104.07143.
## What to Replicate?
## Modifications
## Related Papers/Frameworks
-
### Describe the bug
After I type the first prompt, both my own prompt and the assistant's response (the screen shows *Typing...*) vanish.
Traceback:
```
Loading checkpoint shards: 100%|████████████…
```
-
Environment
```
gpu: 4*A100 80G
pytorch: 1.13.1
cuda version: 11.7
deepspeed: 0.9.0
transformers: 4.28.0.dev
```
Run script
```bash
OUTPUT=$1
ZERO_STAGE=3
if [ "$OUTPUT" == "" ]; then
OUTPUT=./…
```
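For reference, a minimal sketch of what a ZeRO stage-3 setup like this boils down to in code; the config values here are illustrative assumptions, not the truncated script's actual settings:
``` python
# Illustrative ZeRO stage-3 initialization; all values are assumptions.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {"stage": 3},  # partitions params, grads, optimizer state
}

model = torch.nn.Linear(512, 512)  # stand-in for the real model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```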
-
Add option to load multiple datasets
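One way such an option could work, sketched against the Hugging Face `datasets` API; the dataset names are placeholder examples, not the project's actual implementation:
``` python
# Sketch: loading several datasets and concatenating them into one corpus.
# The (name, config) pairs are placeholders; concatenate_datasets requires
# that all parts share the same features (here, a single "text" column).
from datasets import load_dataset, concatenate_datasets

specs = [("wikitext", "wikitext-2-raw-v1"), ("wikitext", "wikitext-103-raw-v1")]
parts = [load_dataset(name, config, split="train") for name, config in specs]
combined = concatenate_datasets(parts)
print(combined.num_rows)
```
`interleave_datasets` would be the alternative if sampling proportions matter more than plain concatenation.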
-
Our Github source code dataset is based on [the deduplicated stack](https://huggingface.co/datasets/bigcode/the-stack-dedup) filtered down to only include numerical computing, computer algebra, and fo…
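For anyone reproducing that filtering step, per-language slices of the deduplicated Stack can be loaded like this; the language directory is just an example, and the dataset is gated, so its terms must be accepted on the Hub first:
``` python
# Sketch: pulling one language slice of the deduplicated Stack.
# "data/fortran" is an example directory; the dataset is gated, so accept
# its terms on the Hub and log in (huggingface-cli login) beforehand.
from datasets import load_dataset

stack = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/fortran",
    split="train",
)
print(stack.num_rows)
```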
-
See https://www.lesswrong.com/posts/EHbJ69JDs4suovpLw/testing-palm-prompts-on-gpt3.
Try 2-, 3-, or 4-shot inference on something like JT, NeoX-20B, or Galactica.
After we find a promising …
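A minimal sketch of what k-shot prompting looks like with transformers, using a deliberately small placeholder checkpoint in place of the 20B-scale models named above:
``` python
# Sketch of 3-shot prompting; the checkpoint is a small placeholder that
# loads the same way the larger models would, just with far less memory.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/pythia-160m")

shots = [
    ("Translate to French: cat", "chat"),
    ("Translate to French: dog", "chien"),
    ("Translate to French: bird", "oiseau"),
]
prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in shots)
prompt += "Q: Translate to French: horse\nA:"
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```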
-
I want to try gpt-neox to build a tiny GPT model.
I would like to train it on some plaintext files.
I converted them into a JSONL file and... I'm stuck at tokenization.
I don't know how to do it. I hav…
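A sketch of the usual path, assuming gpt-neox's standard JSONL layout of one `{"text": ...}` object per line; the directory and file names are placeholders, and the exact preprocessing flags should be verified against `tools/preprocess_data.py --help` in your checkout:
``` python
# Sketch: converting plaintext files to the JSONL layout gpt-neox expects,
# one {"text": ...} object per line. Paths are placeholders.
import json
from pathlib import Path

with open("train.jsonl", "w", encoding="utf-8") as out:
    for path in sorted(Path("plaintext").glob("*.txt")):
        out.write(json.dumps({"text": path.read_text(encoding="utf-8")}) + "\n")

# Tokenization into the binary training format is then done by the repo's
# preprocessing script, roughly (check --help in your checkout for flags):
#   python tools/preprocess_data.py \
#       --input train.jsonl --output-prefix train \
#       --vocab-file gpt2-vocab.json --merge-file gpt2-merges.txt \
#       --tokenizer-type GPT2BPETokenizer --append-eod
```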
-
Hi. PEFT is amazing. Thank you for sharing this package with us.
However, when I use the fp16 training option with Accelerate's DeepSpeed ZeRO-3 integration and PEFT LoRA, an error occurs.
How can I handle t…
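For reference, a minimal sketch of the PEFT LoRA setup being described; the base model and LoRA hyperparameters are placeholder assumptions, not the reporter's actual configuration:
``` python
# Sketch of a PEFT LoRA setup; base model and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training then runs under `accelerate launch` with a DeepSpeed ZeRO-3 config;
# fp16 is enabled in that config rather than here.
```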