binaryninja opened this issue 1 year ago
@binaryninja, not sure about fine-tuning, but I was facing the same error when loading the model. I think you need to change the batch size; for loading the model I used accelerate (https://huggingface.co/docs/accelerate/usage_guides/big_modeling)
I think you need to change the batch size
I'm currently using --batch_size 1
@binaryninja, tbh the documentation here is very bad; I had a tough time just loading the model. I think you have to explore PEFT for the settings etc., since the only guidance given is:
To fine-tune cheaply and efficiently, we use Hugging Face 🤗's PEFT
P.S.: I'm trying to finetune too, will let you know if anything works.
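In case it helps, a minimal PEFT/LoRA setup looks roughly like the sketch below; the r/alpha/dropout values and the target_modules name are my own assumptions, not the script's actual settings.

from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Assumes `model` was already loaded (e.g. in 8-bit). Hyperparameters below are
# illustrative guesses, not the values the official finetune script uses.
model = prepare_model_for_int8_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn"],  # assumed attention projection name for StarCoder's GPTBigCode blocks
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()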
@binaryninja, loading the model using .from_pretrained does not work for me and seems to be the cause of ^. I had to load it with accelerate and a custom device_map:
from accelerate import init_empty_weights
from accelerate import load_checkpoint_and_dispatch
from accelerate import infer_auto_device_map
from transformers import AutoConfig, AutoModelForCausalLM
from peft import prepare_model_for_int8_training

def run_training(args, train_data, val_data):
    print("Loading the model")
    config = AutoConfig.from_pretrained("bigcode/starcoderbase")
    print(config)

    # Build the model with empty (meta) weights so it doesn't have to fit in memory up front
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)
    model.tie_weights()

    print("loading and dispatching")
    # Split the checkpoint across GPU 0 and CPU according to the memory budget
    my_device_map = infer_auto_device_map(model, max_memory={0: "13GiB", "cpu": "70GiB"})
    model = load_checkpoint_and_dispatch(
        model,
        "/home/ubuntu/.cache/huggingface/hub/models--bigcode--starcoderbase/snapshots/2417d4a7324a43db14b2a7729d17311d35dbde6e",
        device_map=my_device_map,
        no_split_module_classes=["GPTJBlock"],  # note: StarCoder is a GPTBigCode model, so this may need to be "GPTBigCodeBlock"
    )

    # disable caching mechanism when using gradient checkpointing
    # model = AutoModelForCausalLM.from_pretrained(
    #     args.model_path,
    #     use_auth_token=True,
    #     use_cache=not args.no_gradient_checkpointing,
    #     load_in_8bit=True,
    #     device_map={"": Accelerator().process_index},
    # )

    print("done loading")
    model = prepare_model_for_int8_training(model)
    # ....redacted....
The trick seems to be the custom device map:
my_device_map = infer_auto_device_map(model, max_memory={0: "13GiB", "cpu": "70GiB"})
I only had 1 GPU, hence the single key 0; you need to change ^ according to your platform, as sketched below.
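For example, on a hypothetical 2-GPU box it would just be one max_memory entry per device; the memory figures here are assumptions, not tested values.

my_device_map = infer_auto_device_map(
    model,
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "70GiB"},  # assumed budgets for a 2x24GB machine
)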
@binaryninja For the default fine-tuning script, I think the memory required is around 26GB, which exceeds the 24GB in your configuration. If you would like to fine-tune it on your machine, integrating DeepSpeed is probably a must. I'm exploring it and may provide some feedback once I manage to train with less than 24GB of memory.
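For reference, the kind of thing I'd try first is a ZeRO stage-2 config with optimizer offload to CPU; the sketch below is untested and the values are assumptions.

# Untested sketch of a DeepSpeed ZeRO stage-2 config with optimizer offload to CPU;
# the "auto" values are resolved by the Hugging Face Trainer integration.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
# Assumed wiring: pass it to the Trainer via TrainingArguments(deepspeed=ds_config, ...)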
There appears to be a related issue with bitsandbytes
I'll downgrade to 0.37.2 and report back.
Maybe you're running out of memory while using PyTorch? This error most often occurs when your GPU is already occupied by other processes, or when the reserved memory is much larger than the allocated memory. Either way, you can try setting max_split_size_mb to avoid fragmentation, as shown below.
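A minimal way to do that (the 128 MiB value is just an example) is to set the allocator config before torch touches CUDA:

import os

# Example only: cap the allocator's maximum split size to reduce fragmentation.
# This must be set before CUDA is initialized, e.g. at the very top of the script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch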
How many 3090 cards do you use to fine-tune the model?
Also fails with an Nvidia 4090 (24GB, but faster).
Found the reason: using bitsandbytes==0.37.2 works.
How long does it take to finetune? Stuck...
I am attempting to finetune the model using the command provided in the README and I am getting a CUDA OutOfMemoryError.
The hardware is a 24GB 3090. It goes out of memory when saving the checkpoint; otherwise the training runs well.
To reproduce the error quickly I add --save_freq 2, which triggers it early on, e.g.:
python3 finetune/finetune-split.py --model_path="bigcode/starcoder" --dataset_name="ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10000 --streaming --seq_length 256 --save_freq 2 --max_steps 1000 --batch_size 1 --input_column_name="question" --output_column_name="response"
I've reduced the sequence length here but have tried other context lengths as well.
If I leave save_freq at its default, I get a full training run until the final stage, and then it crashes when saving.
Here is an example wandb training run: Example
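One workaround I'm considering (not verified): free the CUDA cache and save only the PEFT adapter at checkpoints instead of the full model, roughly like this. The output directory name is just a placeholder.

import torch

# Untested sketch: for a PeftModel, save_pretrained writes only the small adapter
# weights, which avoids materializing the full model when checkpointing.
torch.cuda.empty_cache()
model.save_pretrained("adapter-checkpoint")  # hypothetical output directory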