AnswerDotAI / fsdp_qlora

Training LLMs with QLoRA + FSDP
Apache License 2.0

Finetuning benchmarking experiments #9

Closed KeremTurgutlu closed 7 months ago

KeremTurgutlu commented 7 months ago

This branch was used to run the QLoRA benchmarking experiments across different GPU setups and configs. It also includes a few LoRA experiments.
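For reference, the snippet below (per its inline comment) shows that forcing the flash SDP kernel fails with a "kernel not available" error when an explicit attn_mask is passed: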

import torch

# Forcing the flash kernel will throw a "kernel not available" error when an
# attn_mask is passed, since the flash backend does not support explicit masks.
# Optionally use the context manager to ensure one of the fused kernels is run.
query = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
key = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
value = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
attn_mask = torch.ones(128, dtype=torch.bool, device="cuda")
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    torch.nn.functional.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask)
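A possible workaround (not part of the original experiments; a minimal sketch assuming the same tensor shapes as above) is to leave the math and, on recent PyTorch versions, the memory-efficient backends enabled so SDPA can dispatch to a kernel that does accept an explicit attn_mask:

import torch

# Same shapes as the failing example above, but with fallback backends enabled
# so scaled_dot_product_attention can run with an explicit boolean mask.
query = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
key = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
value = torch.rand(32, 8, 128, 64, dtype=torch.float16, device="cuda")
attn_mask = torch.ones(128, dtype=torch.bool, device="cuda")  # True = attend
with torch.backends.cuda.sdp_kernel(enable_flash=False,
                                    enable_math=True,
                                    enable_mem_efficient=True):
    out = torch.nn.functional.scaled_dot_product_attention(
        query, key, value, attn_mask=attn_mask)
print(out.shape)  # torch.Size([32, 8, 128, 64])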
johnowhitaker commented 7 months ago

Looking good! @warner-benjamin want to skim then merge?