[WIP] [GenAI] Lora Finetune

Lora fine-tuning is an adapter-based technique to fine-tune an LLM. It changes LLM model architecture by adding learnable lora layers to transformers. During fine-tuning, only lora weights are adjustable and the LLM weights are frozen, so it requires much less GPU memory comparing to a full-layer fine-tuning. Based on this table, it requires 16GB memory to fine-tuning a 7B size model in 16bits, which can be fit in rtx 3090, 4080 and 4090. A wider range of GPUs can be fit on 3.8B LLMs like phi-3.5-mini

API design (wip)

Package: `Microsoft.ML.GenAI.Lora`

interface ICausalLMLoraPipeline {} // pipeline for loading causal LM + lora layers

class LoraConfiguration // lora configuration

dotnet / machinelearning