Closed: achew010 closed this pull request 2 months ago
@anhuong thanks for the review. For making this the default, I drafted out various possibilities in this issue: https://github.com/foundation-model-stack/fms-hf-tuning/issues/334. We can discuss offline.
Also, we added new automation that ensures PRs follow conventional commits, which you can see is failing -- https://github.com/foundation-model-stack/fms-hf-tuning/actions/runs/10920573842/job/30310716778?pr=280 -- please address this.
Please update the branch with the new changes from main
and then, once the experimental fields are updated, this looks good to merge to me 👍
Note @kmehant: since you requested changes, an approval is needed from your side as well before this can merge.
Description of the change
This PR adds two dataclass arguments that enable padding free and multipack for `sft_trainer.py`, via the new fms-acceleration `attention-and-distributed-packing` plugin, and extends the current `--fast_kernels` dataclass to support optimized full finetuning:

- `--padding_free`: technique to process multiple examples in a single batch without adding padding tokens that waste compute.
- `--multipack`: technique for multi-GPU training that balances the number of tokens processed on each device, minimizing waiting time.
- `--fast_kernels`: previously limited to QPEFT (it used to raise an error if not activated with `--fast_lora`); now also allows optimized full/standard LoRA finetuning.

These are extremely effective methods for improving training throughput.
NOTE: adhering to the design of fms-acceleration, the new plugin is optional and installed separately.
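For a rough sense of how such dataclass arguments parse from the command line, here is a minimal sketch. These are not the actual fms-hf-tuning dataclasses, and the example values (`huggingface`, `16`) are assumptions for illustration only:

```python
# Minimal sketch, NOT the actual fms-hf-tuning dataclasses: shows how list-valued
# dataclass arguments in the style of --padding_free and --multipack can be parsed
# with HfArgumentParser. The example values ("huggingface", 16) are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

from transformers import HfArgumentParser


@dataclass
class AttentionAndDistributedPackingSketch:
    padding_free: Optional[List[str]] = field(default=None)  # e.g. --padding_free huggingface
    multipack: Optional[List[int]] = field(default=None)     # e.g. --multipack 16


parser = HfArgumentParser(AttentionAndDistributedPackingSketch)
(aadp,) = parser.parse_args_into_dataclasses(
    ["--padding_free", "huggingface", "--multipack", "16"]
)
print(aadp.padding_free, aadp.multipack)  # ['huggingface'] [16]
```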
Notes on Padding Free
- For transformers versions `<= 4.43`, padding free is not yet integrated from our PR into Hugging Face (https://github.com/huggingface/transformers/pull/31629), so the plugin provides it.
- For transformers versions `>= 4.44`, the upstream integration is available.
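To make the padding-free idea concrete, here is a conceptual sketch (not the plugin's implementation): examples are concatenated into one flat sequence, with per-example position ids and cumulative sequence boundaries, instead of padding every row to the longest example:

```python
# Conceptual sketch of padding-free batching (not the plugin's actual code).
# Instead of padding each example to the max length, examples are concatenated
# into a single flat sequence; position_ids restart at 0 for every example and
# cu_seqlens records the boundaries, so an attention kernel that understands
# boundaries will not attend across examples.
from typing import Dict, List


def pad_batch(examples: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Conventional padding: every row is padded to the longest example."""
    max_len = max(len(e) for e in examples)
    return [e + [pad_id] * (max_len - len(e)) for e in examples]


def padding_free_batch(examples: List[List[int]]) -> Dict[str, List[int]]:
    """Padding-free: one flat row, with per-example position ids and boundaries."""
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for example in examples:
        input_ids.extend(example)
        position_ids.extend(range(len(example)))          # positions restart per example
        cu_seqlens.append(cu_seqlens[-1] + len(example))  # cumulative boundaries
    return {"input_ids": input_ids, "position_ids": position_ids, "cu_seqlens": cu_seqlens}


examples = [[101, 7, 8, 102], [101, 9, 102], [101, 3, 4, 5, 6, 102]]
print(pad_batch(examples))           # compute is wasted on pad tokens
print(padding_free_batch(examples))  # no pad tokens at all
```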
Notes on Multipack
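For intuition, here is a conceptual sketch of the token-balancing idea behind multipack (not the plugin's actual algorithm): distribute sequences so that each device processes a similar number of tokens per step.

```python
# Conceptual sketch of multipack-style load balancing (not the plugin's actual
# algorithm). Sequences are assigned across devices so each device processes a
# similar token count per step, reducing time spent waiting on straggler ranks.
# For simplicity we track sequence lengths rather than dataset indices.
import heapq
from typing import List


def balance_across_devices(seq_lengths: List[int], num_devices: int) -> List[List[int]]:
    """Greedy longest-first assignment: give the next-longest sequence to the
    device with the fewest tokens so far."""
    heap = [(0, device) for device in range(num_devices)]  # (token_count, device)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_devices)]
    for length in sorted(seq_lengths, reverse=True):
        tokens, device = heapq.heappop(heap)
        assignment[device].append(length)
        heapq.heappush(heap, (tokens + length, device))
    return assignment


lengths = [512, 384, 1024, 128, 256, 768, 640, 96]
for device, seqs in enumerate(balance_across_devices(lengths, num_devices=2)):
    print(f"device {device}: {sum(seqs)} tokens -> {seqs}")
```

A greedy longest-first assignment like this keeps per-device token counts close, which is the property multipack targets in order to minimize waiting time.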
Notes on FastKernels
- `--fast_kernels True True True` on full finetuning/LoRA runs
- `--fast_kernels True True True --auto_gptq triton_v2 --fused_lora auto_gptq True` for GPTQ-LoRA
- `--fast_kernels True True True --bitsandbytes nf4 --fused_lora bitsandbytes True` for QLoRA
- There is a known issue involving `positional_ids`, but this issue will be addressed in the future.

Benchmarks
PaddingFree and Multipack Benchmarks for Mistral 7B
Notes:
- Per Device Batch Size 4
Full Finetuning Benchmarks for Mistral 7B
Early Version Of This Plugin
We have an unofficial version with more features than the present release, which @kmehant is currently using for ILAB work. In addition to padding-free and multipack, it also has the two additional plugins below:
To use the early version, a quick hack of `sft_trainer` with pretokenized data + a custom tokenizer is available here: https://github.com/fabianlim/fms-hf-tuning/tree/attn-plugin. This will be superseded by this PR in the near future. Use it with these command line arguments:
How to verify the PR
Additional checks/tests were added:

- Parsing of `--padding_free` and `multipack` is correct in `test_dataclass_parse_successfully`
- Illegal arguments for `--padding_free` are caught in `test_dataclass_will_fail_to_accept_illegal_args`
- `test_framework_initialize_and_trains_with_aadp`
- `--padding_free` must be used with flash-attn, otherwise an error is raised
- `--multipack` must be used with `--padding_free`, otherwise an error is raised
- `--packing True` with `--padding_free` will raise an error
- `--fast_kernels` works with full finetuning
- `--fast_lora` called without either `--auto_gptq` or `--bitsandbytes` will raise an error

Ran the full suite of acceleration checks to verify all fms-acceleration unit tests passed.
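The argument checks listed above can be illustrated with a small, self-contained pytest sketch; `validate_aadp_args` below is a hypothetical helper written only for this illustration and is not part of fms-hf-tuning:

```python
# Hypothetical sketch of the kind of argument validation the new tests exercise.
# validate_aadp_args is illustrative only; it is NOT the fms-hf-tuning API.
import pytest


def validate_aadp_args(padding_free: bool, multipack: bool, packing: bool, attn_implementation: str) -> None:
    if padding_free and attn_implementation != "flash_attention_2":
        raise ValueError("--padding_free must be used with flash-attn")
    if multipack and not padding_free:
        raise ValueError("--multipack must be used together with --padding_free")
    if packing and padding_free:
        raise ValueError("--packing True cannot be combined with --padding_free")


def test_padding_free_requires_flash_attn():
    with pytest.raises(ValueError):
        validate_aadp_args(padding_free=True, multipack=False, packing=False, attn_implementation="sdpa")


def test_multipack_requires_padding_free():
    with pytest.raises(ValueError):
        validate_aadp_args(padding_free=False, multipack=True, packing=False, attn_implementation="flash_attention_2")
```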
Was the PR tested