foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels #280

Closed: achew010 closed this 2 months ago

achew010 commented 3 months ago

Description of the change

This PR adds two dataclass arguments to enable padding-free and multipack in sft_trainer.py, via the new fms-acceleration attention-and-distributed-packing plugin, and allows the existing --fastkernels dataclass to support optimized full fine-tuning.

These are extremely effective methods for improving training throughput.

NOTE: in keeping with the design of fms-acceleration, the new plugin is optional and installed separately.
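As a rough illustration of the dataclass-argument pattern (the dataclass name, field names, and accepted values below are assumptions for this sketch, not necessarily the exact ones added by the PR), such flags can be surfaced on the command line with transformers' HfArgumentParser:

```python
# Sketch only: shows how dataclass arguments like --padding_free and
# --multipack can be exposed as CLI flags; the real dataclasses in this PR
# may use different names, types, and defaults.
from dataclasses import dataclass, field
from typing import List

from transformers import HfArgumentParser


@dataclass
class AccelerationArguments:
    # e.g. "huggingface" to activate the padding-free attention path
    padding_free: List[str] = field(default_factory=list)
    # e.g. 16 packing processes; only meaningful together with padding_free
    multipack: List[int] = field(default_factory=list)


parser = HfArgumentParser(AccelerationArguments)
(accel_args,) = parser.parse_args_into_dataclasses(
    args=["--padding_free", "huggingface", "--multipack", "16"]
)
print(accel_args)  # AccelerationArguments(padding_free=['huggingface'], multipack=[16])
```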

Notes on Padding Free
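For background, padding-free training flattens each batch into one long sequence and relies on flash-attention-style variable-length kernels, so no compute or memory is spent on pad tokens. Below is a minimal sketch of the collation idea, assuming a flash-attn-capable model that accepts per-example restarting position_ids; this illustrates the general technique, not the plugin's actual collator.

```python
# Rough sketch of padding-free collation: flatten a batch into a single row
# plus per-token position_ids, instead of padding every example to the
# longest one. The plugin's real collator may differ in details.
from typing import Dict, List

import torch


def collate_padding_free(batch: List[Dict[str, List[int]]]) -> Dict[str, torch.Tensor]:
    input_ids, labels, position_ids = [], [], []
    for example in batch:
        ids = example["input_ids"]
        input_ids.extend(ids)
        labels.extend(example.get("labels", ids))
        position_ids.extend(range(len(ids)))  # positions restart per example
    return {
        # shape (1, total_tokens): one flattened "row", no pad tokens
        "input_ids": torch.tensor([input_ids]),
        "labels": torch.tensor([labels]),
        # variable-length attention kernels can recover sequence boundaries
        # from position_ids restarting at 0
        "position_ids": torch.tensor([position_ids]),
    }


batch = [{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}]
print(collate_padding_free(batch)["position_ids"])  # tensor([[0, 1, 2, 0, 1]])
```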

Notes on Multipack
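For background, multipack is a length-aware sampler that groups variable-length examples into bins of roughly equal token count, so every device and micro-batch does a similar amount of work; it combines naturally with the padding-free path. The following is a toy first-fit-decreasing packing sketch, not the plugin's distributed sampler:

```python
# Rough sketch of the multipack idea: greedily pack variable-length examples
# into fixed-capacity bins so each micro-batch carries a similar token count.
# The actual plugin also balances bins across devices; this shows only the
# single-process packing step.
from typing import List


def pack_greedy(lengths: List[int], capacity: int) -> List[List[int]]:
    # sort indices by length, longest first, then first-fit into bins
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins, loads = [], []
    for idx in order:
        for b, load in enumerate(loads):
            if load + lengths[idx] <= capacity:
                bins[b].append(idx)
                loads[b] += lengths[idx]
                break
        else:
            bins.append([idx])
            loads.append(lengths[idx])
    return bins


print(pack_greedy([900, 500, 400, 300, 120, 60], capacity=1024))
# [[0, 4], [1, 2, 5], [3]] -- three packed bins instead of six padded rows
```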

Notes on FastKernels

Benchmarks

PaddingFree and Multipack Benchmarks for Mistral 7B

Notes:

Per Device Batch Size 4

| Framework Config | Num Devices | Per Device Batch Size | Train Runtime (secs) | Speedup |
| --- | --- | --- | --- | --- |
| full-FT | 2 | 4 | 1537 | baseline |
| padding-free | 2 | 4 | 859 | 1.79x |
| padding-free + multipack | 2 | 4 | 751 | 2.05x |
| full-FT | 4 | 4 | 932 | baseline |
| padding-free | 4 | 4 | 483 | 1.93x |
| padding-free + multipack | 4 | 4 | 342 | 2.75x |
| full-FT | 8 | 4 | 551 | baseline |
| padding-free | 8 | 4 | 275 | 2.00x |
| padding-free + multipack | 8 | 4 | 163 | 3.38x |
Per Device Batch Size 8

| Framework Config | Num Devices | Per Device Batch Size | Train Runtime (secs) | Speedup |
| --- | --- | --- | --- | --- |
| full-FT | 2 | 8 | 1722 | baseline |
| padding-free | 2 | 8 | 678 | 2.54x |
| padding-free + multipack | 2 | 8 | 603 | 2.86x |
| full-FT | 4 | 8 | 1025 | baseline |
| padding-free | 4 | 8 | 380 | 2.70x |
| padding-free + multipack | 4 | 8 | 289 | 3.55x |
| full-FT | 8 | 8 | 611 | baseline |
| padding-free | 8 | 8 | 215 | 2.84x |
| padding-free + multipack | 8 | 8 | 140 | 4.36x |
Verified Similar Improvements for Untokenized Dataset

| Framework Config | Num Devices | Per Device Batch Size | Train Runtime (secs) | Speedup |
| --- | --- | --- | --- | --- |
| full-FT | 2 | 4 | 1516 | baseline |
| padding-free | 2 | 4 | 848 | 1.78x |
| padding-free + multipack | 2 | 4 | 747 | 2.02x |

Full Finetuning Benchmarks for Mistral 7B

Early Version Of This Plugin

We have an unofficial version with more features than the present release, which @kmehant is currently using for ILAB work. In addition to padding-free and multipack, it also has the two additional plugins below:

To use the early version, there is a quick hack of sft_trainer with pretokenized data + a custom tokenizer: https://github.com/fabianlim/fms-hf-tuning/tree/attn-plugin. This will be superseded by this PR in the near future.

Use it with these command line arguments:

      --padding_free huggingface-injected \
      --loss_across_gpus mean token \

How to verify the PR

Additional checks/tests were added:

1. Ensure parsing of --padding_free and multipack is correct in test_dataclass_parse_successfully
2. Ensure wrong arguments to --padding_free are caught in test_dataclass_will_fail_to_accept_illegal_args
3. Ensure the plugin is successfully instantiated from the dataclass in test_framework_initialize_and_trains_with_aadp
4. Ensure --padding_free must be used with flash-attn, otherwise an error is raised (the constraints in checks 4-6 are sketched after this list)
5. Ensure --multi_pack must be used with --padding_free, otherwise an error is raised
6. Ensure --packing True with --padding_free raises an error
7. Ensure --fast_kernels works with full finetuning
8. Ensure --fast_lora called without either --auto_gptq or --bitsandbytes raises an error
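A rough sketch of the kind of argument validation exercised by checks 4-6 (the function and parameter names here are assumptions for illustration, not the actual code added by this PR):

```python
# Illustrative only: mirrors the constraints exercised by checks 4-6 above.
# Names (attn_implementation, padding_free, multipack, packing) are assumed
# for this sketch and may not match the PR's actual fields.
def validate_acceleration_args(
    padding_free: list,
    multipack: list,
    packing: bool,
    attn_implementation: str,
) -> None:
    if padding_free and attn_implementation != "flash_attention_2":
        raise ValueError("--padding_free requires flash-attn")
    if multipack and not padding_free:
        raise ValueError("--multi_pack requires --padding_free")
    if packing and padding_free:
        raise ValueError("--packing True cannot be combined with --padding_free")


# example: multipack without padding-free should be rejected
try:
    validate_acceleration_args([], [16], packing=False, attn_implementation="flash_attention_2")
except ValueError as e:
    print(e)  # --multi_pack requires --padding_free
```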

Ran the full suite of acceleration checks to verify that all fms-acceleration unit tests pass:

pytest tests/acceleration/

[Screenshot: pytest output showing all tests in tests/acceleration/ passing]

Was the PR tested

fabianlim commented 2 months ago

@anhuong thanks for the review. For making this the default, I drafted out various possibilities in this issue: https://github.com/foundation-model-stack/fms-hf-tuning/issues/334. We can discuss offline.

anhuong commented 2 months ago

Also, we added new automation that ensures PRs follow conventional commits, which you can see is failing: https://github.com/foundation-model-stack/fms-hf-tuning/actions/runs/10920573842/job/30310716778?pr=280. Please address the change.

anhuong commented 2 months ago

Please update the branch with the new changes from main. Once the experimental fields are updated, this is good to merge to me 👍

anhuong commented 2 months ago

Note @kmehant: since you requested changes, I think an approval is needed from your side as well before this can merge.