foundation-model-stack / fms-acceleration

🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.

Linting and Formatting for FMS-Acceleration-Peft package #23

Closed. achew010 closed this 1 month ago

achew010 commented 1 month ago

Description

This PR addresses #9, adding linting and formatting for the FMS-Acceleration-Peft package.
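
For context, a typical formatter/linter invocation over the accelerated-peft package is sketched below. The tool names and the plugins/accelerated-peft path are assumptions based on common Python tooling and the repository layout, not commands taken from this PR; the package's own lint/format tox environments, if defined, should be preferred.

# assumed tooling; adjust to the package's actual lint/format configuration
pip install black isort pylint
black plugins/accelerated-peft/src
isort plugins/accelerated-peft/src
pylint plugins/accelerated-peft/src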

Tests

Ran sample experiments with accelerated-peft-bnb and accelerated-peft-autogptq to check for breakages; the launch commands are reproduced below.

accelerated-peft-bnb

export CUDA_VISIBLE_DEVICES=0,1
accelerate launch \
 --config_file scripts/benchmarks/accelerate.yaml \
 --num_processes=2 \
 --main_process_port=29500 -m tuning.sft_trainer \
 --model_name_or_path mistralai/Mistral-7B-v0.1 \
 --acceleration_framework_config_file sample-configurations/accelerated-peft-bnb-nf4-sample-configuration.yaml \
 --packing True \
 --max_seq_len 4096 \
 --fp16 True \
 --learning_rate 2e-4 \
 --torch_dtype float16 \
 --peft_method lora \
 --r 16 \
 --lora_alpha 16 \
 --lora_dropout 0.0 \
 --target_modules q_proj k_proj v_proj o_proj \
 --use_flash_attn True \
 --response_template '\n### Response:' \
 --dataset_text_field 'output' \
 --include_tokens_per_second True \
 --num_train_epochs 1 \
 --gradient_accumulation_steps 1 \
 --gradient_checkpointing True \
 --evaluation_strategy no \
 --save_strategy no \
 --weight_decay 0.01 \
 --warmup_steps 10 \
 --adam_epsilon 1e-4 \
 --lr_scheduler_type linear \
 --logging_strategy steps \
 --logging_steps 10 \
 --max_steps 30 \
 --training_data_path benchmark_outputs/data/cache.json \
 --per_device_train_batch_size 4 \
 --output_dir benchmark_outputs/exp_39/hf \
 --skip_memory_metrics False
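
A quick way to sanity-check this run afterwards is to inspect the paths passed above; the sample configuration ships with the repository and the output directory is created by the training run:

# paths are exactly those used in the command above
cat sample-configurations/accelerated-peft-bnb-nf4-sample-configuration.yaml
ls benchmark_outputs/exp_39/hf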

accelerated-peft-autogptq

export CUDA_VISIBLE_DEVICES=0,1
accelerate launch \
 --config_file scripts/benchmarks/accelerate.yaml \
 --num_processes=2 \
 --main_process_port=29500 -m tuning.sft_trainer \
 --model_name_or_path TheBloke/Mistral-7B-v0.1-GPTQ \
 --acceleration_framework_config_file sample-configurations/accelerated-peft-autogptq-sample-configuration.yaml \
 --packing True \
 --max_seq_len 4096 \
 --learning_rate 2e-4 \
 --fp16 True \
 --torch_dtype float16 \
 --peft_method lora \
 --r 16 \
 --lora_alpha 16 \
 --lora_dropout 0.0 \
 --target_modules q_proj k_proj v_proj o_proj \
 --use_flash_attn True \
 --response_template '\n### Response:' \
 --dataset_text_field 'output' \
 --include_tokens_per_second True \
 --num_train_epochs 1 \
 --gradient_accumulation_steps 1 \
 --gradient_checkpointing True \
 --evaluation_strategy no \
 --save_strategy no \
 --weight_decay 0.01 \
 --warmup_steps 10 \
 --adam_epsilon 1e-4 \
 --lr_scheduler_type linear \
 --logging_strategy steps \
 --logging_steps 10 \
 --max_steps 30 \
 --training_data_path benchmark_outputs/data/cache.json \
 --per_device_train_batch_size 4 \
 --output_dir benchmark_outputs/exp_51/hf \
 --skip_memory_metrics False
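
The GPTQ experiment loads a pre-quantized checkpoint from the Hub. Pre-downloading it keeps network time out of the benchmark; the local directory name below is arbitrary, and huggingface-cli comes with huggingface_hub. If downloaded locally, --model_name_or_path can then point at that directory instead.

# optional: fetch the quantized checkpoint ahead of the run
pip install -U "huggingface_hub[cli]"
huggingface-cli download TheBloke/Mistral-7B-v0.1-GPTQ --local-dir models/Mistral-7B-v0.1-GPTQ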

Pytest Results on FMS-HF-Tuning

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly
  /workspace/.local/lib/python3.10/site-packages/peft/utils/save_and_load.py:168: UserWarning: Setting `save_embedding_layers` to `True` as the embedding layer has been resized during finetuning.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== 4 passed, 9 warnings in 19.44s ================================
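
For reference, the results above come from the acceleration tests in fms-hf-tuning. A minimal way to reproduce the run, assuming an fms-hf-tuning checkout with this branch of the accelerated-peft plugin installed (the editable-install step and any required extras may differ):

# run inside an fms-hf-tuning checkout
pip install -e .
pytest tests/acceleration/test_acceleration_framework.py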
fabianlim commented 1 month ago

I made a mistake in the merge: it was not squashed, so this PR needs to be redone.