Closed · fabianlim closed this 2 months ago
awesome, great results @fabianlim
Indeed, awesome results @fabianlim!
@wynterl @raghukiran1224 the loss for BNB + fused ops looks problematic. ~Needs more debugging~ OK, I found that it's because Granite has a bias in the Linear layers, but the FOAK kernels do not support bias. This just requires some minor (but tedious) modifications.
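For context, a minimal sketch (not part of this PR; the checkpoint id is only a placeholder) that lists which `nn.Linear` modules of a Granite checkpoint actually carry a bias term, i.e. the layers the current FOAK kernels cannot fuse:

```python
# Minimal sketch: report which Linear layers of a Granite checkpoint have a bias.
# The checkpoint id is a placeholder; substitute whichever Granite model you benchmark.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.0-2b-instruct")

for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and module.bias is not None:
        print(f"{name}: bias present")
```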
In this PR we update the benchmarks for `GraniteCausalLM`.
Note this PR requires the following dependency updates:
- `transformers>=4.45`: for `GraniteCausalLM`
- `accelerate>=0.34.1`: required for `transformers>=4.45` if `GraniteCausalLM` is needed
- `trl > 0.11.1`: when using baseline bnb, requires this fix for a bug that was introduced in `transformers==4.45`: https://github.com/huggingface/trl/pull/2089
- `bitsandbytes==0.43.3`: it seems that later versions give segmentation fault errors
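As a quick sanity check, here is a minimal sketch (not part of this PR) that verifies the installed versions satisfy the pins above, assuming the `packaging` library is available:

```python
# Minimal sketch: check installed versions against the dependency pins listed above.
from importlib.metadata import version
from packaging.specifiers import SpecifierSet

pins = {
    "transformers": ">=4.45",
    "accelerate": ">=0.34.1",
    "trl": ">0.11.1",
    "bitsandbytes": "==0.43.3",
}

for pkg, spec in pins.items():
    installed = version(pkg)
    ok = installed in SpecifierSet(spec)
    print(f"{pkg}=={installed} satisfies '{spec}': {ok}")
```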
Known issues with quant peft:
- … (`low_cpu_mem_mode`)
- … (`low_cpu_mem_mode`)

Performance
Overall, impressive improvements with the kernels.
FULL FT
PEFT
Quantized PEFT (BNB)
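For readers unfamiliar with the quantized PEFT (BNB) setting above, here is a minimal sketch of what it amounts to: a 4-bit bitsandbytes load plus LoRA adapters. The checkpoint id, target modules, and LoRA hyperparameters are illustrative placeholders, not the configuration used for these benchmarks:

```python
# Illustrative sketch of quantized PEFT with bitsandbytes: 4-bit NF4 load + LoRA adapters.
# Checkpoint id and LoRA hyperparameters are placeholders, not the benchmark config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.0-2b-instruct",  # placeholder checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```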