foundation-model-stack / fms-acceleration

🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.
Apache License 2.0

Slowdown and Higher Memory Consumption for GPTQ-LoRA with Bfloat16 #84

Open achew010 opened 6 days ago

achew010 commented 6 days ago

Description

Regression test comparing loss, memory, and throughput for Full-FT and PEFT.

See Outliers.

A subset of the outliers was processed into the table below:

import pandas as pd

# Load the outlier records from the regression run.
A = pd.read_csv('outliers.1.csv', index_col=None)

As = []
for tag, G in A.groupby('scenario'):
    # Flag rows where the new value exceeds the reference
    # (worse for loss and memory metrics).
    reg = G.reference < G.new
    if tag == 'train_tokens_per_second':
        # Throughput is the opposite: lower is worse, so invert the mask.
        reg = ~reg
    As.append(G.loc[reg])

# Keep only the rows that regressed.
A = pd.concat(As)
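For reference, a minimal synthetic run of the same filter. The column names (scenario, reference, new) are taken from the snippet above; the metric values and the 'mem_torch_peak' label are made up for illustration:

import pandas as pd

# Hypothetical miniature of outliers.1.csv; 'mem_torch_peak' is an
# assumed metric name, only 'train_tokens_per_second' appears above.
data = pd.DataFrame({
    'scenario': ['mem_torch_peak', 'mem_torch_peak',
                 'train_tokens_per_second', 'train_tokens_per_second'],
    'reference': [10.0, 10.0, 100.0, 100.0],
    'new': [12.0, 9.0, 90.0, 110.0],
})

kept = []
for tag, G in data.groupby('scenario'):
    reg = G.reference < G.new          # higher is worse by default
    if tag == 'train_tokens_per_second':
        reg = ~reg                     # for throughput, lower is worse
    kept.append(G.loc[reg])

# Expect two rows: the memory increase (10 -> 12) and the
# throughput drop (100 -> 90).
print(pd.concat(kept))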

[image: table of regressed outlier scenarios]

fabianlim commented 5 days ago

@achew010 are we positive the slowdown only affects GPTQ-LoRA and nothing else (e.g., full fine-tuning, regular PEFT)? I remember you used to print out a table; can we check it as well?
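A minimal sketch of one way to regenerate such a summary table, assuming the same outliers.1.csv schema as the snippet above (scenario, reference, new); the actual table printed by the regression scripts may be laid out differently:

import pandas as pd

# Load the same outlier records and compute the relative change of
# each metric against its reference value.
A = pd.read_csv('outliers.1.csv', index_col=None)
A['pct_change'] = (A.new - A.reference) / A.reference * 100

# Summarize per scenario, so a regression confined to GPTQ-LoRA
# would stand out against the Full-FT and regular PEFT rows.
summary = A.groupby('scenario')['pct_change'].describe()
print(summary.to_string())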