puigde opened this issue 2 weeks ago
Hi,
Thank you for your interest in our work and for reaching out with your question. You're correct in noting that our public implementation does not include the specific configuration files for the grid search process. The hyperparameters (learning rate) indicated in the ssm-peft/
Hi,
Thanks for the response.
For LoRA, based on the expression *lora_outproj*.yaml in your launch command and the directory ssm-peft/<model>/cfg/exps/~, I assume the config to use is 006_lora_r8_lora_outproj.yaml, and, from your response, that it contains the final hyperparameters for each dataset. If I am not mistaken, those would be:
common_params:
  peft: cfg/peft/lora/r8/lora_outproj.json
  prec: bf16
  batch_size: 4
glue:
  learning_rate: 0.001
  num_epochs: 10
  model: state-spaces/mamba-130m
cifar:
  learning_rate: 0.004
  num_epochs: 5
  model: state-spaces/mamba-130m
dart:
  learning_rate: 0.004
  num_epochs: 10
  model: state-spaces/mamba-130m
  eval_gen:
    max_length: 1024
    min_length: 5
    num_beams: 5
samsum:
  learning_rate: 0.002
  model: state-spaces/mamba-1.4b
  num_epochs: 10
  eval_gen:
    max_length: 1024
    min_length: 5
    num_beams: 5
spider:
  learning_rate: 0.002
  model: state-spaces/mamba-1.4b
  num_epochs: 10
spider-larger:
  learning_rate: 0.002
  model: state-spaces/mamba-2.8b
  num_epochs: 10
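For what it is worth, my reading is that the common_params block gets merged with the matching per-dataset block at launch time, along the lines of the sketch below (the resolve_exp_cfg helper is hypothetical, not the repo's actual loader):

import yaml  # PyYAML


def resolve_exp_cfg(cfg_path: str, dataset: str) -> dict:
    """Hypothetical helper: apply a per-dataset section on top of common_params.

    Only my assumption about how the exps/*.yaml files are consumed; I have not
    checked the actual launch script.
    """
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)
    params = dict(cfg.get("common_params", {}))
    params.update(cfg.get(dataset, {}))  # dataset-specific values override common ones
    return params


# e.g. resolve_exp_cfg("cfg/exps/006_lora_r8_lora_outproj.yaml", "samsum") would give
# batch_size=4, prec=bf16, learning_rate=0.002, model=state-spaces/mamba-1.4b,
# num_epochs=10, plus the eval_gen settings.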
The referenced PEFT config, cfg/peft/lora/r8/lora_outproj.json, contains:
{
  "target_modules": [
    "out_proj"
  ],
  "r": 8,
  "lora_alpha": 8,
  "lora_dropout": 0.1,
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": false,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "rank_pattern": {},
  "revision": null,
  "task_type": "SEQ_2_SEQ_LM",
  "use_dora": false,
  "use_rslora": false
}
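Assuming these fields are consumed as a standard Hugging Face peft LoraConfig (the repo's actual model wrapping may differ), I read this as a rank-8 adapter on the out_proj layers only, roughly:

import json

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Stand-in backbone: the repo presumably builds its own Mamba wrapper around
# state-spaces/mamba-130m; the -hf checkpoint is used here only so the snippet
# runs with plain transformers.
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

# Assumes the JSON above maps one-to-one onto peft.LoraConfig keyword arguments.
with open("cfg/peft/lora/r8/lora_outproj.json") as f:
    lora_cfg = LoraConfig(**json.load(f))

# Only the rank-8 LoRA matrices injected into the out_proj layers end up trainable.
peft_model = get_peft_model(model, lora_cfg)
peft_model.print_trainable_parameters()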
Also, I have two follow-up questions. In mamba-peft/train.py, the trainer init (l.143) is:
print("Dropping last batch")
trainer = MambaTrainer(
model=model,
train_dataset=train_data_module.dataset,
tokenizer=tokenizer,
args=MambaTrainingArguments(
learning_rate=learning_rate,
max_steps=int(num_epochs * its_per_epoch),
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=1,
gradient_accumulation_steps=gradient_accumulation_steps,
optim=optim,
output_dir=output_dir,
logging_steps=logging_steps,
dataloader_num_workers=num_data_workers,
dataloader_prefetch_factor=2,
eval_accumulation_steps=128,
info={
"trainable_params": get_trainable_parameters_ratio(model),
"cfg_path": cfg_path
},
save_strategy="steps" if not no_save else "no",
evaluation_strategy="steps" if not skip_eval else "no",
save_steps=int(eval_epochs * its_per_epoch),
eval_steps=int(eval_epochs * its_per_epoch),
dataloader_drop_last=True,
report_to="wandb",
seed=seed,
),
compute_metrics=compute_metrics,
data_collator=train_data_module.data_collator,
eval_dataset=val_data_module.dataset,
eval_generator=eval_generator,
min_eval_metric_after_epoch=min_eval_metric_after_epoch,
)
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
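For context on the step counts above, my working assumption is that its_per_epoch is the number of optimizer steps per pass over the training set, roughly as follows (the exact expression in train.py may well differ):

# Assumed reconstruction, not the repo's actual code: optimizer steps per epoch
# given the effective batch size; floor division matches dataloader_drop_last=True.
its_per_epoch = len(train_data_module.dataset) // (batch_size * gradient_accumulation_steps)

max_steps = int(num_epochs * its_per_epoch)    # total optimizer steps
save_steps = int(eval_epochs * its_per_epoch)  # checkpoint every eval_epochs epochs
eval_steps = int(eval_epochs * its_per_epoch)  # evaluate on the same schedule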
Thanks in advance
Hi,
Thanks for providing a public implementation for the experimental results of your paper.
I am trying to reproduce the results. Regarding the hyperparameters, the paper states (quote):
I am not finding the part of the code where this search is performed. Are the hyperparameters in ssm-peft/<model>/cfg/exps/~ the final ones? And can we assume they are consistent across parts of the model? Thanks in advance.