CERC-AAI / multimodal

An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Apache License 2.0

Set diff params groups #55

Open floatingbigcat opened 1 year ago

floatingbigcat commented 1 year ago

Divide all the parameters into 4 param groups, depending on whether they use weight_decay and whether they are finetuned or pretrained.

Add two extra args

This way, it leaves enough room for further changes, such as adding more modality encoders, and also avoids adding too many hyperparameters to adjust the lr of the finetune group.
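Roughly, the split could look like the following (a minimal sketch; the helper name, the `finetune_key_words` / `finetune_factor` argument names, and the no-decay heuristic are illustrative assumptions, not necessarily the PR's actual code):

```python
def get_param_groups(model, lr, weight_decay, finetune_key_words, finetune_factor):
    """Split parameters into 4 groups: {pretrained, finetuned} x {decay, no_decay}."""
    groups = {
        "pretrained_decay": [], "pretrained_no_decay": [],
        "finetuned_decay": [], "finetuned_no_decay": [],
    }
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # parameters of finetuned modules are matched by substring keywords
        finetuned = any(key in name for key in finetune_key_words)
        # biases and norm weights typically get no weight decay (assumed heuristic)
        no_decay = param.ndim == 1 or name.endswith(".bias")
        key = ("finetuned" if finetuned else "pretrained") + ("_no_decay" if no_decay else "_decay")
        groups[key].append(param)

    return [
        {"params": groups["pretrained_decay"], "weight_decay": weight_decay, "lr": lr},
        {"params": groups["pretrained_no_decay"], "weight_decay": 0.0, "lr": lr},
        {"params": groups["finetuned_decay"], "weight_decay": weight_decay, "lr": lr * finetune_factor},
        {"params": groups["finetuned_no_decay"], "weight_decay": 0.0, "lr": lr * finetune_factor},
    ]
```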

kshitijkg commented 1 year ago

Looks good overall, I was wondering if we can make it more general?

  • Right now the finetune_factor is used for all parameters that are in the finetune_groups_key_words. Can we instead pass a list of dictionaries called finetune_group_lr_info = {key_word, finetune_factor}?
  • And right now the only option supported is real_lr = lr * finetune_factor; can we instead just have a new class called GroupedAnnealingLR that supports passing in different annealing lr parameters for each group?

So the idea is to instead do this:

finetune_group_lr_info = {key_word, annealing_lr_params} where annealing_lr_params: {start_lr, warmup_iter, total_iters, decay_style, last_iter, min_lr=0.0}
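In Python terms, the proposed structure might look like the following (a hypothetical snippet; the keyword value and numbers are placeholders, not settings from this PR):

```python
# Hypothetical: one entry per finetuned group, keyed by a substring matched
# against parameter names, each carrying its own annealing-LR settings.
finetune_group_lr_info = [
    {
        "key_word": "vision_encoder",  # placeholder module keyword
        "annealing_lr_params": {
            "start_lr": 1.0e-5,
            "warmup_iter": 1000,
            "total_iters": 100_000,
            "decay_style": "cosine",
            "last_iter": 0,
            "min_lr": 0.0,
        },
    },
    # further modality encoders would be added as extra entries
]
```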

kshitijkg commented 1 year ago

Could you also add a small dummy test case?

floatingbigcat commented 1 year ago

> Looks good overall, I was wondering if we can make it more general?
>
>   • Right now the finetune_factor is used for all parameters that are in the finetune_groups_key_words. Can we instead pass a list of dictionaries called finetune_group_lr_info = {key_word, finetune_factor}?
>   • And right now the only option supported is real_lr = lr * finetune_factor; can we instead just have a new class called GroupedAnnealingLR that supports passing in different annealing lr parameters for each group?
>
> So the idea is to instead do this:
>
> finetune_group_lr_info = {key_word, annealing_lr_params} where annealing_lr_params: {start_lr, warmup_iter, total_iters, decay_style, last_iter, min_lr=0.0}

As we discussed, we can make another PR for further enhancement
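As a rough idea of what that follow-up could contain, here is a minimal sketch of a GroupedAnnealingLR (an assumption about the proposed design, not code from this PR; the warmup-then-cosine/linear formulas are a generic stand-in for the existing scheduler's decay styles):

```python
import math

class GroupedAnnealingLR:
    """Sketch: one annealing schedule per optimizer param group.

    group_params[i] holds the annealing settings for optimizer.param_groups[i]
    (start_lr, warmup_iter, total_iters, decay_style, min_lr). Illustrative only.
    """

    def __init__(self, optimizer, group_params):
        assert len(group_params) == len(optimizer.param_groups)
        self.optimizer = optimizer
        self.group_params = group_params
        self.num_iters = 0

    def _lr_for(self, p):
        it, start_lr = self.num_iters, p["start_lr"]
        # linear warmup to start_lr
        if p["warmup_iter"] > 0 and it <= p["warmup_iter"]:
            return start_lr * it / p["warmup_iter"]
        frac = min(it, p["total_iters"]) / p["total_iters"]
        if p["decay_style"] == "cosine":
            lr = start_lr * 0.5 * (1.0 + math.cos(math.pi * frac))
        elif p["decay_style"] == "linear":
            lr = start_lr * (1.0 - frac)
        else:  # constant
            lr = start_lr
        return max(lr, p.get("min_lr", 0.0))

    def step(self, step_num=None):
        # advance (or jump to) the iteration count, then set each group's lr
        self.num_iters = step_num if step_num is not None else self.num_iters + 1
        for group, p in zip(self.optimizer.param_groups, self.group_params):
            group["lr"] = self._lr_for(p)
```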

floatingbigcat commented 1 year ago

> Could you also add a small dummy test case?

Use the deep.py wrapper to run this test file: https://github.com/AGI-Collective/multimodal/pull/55/commits/7682ef40fa91afabc404bd9610261d231c8d5ab7
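For reference, the invocation would look roughly like this (a hypothetical command with placeholder paths; the actual test file and config are in the linked commit):

```bash
# Hypothetical: launch the dummy test through the distributed wrapper the same
# way training scripts are launched; file and config names are placeholders.
python ./deep.py tests/test_param_groups.py configs/local_setup.yml
```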