floatingbigcat opened 1 year ago
Looks good overall. I was wondering if we can make it more general:
- Right now the finetune_factor is applied to every parameter that matches finetune_groups_key_words. Could we instead pass a list of dictionaries called finetune_group_lr_info = {key_word, finetune_factor}?
- And right now the only option supported is real_lr = lr * finetune_factor. Could we instead add a new class called GroupedAnnealingLR that supports passing in different annealing LR parameters for each group (see the sketch below)?

So the idea is to instead do this:
finetune_group_lr_info = {key_word, annealing_lr_params}, where annealing_lr_params = {start_lr, warmup_iter, total_iters, decay_style, last_iter, min_lr=0.0}
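For concreteness, a hypothetical config entry along those lines could look like this (the keyword and all values below are only illustrative, not part of the current PR):

```python
# Illustrative only: one entry per finetune group, each with its own annealing schedule.
finetune_group_lr_info = [
    {
        "key_word": "image_prefix",
        "annealing_lr_params": {
            "start_lr": 1.0e-5,
            "warmup_iter": 100,
            "total_iters": 10000,
            "decay_style": "cosine",
            "last_iter": 0,
            "min_lr": 0.0,
        },
    },
]
```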
Could you also add a small dummy test case?
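For the GroupedAnnealingLR class suggested above, a rough skeleton might look like the following. This is only a sketch assuming a torch-style optimizer with param_groups; none of these names come from the actual code:

```python
import math

class GroupedAnnealingLR:
    """Anneals the lr of each optimizer param group with its own schedule (sketch only)."""

    def __init__(self, optimizer, group_lr_params):
        # group_lr_params: one annealing_lr_params dict per optimizer param group.
        assert len(optimizer.param_groups) == len(group_lr_params)
        self.optimizer = optimizer
        self.group_lr_params = group_lr_params
        self.num_iters = max((p.get("last_iter", 0) for p in group_lr_params), default=0)
        self.step(self.num_iters)

    def _lr_for(self, p):
        start_lr, total = p["start_lr"], p["total_iters"]
        warmup, min_lr = p.get("warmup_iter", 0), p.get("min_lr", 0.0)
        it = min(self.num_iters, total)
        if warmup > 0 and it <= warmup:
            return start_lr * it / warmup          # linear warmup
        if p.get("decay_style") == "cosine":
            frac = (it - warmup) / max(total - warmup, 1)
            return min_lr + (start_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * frac))
        return start_lr                            # constant fallback

    def step(self, step_num=None):
        self.num_iters = self.num_iters + 1 if step_num is None else step_num
        for group, p in zip(self.optimizer.param_groups, self.group_lr_params):
            group["lr"] = self._lr_for(p)
```

A dummy test could then just build a toy optimizer with two param groups, step the scheduler a few times, and assert that each group's "lr" follows its own schedule.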
As we discussed, we can make another PR for further enhancement.
Use the deep.py wrapper to run this test file: https://github.com/AGI-Collective/multimodal/pull/55/commits/7682ef40fa91afabc404bd9610261d231c8d5ab7
Divide all the parameters into 4 param groups, depending on whether they use weight decay and whether they are finetuned or pretrained.
Add two extra args (see the sketch at the end of this description):
- "finetune_keywords": list of strings; a parameter is put into the finetune groups as long as its name contains one of these keywords. "image_prefix" is the only keyword for now.
- "finetune_factor": float; controls the learning rate of the finetuned groups, whose real lr = pretrained_lr * finetune_factor.
In this way it leaves enough room for further changes, such as adding more modality encoders, and also avoids adding too many hyperparameters to adjust the lr of the finetune groups.
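A minimal sketch of the grouping and lr scaling described above; the helper name and the no-decay heuristic are illustrative assumptions, not the actual code in this PR:

```python
# Illustrative sketch: split parameters into 4 groups by (weight decay, finetuned/pretrained).
def build_param_groups(model, finetune_keywords, finetune_factor, lr, weight_decay):
    buckets = {
        ("decay", "pretrained"): [], ("no_decay", "pretrained"): [],
        ("decay", "finetune"): [], ("no_decay", "finetune"): [],
    }
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # A parameter goes to the finetune groups if its name contains any keyword.
        status = "finetune" if any(k in name for k in finetune_keywords) else "pretrained"
        # Assumed heuristic: biases and 1-d params (e.g. LayerNorm) get no weight decay.
        decay = "no_decay" if param.ndim <= 1 or name.endswith(".bias") else "decay"
        buckets[(decay, status)].append(param)

    groups = []
    for (decay, status), params in buckets.items():
        if not params:
            continue
        groups.append({
            "params": params,
            "weight_decay": weight_decay if decay == "decay" else 0.0,
            # Finetune groups run at real_lr = pretrained_lr * finetune_factor.
            "lr": lr * finetune_factor if status == "finetune" else lr,
        })
    return groups
```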