Question about "{d}" and "{p,d}"

DachengLi1 / MPCFormer

(ICLR 2023 Spotlight) MPCFormer: fast, performant, and private transformer inference with MPC


Question about "{d}" and "{p,d}" #5

Open hujunyi96 opened 1 year ago

hujunyi96 commented 1 year ago

As written in your paper: "p" stands for using weights in T as initialization, and "d" stands for applying knowledge distillation with T as the teacher. My question is: does "using weights in T as initialization" mean using the fine-tuned model, i.e., does "p" stand for "fine-tuning"? In other words:
1. MPCBert-B is the most basic pre-trained transformer,
2. MPCBert-B w/o {d} applies KD to the most basic pre-trained transformer,
3. MPCBert-B w/o {p,d} applies KD to the fine-tuned transformer?

DachengLi1 commented 1 year ago

@hujunyi96 Thanks for checking the paper! Please take a look at the baseline subsections in Experiments section.

hujunyi96 commented 1 year ago

1. "MPCFormer w/o {d} also constructs the approximated model S' but trains S' on D with the task-specific objective, i.e., without distillation. We note that S' is initialized with weights in T, i.e., with different functions, whose effect has not been studied. We thus propose a second baseline MPCFormer w/o {p,d}, which trains S' on D without distillation, and random weight initialization." (from the baseline subsections in the Experiments section)
2. "p" stands for using weights in T as initialization, "d" stands for applying knowledge distillation with T as the teacher. (from Table 2)

Hello @DachengLi1, I think these two statements contradict each other, don't they? Put simply, could you directly explain which procedures 1. MPCBert-B, 2. MPCBert-B w/o {d}, and 3. MPCBert-B w/o {p,d} each go through? Thanks a lot!

DachengLi1 commented 1 year ago

@hujunyi96 Definitely! Take Bert-Base with CoLA as an example, where T is a Bert-Base fine-tuned on CoLA:
(1) MPCBert-B is our method: trained with the distillation objective with T as the teacher, starting from a Bert-Base.
(2) MPCBert-B w/o {d}: trained with the task objective, starting from a Bert-Base.
(3) MPCBert-B w/o {p,d}: trained with the task objective, starting from a randomly initialized Bert-Base architecture (not trained at all).
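A minimal plain-Python sketch of how the three settings differ. The `Variant` records only restate the reply above; the loss functions are a generic stand-in (cross-entropy on the gold label for the task objective, temperature-scaled soft cross-entropy against T for distillation) and are assumptions, not MPCFormer's actual distillation objective, which may involve more than the output logits. The names `Variant` and `loss_for` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    init: str       # where the weights of S' start from
    objective: str  # what S' is trained on ("distill" or "task")


# The three settings from the reply (CoLA example, T = Bert-Base fine-tuned on CoLA).
VARIANTS = (
    Variant("MPCBert-B", init="Bert-Base", objective="distill"),
    Variant("MPCBert-B w/o {d}", init="Bert-Base", objective="task"),
    Variant("MPCBert-B w/o {p,d}", init="random", objective="task"),
)


def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of floats.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def task_loss(student_logits, label):
    # Cross-entropy of S' against the gold label from the dataset D.
    return -math.log(softmax(student_logits)[label])


def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Generic logit-level KD: cross-entropy of S' against T's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def loss_for(variant, student_logits, label, teacher_logits):
    # Pick the objective the variant trains with; initialization happens before training.
    if variant.objective == "distill":
        return distill_loss(student_logits, teacher_logits)
    return task_loss(student_logits, label)
```

Only the first variant ever consults T during training; the "p" choice affects nothing in the loss, only the weights S' starts from.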

Note: all three of these models are S', which uses the approximations. Only T uses GeLU + Softmax, in case that was confusing.
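To make the T vs. S' distinction concrete, here is a sketch comparing the exact GeLU and Softmax with quadratic replacements, the idea being that squares avoid `erf` and `exp`, which are costly to evaluate under MPC. The coefficients (0.125x² + 0.25x + 0.5 for GeLU, (x + c)² with c = 5 for Softmax) are assumptions here; the authoritative definitions are in the repository's `modeling.py`.

```python
import math


def gelu(x):
    # Exact GeLU, as used by the teacher T.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quad_gelu(x):
    # Assumed quadratic replacement for GeLU in S' (only additions and multiplications).
    return 0.125 * x * x + 0.25 * x + 0.5


def softmax(xs):
    # Exact softmax, as used by T (needs exp and division).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def quad_softmax(xs, c=5.0):
    # Assumed "2Quad"-style replacement for softmax in S': (x + c)^2, then normalize.
    squares = [(x + c) ** 2 for x in xs]
    total = sum(squares)
    return [s / total for s in squares]


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0):
        print(f"x={x:+.1f}  gelu={gelu(x):.4f}  quad={quad_gelu(x):.4f}")
    print(softmax([1.0, 2.0, 3.0]))
    print(quad_softmax([1.0, 2.0, 3.0]))
```

Even at x = 0 the two activations disagree (`gelu(0.0)` is 0.0, `quad_gelu(0.0)` is 0.5), so S' computes a different function than T even when it starts from the same weights, which is the concern raised in the baseline description quoted earlier.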

hujunyi96 commented 1 year ago

Hello @DachengLi1, when I tried to use the parameter "--hidden_act quad" to train the baselines with approximations, which are the first major innovation in your paper (the second being distillation), I got an error: KeyError: 'quad'. It seems the transformers library does not support the new activation functions from your paper through the hidden_act field of the BertConfig class (the exact file raising the error is xxx/site-package/transformers/activations.py, line 208, in __getitem__). So I wonder how you implemented the quad function, since the current code in this repo raises the error above when run. Did you modify the library's source files?

I am looking forward to your reply, thanks!

DachengLi1 commented 1 year ago

@hujunyi96 We have a modified version of Transformers that does this: https://github.com/DachengLi1/MPCFormer/tree/main/transformers. In particular, see https://github.com/DachengLi1/MPCFormer/blob/38cb42cb194bfaa2d8deb1e7a9ce7e33543e7519/src/main/transformer/modeling.py#L139. Perhaps you are using the transformers installed in your environment instead? That should be easy to fix by checking the file path.

Or, even simpler, you can just copy and paste these few new functions wherever you need them.
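The KeyError comes from the name given as `hidden_act` being looked up in a name-to-activation mapping in `transformers/activations.py` that only contains the activations the installed library ships with. A simplified, hypothetical sketch of that mechanism and of the "copy the function in and register it" fix; `ACTIVATIONS`, `get_activation`, and `register_activation` are made-up names, not the library's code, and the `quad` coefficients are an assumption.

```python
import math

# Hypothetical name -> activation table: a simplified stand-in for the string
# lookup behind `hidden_act`, not the transformers library's own code.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}


def get_activation(name):
    # An unregistered name raises KeyError, like the 'quad' error in the thread.
    return ACTIVATIONS[name]


def quad(x):
    # Copied-in activation; coefficients are an assumption, check the repo's modeling.py.
    return 0.125 * x * x + 0.25 * x + 0.5


def register_activation(name, fn):
    # Make a new activation reachable by name.
    ACTIVATIONS[name] = fn


if __name__ == "__main__":
    try:
        get_activation("quad")
    except KeyError as err:
        print("missing:", err)  # missing: 'quad'
    register_activation("quad", quad)
    print(get_activation("quad")(2.0))  # 1.5
```

The same failure appears whenever the code that resolves the name is not the modified copy, which is why checking which library is actually imported matters.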

hujunyi96 commented 1 year ago

@DachengLi1 I see. I was following the main procedure in the README.md in the /baselines folder. As you can see, the commands listed there actually execute run_glue.py, which doesn't seem to import MPCFormer/src/main/transformer/modeling.py as a module, so I didn't notice the module was already in the project. Thanks for your help!

hujunyi96 commented 11 months ago

How does running "pip install -e ." in /MPCFormer/transformers install modules located in a different path, /src/main/transformer/?
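One way to see which copy of a package Python will actually import, and so whether the editable install or the code under src/main is in play, is to ask the import system directly. A minimal sketch using `importlib.util.find_spec`; `module_origin` is a hypothetical helper name, and the package names in the loop are taken from the paths in this thread.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level `import name` would load, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # "transformers": the library; after an editable install it should point into that checkout.
    # "transformer": the package under src/main, found only if src/main is on sys.path.
    for name in ("transformers", "transformer"):
        print(f"{name!r} -> {module_origin(name)}")
```

Running this from the directory you launch training from shows whether `transformers` resolves to MPCFormer/transformers or to site-packages, and whether `transformer` is importable at all from there.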