ZeroYuHuang / Transformer-Patcher


How to reproduce the results in Table 2? #1

Open sev777 opened 1 year ago

sev777 commented 1 year ago

Hi, I am trying to reproduce the results in Table 2, "Scale up to thousands of edits". I would like to know how to set the parameters to achieve this.

In addition, the use of SERA in this paper seems to differ from the original. Can you provide the code for how SERA is used in this paper?

Thanks!

sev777 commented 1 year ago

Hi, the dataset split described in the paper is: "Finally, we get 5,317 edit data and 15,982 for validation, and 24,051 for testing." However, in your code there is a note as follows:

- For train data, we got 228912 data points
- For val data, we got 12208 data points
- For edit data, we got 3052 data points
- For test data, we got 27644 data points

So, which data split did the paper use? Or does the absence of individual data points not affect the results, as long as the split method is the same?

ZeroYuHuang commented 1 year ago

I am very sorry for the late reply!

1) I am sorry that we cannot provide the code for using SERA in this repo, because we reimplemented their method based on their original code, which might make the code in this repo less understandable.

2) For the zsRE dataset, we just randomly split the original training data in the ratio 0.9 : 0.075 : 0.025. You can split the data in your own ratio. I think the ratio will only slightly affect the result, because the number of edits changes when a different ratio is applied.
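For reference, the random 0.9 : 0.075 : 0.025 split described above could be sketched as follows. This is an illustrative snippet, not the repo's actual preprocessing code; the function name `split_zsre` and the `seed` parameter are made up for this example.

```python
import random

def split_zsre(examples, seed=0):
    """Randomly split examples into train / val / edit sets
    in the ratio 0.9 : 0.075 : 0.025 (illustrative only)."""
    rng = random.Random(seed)       # fix the seed for reproducibility
    examples = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.9 * n)
    n_val = int(0.075 * n)
    train = examples[:n_train]
    val = examples[n_train:n_train + n_val]
    edit = examples[n_train + n_val:]   # remaining ~0.025 become edit data
    return train, val, edit
```

Changing the seed or the ratios gives a different edit set, which is why the exact per-split counts can differ slightly from the numbers quoted in the paper.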

sev777 commented 1 year ago

> I am very sorry for the late reply!
>
> 1. I am sorry that we cannot provide the code for using SERA in this repo, because we reimplemented their method based on their original code, which might make the code in this repo less understandable.
>
> 2. For the zsRE dataset, we just randomly split the original training data in the ratio 0.9 : 0.075 : 0.025. You can split the data in your own ratio. I think the ratio will only slightly affect the result, because the number of edits changes when a different ratio is applied.

Thanks for your reply! There are still some issues:

(1) The results in Table 1 are for 60 and 140 edits on FC and QA. Does 60 or 140 refer to the number of wrongly predicted examples (if so, does the code stop once it has processed 60 or 140 edits?) or to the number of input examples (if so, the number of wrong examples would be less than 60 or 140)?

(2) For the results in Table 2, I want to reproduce them at a large scale. Can I just set the fold number n to 1, so that the edits are fed in as a single data stream?

Thanks again!

ZeroYuHuang commented 1 year ago
  1. 60 and 140 edits are the average edit numbers computed over the entire editing process; we did not stop the editing process early. For example, in experiments with KE and MEND, their edit number is usually much larger than this, because the overall performance of the post-edit model is destroyed.

  2. You can just set n=1 to edit on the entire edit set; remember to use more epochs for patch training, because as the editing process proceeds, it becomes harder to find a patch key that is activated only by the specific input. You can adjust different hyper-parameters (e.g., patch-training epochs, weights for the memory losses) to balance editing time, success rate, generality rate, etc.
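The single-stream setup described above (n=1, with extra patch-training epochs) can be illustrated with the following toy control-flow sketch. `DummyPatcher` and `run_single_stream` are hypothetical stand-ins, not the repo's actual Transformer-Patcher implementation; they only show the shape of a sequential editing loop over one data stream.

```python
class DummyPatcher:
    """Toy editor: one patch per edit, trained for a fixed number of epochs."""

    def __init__(self, patch_epochs):
        self.patch_epochs = patch_epochs
        self.patches = []

    def edit(self, example):
        # In the real method, a patch neuron is trained for `patch_epochs`
        # epochs so that its key activates only on this specific input.
        for _ in range(self.patch_epochs):
            pass  # a patch-training step would go here
        self.patches.append(example)

def run_single_stream(edit_set, patch_epochs=10):
    """Apply every edit sequentially to the same model (n = 1, no folds)."""
    patcher = DummyPatcher(patch_epochs)
    for example in edit_set:   # one continuous stream of edits
        patcher.edit(example)
    return len(patcher.patches)
```

The point of the sketch is the loop structure: with n=1 there is no reset between folds, so each new patch must coexist with all previously trained patches, which is why more patch-training epochs (and tuned memory-loss weights) are needed as the stream grows.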

Thanks again for paying attention to this work.

snwen123 commented 1 year ago

Thank you for your work; it helped me a lot. But you should improve your coding style, otherwise readers will spend a lot of time understanding this obscure code. One basic principle is to use fewer global variables; also, pay attention to the logic of the code.

richhh520 commented 12 months ago

Hi @sev777, were you able to reproduce the experiment? Could you please share the versions of the packages you used? I have run into a lot of version conflicts. Thanks very much!