NJUNLP / knn-box

an easy-to-use knn-mt toolkit
MIT License

missing 'transformer_wmt19_de_en' arch when trying to reproduce Adaptive-kNN #1

Closed Ruzzell13 closed 1 year ago

Ruzzell13 commented 2 years ago

Thanks for this repo gathering the kNN-series code. I have successfully reproduced vanilla kNN under its guidance, but an error while reproducing Adaptive-kNN in [stage 2. train meta-k network] has been bothering me for a long time. The error info shows:

train.py: error: argument --arch/-a: invalid choice: 'transformer_wmt19_de_en' (choose from 'transformer_tiny', 'transformer', 'transformer_iwslt_de_en', 'transformer_wmt_en_de', 'transformer_vaswani_wmt_en_de_big', 'transformer_vaswani_wmt_en_fr_big', 'transformer_wmt_en_de_big' ... )

It seems there is no arch named 'transformer_wmt19_de_en', which is passed as the `--arch` argument. I wonder whether it is an architecture you modified yourself or one from an older version of fairseq? (I noticed that you recommend fairseq 0.12.2, and I'm pretty sure I built it successfully.) From the list I can only find 'transformer_wmt_en_de', which is not compatible with the wmt19.de-en.ffn8192.pt checkpoint either, and the 'de_en' arches are only related to the IWSLT datasets... (T_T)

ZhaoQianfeng commented 2 years ago

@Ruzzell13 Hi, thanks for using our toolkit! The WMT19 winner model is not a fairseq pre-defined arch; to use it you have to register the arch yourself. Add the code below to the end of fairseq/models/transformer.py:

```python
# Appended to fairseq/models/transformer.py, where register_model_architecture
# and transformer_wmt_en_de_big are already in scope.
@register_model_architecture("transformer", "transformer_wmt19_de_en")
def transformer_wmt19_de_en(args):
    args.dropout = getattr(args, "dropout", 0.2)
    args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 8192)
    args.share_decoder_input_output_embed = getattr(
        args, "share_decoder_input_output_embed", True
    )
    args.share_all_embeddings = getattr(args, "share_all_embeddings", True)
    transformer_wmt_en_de_big(args)
```
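For context, fairseq's `register_model_architecture` decorator essentially stores the arch-config function in a registry keyed by the arch name, and each `getattr(args, name, default)` line keeps a user-supplied command-line value while falling back to the arch default. A minimal, self-contained sketch of that pattern (illustrative only, not fairseq itself):

```python
from types import SimpleNamespace

# Toy stand-in for fairseq's architecture registry (illustrative only).
ARCH_REGISTRY = {}

def register_model_architecture(model_name, arch_name):
    """Decorator: store the arch-config function under its arch name."""
    def wrapper(fn):
        ARCH_REGISTRY[arch_name] = fn
        return fn
    return wrapper

@register_model_architecture("transformer", "transformer_wmt19_de_en")
def transformer_wmt19_de_en(args):
    # getattr(args, name, default): keep a user-set value, else apply default.
    args.dropout = getattr(args, "dropout", 0.2)
    args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 8192)

args = SimpleNamespace(dropout=0.3)   # as if the user passed --dropout 0.3
ARCH_REGISTRY["transformer_wmt19_de_en"](args)
print(args.dropout)                   # user value wins over the default
print(args.encoder_ffn_embed_dim)     # default applied
```

This is why appending the registration to transformer.py is enough: once registered, the name becomes a valid `--arch` choice.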

Sorry for not mentioning it in the tutorials.

In addition, to make the toolkit easier to use, I deleted the old code (which had some bugs) and uploaded the entire fairseq project (based on v0.10.1). Now you can fork the new repo and directly run our scripts to reproduce the kNN models' results. I have roughly verified the results, and they are basically correct. Refer to the new README documentation for more information. If you encounter problems or find bugs, please tell me right away. Thank u ^_^

Ruzzell13 commented 2 years ago

THANKS! And I was pleasantly surprised by the new knn-box framework, which makes it really easy to switch models without modifying files every time.

However, when I tried to reproduce it, a new error appeared in 'knn-box/fairseq/scoring/tokenizer.py' about a missing import of TOKENIZER. I searched Google and switched between several versions of sacrebleu, but it didn't work, so I crudely replaced tokenizer.py with the previous version you released, and that works...

Luckily, the base model (vanilla NMT) result on each domain dataset is extremely close to yours, but after I finished building the datastore, the kNN-MT results are quite different on Koran and Law (Medical and IT are also extremely close). There are also some warnings that may explain the gap between my results and yours, I guess? My environment is pytorch-1.12.0-py3.7-cuda11.3-cudnn8.3.2, and everything else is installed per setup.py. Did you also get these warnings?

Thanks again!

ZhaoQianfeng commented 2 years ago

@Ruzzell13 Hi, the reason you can't change K from 16 to 8 or 4 at the inference stage is that you set K=16 when training meta-k. The meta-k network's architecture depends on K (which actually means max_k), so if you want to use K=8 at inference, you have to train a meta-k network with K=8.
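To see why the checkpoint is tied to K, note that the meta-k network's input features are derived from the top-K retrieved neighbors, so its first weight matrix has a K-dependent shape. A toy sketch (hypothetical dimensions, not the actual knn-box implementation):

```python
import numpy as np

def build_metak_weights(max_k, hidden=32, seed=0):
    """Toy meta-k first layer: its input is 2*max_k features
    (e.g. top-K distances plus top-K count features), so the
    weight shape is fixed by the K used at training time."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(2 * max_k, hidden))

w16 = build_metak_weights(max_k=16)   # checkpoint trained with K=16
features_k8 = np.zeros(2 * 8)         # inference features built with K=8
try:
    features_k8 @ w16                 # (16,) vs (32, hidden): incompatible
except ValueError as e:
    print("shape mismatch:", e)
```

So loading a K=16 checkpoint and feeding K=8 features fails exactly like changing K at inference would.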

The sacrebleu problem is a version issue; fairseq v0.10.1 doesn't support the new versions of sacrebleu... You can install an older sacrebleu by executing:

```shell
pip install sacremoses==0.0.41
pip install sacrebleu==1.5.1
```

I didn't encounter the first warning because I use pytorch 1.7.0, but I think it should have no impact. The second warning doesn't matter, because we have already detokenized the corpus in advance.

I also find that the knn-box result on the Law domain differs from the paper.

The paper's recommended parameters are k = 4, temperature = 100, lambda = 0.8, but with them knn-box gets a lower result than the paper.

However, if we change temperature to 10, knn-box's result is 61.31, equal to the paper's result. Really strange.
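The temperature matters because kNN-MT turns the retrieved neighbors' distances into a distribution via a softmax over -d/T: a small T concentrates probability on the nearest neighbors, while a large T flattens it. A minimal sketch with made-up distances:

```python
import numpy as np

def knn_probs(distances, temperature):
    """p(y_i) proportional to exp(-d_i / T): smaller T sharpens the
    distribution toward the nearest neighbor, larger T flattens it."""
    logits = -np.asarray(distances, dtype=float) / temperature
    logits -= logits.max()            # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

d = [10.0, 20.0, 30.0, 40.0]          # hypothetical top-4 neighbor distances
print(knn_probs(d, temperature=10))   # sharply peaked on the nearest neighbor
print(knn_probs(d, temperature=100))  # much flatter distribution
```

This is why swapping T=10 and T=100 between domains changes the BLEU scores so noticeably.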

I will check this problem this weekend; maybe it's a problem of datastore accuracy, or a problem caused by some library versions. I'll let you know when it is solved ^_^

Ruzzell13 commented 2 years ago

Yes, I found that K is also a parameter in [trainmetak.sh], so I deleted my last reply, hhhhh. I'll follow your instructions and wait for your code. ^^

ZhaoQianfeng commented 2 years ago

@Ruzzell13 Hi, I think I found out why the results on the Koran and Law domains are abnormal. The paper's author accidentally swapped the recommended temperatures for Koran and Law in the table... See Appendix A.2:

> A.2 Hyper-Parameter Tuning for kNN-MT. The performance of vanilla kNN-MT is highly related to the choice of hyper-parameter, i.e. k, T and λ. We fix T as 10 for IT, Medical, Law, and 100 for Koran in all experiments. Then, we tuned k and λ for each domain when using kNN-MT and the optimal choice for each domain are shown in Table 9. The performance of kNN-MT is unstable with different hyper-parameters while our Adaptive kNN-MT avoids this problem ......

Note that here he says T is fixed to 10 for Law and 100 for Koran, but the table below mistakenly recommends 100 for Law and 10 for Koran... I consulted the paper's author and confirmed that the values in the table are indeed swapped.

So, use temperature 10 for Law and 100 for Koran, and you will get a reasonable result. I post my results for reference:

|        | IT    | Med   | Koran | Law   |
|--------|-------|-------|-------|-------|
| paper  | 45.96 | 54.16 | 20.30 | 61.31 |
| knnbox | 45.93 | 54.09 | 20.59 | 60.91 |

My environment is torch==1.5.0, cuda==10.2, numpy==1.21.5, faiss-gpu==1.6.5. It seems results differ slightly across torch versions, so yours may not be exactly the same as mine.

ZhaoQianfeng commented 2 years ago

And thank you for your advice on the TOKENIZER problem; I have replaced the fairseq/scoring/tokenizer.py file to be compatible with the new version of sacrebleu.