preprocess data for "Dual2seq" model

freesunshine0316 / semantic-nmt

Code corresponding to our paper "Semantic Neural Machine Translation using AMR"


Open Lavine24 opened 5 years ago

Lavine24 commented 5 years ago

Hi, thanks for the awesome work. I have some questions about preprocessing the data.

I am using your released nc-v11 data, but I cannot find the "nc-v11.tok_le50.json" and "vectors.en.st" files in the release.
I tried using the scripts in dual_to_seq/data to generate the JSON file, together with a pre-trained embedding file. Is this the correct preprocessing procedure for the "Dual2seq" model? Thank you very much!

freesunshine0316 commented 5 years ago

Yes, they are not included; you have to generate them by running the provided scripts.
Yes, that is the correct preprocessing procedure.

lidongxing commented 4 years ago

@freesunshine0316 Where do the pretrained embeddings for the 'de' language come from, or are they trained from scratch on some corpus? Thanks.

freesunshine0316 commented 4 years ago

@lidongxing The embeddings of 'de' are jointly trained from scratch during NMT training.
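
For readers unfamiliar with the term, here is a minimal sketch of what "jointly trained from scratch" usually means: the embedding table starts from random values instead of being read from a pretrained file, and its rows receive gradient updates like any other model weight. This is not code from the repository; the names `init_embeddings` and `sgd_update`, the toy vocabulary, and the gradient values are all hypothetical and only illustrate the idea.

```python
import random

def init_embeddings(vocab, dim, seed=0):
    """Randomly initialise one vector per word (no pretrained file is read)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}

def sgd_update(table, word, grad, lr=0.1):
    """Apply one plain SGD step to the row for `word`."""
    table[word] = [v - lr * g for v, g in zip(table[word], grad)]

# Hypothetical toy vocabulary and gradient, for illustration only.
de_vocab = ["<unk>", "haus", "katze"]
de_emb = init_embeddings(de_vocab, dim=4)

before = list(de_emb["haus"])
sgd_update(de_emb, "haus", grad=[1.0, 0.0, -1.0, 0.0])
after = de_emb["haus"]
```

In a real NMT system the gradient would come from backpropagating the translation loss, so the 'de' vectors are learned together with the rest of the model.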