Open keai007 opened 3 years ago
Hi @keai007, I am trying to preprocess another programming language as well. Did you figure out how to get the parallel data? Thanks!
Hi, you should take a look at https://github.com/facebookresearch/XLM#1-preparing-the-data-1 . TransCoder is based on XLM, and that repo contains much clearer tutorials and meaningful discussions. It helped me a lot, and I hope it can help you too.
@keai007 Thank you very much for sharing!
@keai007 (or any of the authors!) I looked at the get-data-para.sh script in the other repository. After preprocessing, we already have tokenized train/valid/test sets, with BPE applied and binarized. Do we just have to duplicate and rename those test/valid files?
Hi, I am trying to preprocess another programming language to train a new model, but I cannot figure out how to get the parallel data used when training with AE & BT (e.g. test.python_sa-cpp_sa.pth). I would appreciate it very much if you could help me.
HELP
HELP
HELP perlconverter@gmail.com
HELP from INDIA
Has anyone found a way to generate the parallel dataset used when training with AE & BT? Any help would be much appreciated.
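Not an answer from the authors, but in case it helps: in XLM-style pipelines, parallel data is a pair of line-aligned text files, where line i of one file is the translation of line i of the other. Below is a minimal Python sketch of that alignment step only, assuming you already have matching functions in both languages, each tokenized onto a single line. The function name `write_aligned_pairs`, the `.tok` suffix, and the file naming pattern are hypothetical; check the data loader for the exact names it expects before applying BPE and binarizing.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_aligned_pairs(
    pairs: Iterable[Tuple[str, str]],
    out_dir: Path,
    split: str = "test",
    lang1: str = "python_sa",
    lang2: str = "cpp_sa",
) -> Tuple[Path, Path]:
    """Write two line-aligned files: line i of each holds the same function.

    Assumes every string is already tokenized onto one line. The naming
    pattern below is a guess for illustration, not the confirmed format.
    """
    pairs = list(pairs)
    # Validate first so that no partially written files are left behind.
    for src, tgt in pairs:
        if "\n" in src or "\n" in tgt:
            raise ValueError("each example must fit on a single line")

    out_dir.mkdir(parents=True, exist_ok=True)
    pair_name = f"{lang1}-{lang2}"
    path1 = out_dir / f"{split}.{pair_name}.{lang1}.tok"  # hypothetical name
    path2 = out_dir / f"{split}.{pair_name}.{lang2}.tok"  # hypothetical name

    with path1.open("w", encoding="utf-8") as f1, \
            path2.open("w", encoding="utf-8") as f2:
        for src, tgt in pairs:
            f1.write(src + "\n")
            f2.write(tgt + "\n")
    return path1, path2
```

The two output files would then go through the same BPE and binarization steps as the monolingual data. Whether simply duplicating and renaming the monolingual valid/test files is acceptable is not settled in this thread.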