ghchen18 / leca

Code for Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

A problem with train.py #6

Closed Haojin-Hu closed 3 years ago

Haojin-Hu commented 3 years ago

File "leca-master/fairseq/data/language_pair_dataset.py", line 145, in __getitem__
    assert self.sep_idx not in self.src[index]
AssertionError

Can you help me with this?

ghchen18 commented 3 years ago

Use the preprocess.py in this repo for data processing. If you print self.sep_idx, it should be 4. The special tokens in this repo's dictionary differ from those in official fairseq.
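As a rough illustration of why the assertion can fire (a toy sketch, not code from this repo or from fairseq): stock fairseq reserves dictionary indices 0 to 3 for `<s>`, `<pad>`, `</s>` and `<unk>`, so the first ordinary word gets index 4. If a dictionary instead reserves index 4 for a separator, data encoded with the stock layout will contain that index as a regular word. The token name `<sep>` and the helpers `build_vocab` and `encode` below are hypothetical and exist only for this example.

```python
# Toy illustration only; not the leca or fairseq implementation.
STOCK_SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]  # indices 0-3 in stock fairseq
SEP_TOKEN = "<sep>"  # hypothetical name for the extra separator symbol


def build_vocab(specials, words):
    """Assign consecutive indices: special symbols first, then regular words."""
    return {tok: i for i, tok in enumerate(list(specials) + list(words))}


def encode(vocab, sentence):
    """Map whitespace-separated tokens to indices, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]


words = ["the", "cat", "sat"]
stock_vocab = build_vocab(STOCK_SPECIALS, words)
leca_like_vocab = build_vocab(STOCK_SPECIALS + [SEP_TOKEN], words)

sep_idx = leca_like_vocab[SEP_TOKEN]  # 4, matching the value quoted above

sentence = "the cat sat"
stock_ids = encode(stock_vocab, sentence)     # "the" lands on index 4
leca_ids = encode(leca_like_vocab, sentence)  # "the" lands on index 5

print(sep_idx in stock_ids)  # True: a check like the failing assert would trip
print(sep_idx in leca_ids)   # False: index 4 is reserved, so the check passes
```

Under these assumptions, re-running preprocessing with the repo's own script is what keeps index 4 free for the separator.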

Haojin-Hu commented 3 years ago

Thank you! A star for you.

Haojin-Hu commented 3 years ago

Can you provide more guidance on the training process? I found that your preprocessing is completely different from fairseq's: you wrote your own preprocess.py, and files preprocessed with fairseq cannot be used with your code. I hope you can update the README.

ghchen18 commented 3 years ago

I have revised the README according to your suggestion. Do you have any suggestions about the guidance on the training process?

Haojin-Hu commented 3 years ago

Thank you. If I run into a problem while debugging, I will contact you.