Closed Haojin-Hu closed 3 years ago
The preprocess.py in this repo should be used for data processing. When you print self.sep_idx, it should be 4. The special tokens used in the dictionary differ from those in official fairseq.
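A minimal sketch of why the choice of preprocessing script matters, using a toy dictionary rather than this repo's or fairseq's actual classes. It assumes that stock fairseq reserves indices 0 to 3 for `<s>`, `<pad>`, `</s>`, and `<unk>`, so the first real word gets index 4, and that this repo's dictionary places a separator token at index 4 instead. The token name `<sep>` and the helpers `build_vocab` and `encode` are hypothetical, introduced only for illustration.

```python
# Toy stand-in for a fairseq-style dictionary; NOT the repo's actual code.
BASE_SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]  # indices 0..3 in stock fairseq


def build_vocab(words, extra_specials=()):
    """Index base specials first, then extra specials, then words in order of first appearance."""
    symbols = list(BASE_SPECIALS) + list(extra_specials)
    for word in words:
        if word not in symbols:
            symbols.append(word)
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(sentence, vocab):
    """Map whitespace-separated tokens to indices, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]


SEP_IDX = 4  # the value self.sep_idx is expected to print

corpus_words = "the cat sat on the mat".split()

# Dictionary as stock preprocessing would build it (no separator token).
stock_vocab = build_vocab(corpus_words)
# Dictionary with one extra special token; the name "<sep>" is an assumption.
repo_vocab = build_vocab(corpus_words, extra_specials=["<sep>"])

stock_ids = encode("the cat sat", stock_vocab)
repo_ids = encode("the cat sat", repo_vocab)

# With the stock dictionary, an ordinary word occupies index 4, so a check
# like `assert self.sep_idx not in self.src[index]` would fail.
stock_collides = SEP_IDX in stock_ids
# With the extra special token, index 4 is reserved and never appears in plain source text.
repo_collides = SEP_IDX in repo_ids

print(stock_ids, stock_collides)
print(repo_ids, repo_collides)
```

Under these assumptions, data binarized with official fairseq can contain index 4 as a regular word, which would explain an AssertionError on `self.sep_idx not in self.src[index]`.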
Thank you! A star for you.
Can you give us more guidance on the training process? I found that your preprocessing is completely different from fairseq's: you wrote your own preprocess.py, and files preprocessed with fairseq cannot be used with your code. I hope you can update the README.
I have revised the README according to your suggestion. Do you have any suggestions about the guidance on the training process?
Thank you. If I run into a problem while debugging, I will contact you.
File "leca-master/fairseq/data/language_pair_dataset.py", line 145, in __getitem__
    assert self.sep_idx not in self.src[index]
AssertionError
Can you help me with this?