facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

ApplyBPE get empty file #321

Open chenQ1114 opened 3 years ago

chenQ1114 commented 3 years ago

I'm working through the steps in your README and am stuck on the fastBPE part of the "Preparing the data" section. When I run

$FASTBPE learnbpe 30000 data/wiki/txt/en.train > $OUTPATH/codes

$FASTBPE applybpe $OUTPATH/train.en data/wiki/txt/en.train $OUTPATH/codes &
$FASTBPE applybpe $OUTPATH/valid.en data/wiki/txt/en.valid $OUTPATH/codes &
$FASTBPE applybpe $OUTPATH/test.en data/wiki/txt/en.test $OUTPATH/codes &

cat $OUTPATH/train.en | $FASTBPE getvocab - > $OUTPATH/vocab &

Afterwards, $OUTPATH/ contains only the codes file (494 KB) and an empty vocab file (0 KB). There is no train.en file.

Here is the output:

Loading vocabulary from data/wiki/txt/en.train ...
Read 2633823583 words (9022612 unique) from text file.
Loading codes from data/processed/XLM_en/30k/codes ...
Loading codes from data/processed/XLM_en/30k/codes ...
Loading codes from data/processed/XLM_en/30k/codes ...
cat: data/processed/XLM_en/30k/train.en: No such file or directory
Read 0 words (0 unique) from text file.

Any idea what is causing this?

colmantse commented 3 years ago

Have you tried reducing the data size to see whether it works?
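
A rough sketch of what such a test could look like: take the first lines of en.train and run the three fastBPE steps on that sample, one after another in the foreground. The paths and the fastBPE subcommands are the ones from the issue; SAMPLE_LINES, make_sample and the *.small file names are made up for illustration. Chaining the steps with && instead of ending them in & is my own assumption, not something confirmed in this thread: the log above shows cat failing on a missing train.en, which could happen if that step starts before the background applybpe job has written its output.

```shell
#!/usr/bin/env bash
# Sketch only: retry the BPE steps on a small sample, one step at a time.

# Paths as in the issue; SAMPLE_LINES and the *.small names are hypothetical.
SRC=${SRC:-data/wiki/txt/en.train}
OUTPATH=${OUTPATH:-data/processed/XLM_en/30k}
SAMPLE_LINES=${SAMPLE_LINES:-100000}
FASTBPE=${FASTBPE:-}

# make_sample INPUT N OUTPUT: copy the first N lines of INPUT to OUTPUT.
make_sample() {
    head -n "$2" "$1" > "$3"
}

# Only run when the corpus and the fastBPE binary are actually present.
if [ -f "$SRC" ] && [ -x "$FASTBPE" ]; then
    mkdir -p "$OUTPATH" &&
    make_sample "$SRC" "$SAMPLE_LINES" "$OUTPATH/en.train.small" &&
    # No trailing '&': each step finishes before the next one starts.
    "$FASTBPE" learnbpe 30000 "$OUTPATH/en.train.small" > "$OUTPATH/codes.small" &&
    "$FASTBPE" applybpe "$OUTPATH/train.en.small" "$OUTPATH/en.train.small" "$OUTPATH/codes.small" &&
    cat "$OUTPATH/train.en.small" | "$FASTBPE" getvocab - > "$OUTPATH/vocab.small"
fi
```

If the sample run produces a non-empty train.en.small and vocab.small, the commands themselves are fine and the problem is more likely related to the size of the full corpus or to the order in which the steps ran.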