What's in train.txt and valid.txt during preprocessing?

I noticed that step 2 of Preprocessing (Fairseq Processing) uses this command:

python preprocess.py --only-source --trainpref /path/to/train.txt --validpref /path/to/valid.txt --srcdict /path/to/dict.txt --destdir /path/to/destination_dir --padding-factor 1 --workers 48
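Because the command passes --srcdict, any token in train.txt or valid.txt that is missing from dict.txt is encoded as <unk> by fairseq. The sketch below gives a quick way to check how well a tokenized file matches the dictionary before running preprocess.py. It assumes the text file has one example per line with space-separated BPE tokens, and that dict.txt uses fairseq's "<symbol> <count>" per-line format with no spaces inside symbols. The helper names load_fairseq_dict and oov_stats are hypothetical and are not part of SpanBERT or fairseq.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def load_fairseq_dict(lines: Iterable[str]) -> Set[str]:
    """Collect the symbols of a fairseq-style dict.txt ("<symbol> <count>" per line)."""
    vocab: Set[str] = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: symbols contain no spaces, so the first field is the symbol.
        vocab.add(fields[0])
    return vocab


def oov_stats(corpus_lines: Iterable[str], vocab: Set[str]) -> Tuple[int, int, Counter]:
    """Return (total tokens, out-of-vocabulary tokens, counts per missing token)."""
    total = 0
    missing: Counter = Counter()
    for line in corpus_lines:
        for tok in line.split():  # tokens are assumed to be space-separated
            total += 1
            if tok not in vocab:
                missing[tok] += 1
    return total, sum(missing.values()), missing
```

A high out-of-vocabulary count usually means the text was not tokenized with the same vocabulary that dict.txt was built from.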

Excuse me, but I don't understand what --trainpref and --validpref should point to. What is the relation between the output of bpe_tokenize.py and the input of preprocess.py? Do I need to split my BPE tokenization output into two parts (train and valid)?
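If a single tokenized file has to be divided into train and valid sets, a minimal sketch could look like the following. It assumes bpe_tokenize.py writes one tokenized example per line into one file, and that lines can be shuffled independently; if your data separates documents with blank lines, you would need to split on whole documents instead. The function names, the 1% validation fraction and the fixed seed are hypothetical choices, not SpanBERT's method.

```python
import random
from typing import List, Sequence, Tuple


def split_train_valid(
    lines: Sequence[str], valid_fraction: float = 0.01, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle lines with a fixed seed and hold out a fraction for validation."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def split_file(
    src_path: str,
    train_path: str,
    valid_path: str,
    valid_fraction: float = 0.01,
    seed: int = 0,
) -> Tuple[int, int]:
    """Split a one-example-per-line file into train and valid files."""
    with open(src_path, encoding="utf-8") as f:
        # Drop blank lines and trailing newlines; every remaining line is one example.
        lines = [line.rstrip("\n") for line in f if line.strip()]
    train, valid = split_train_valid(lines, valid_fraction, seed)
    for path, subset in ((train_path, train), (valid_path, valid)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in subset)
    return len(train), len(valid)
```

For example, split_file("bpe_output.txt", "/path/to/train.txt", "/path/to/valid.txt") would write the two files named in the command above, where bpe_output.txt stands in for whatever file bpe_tokenize.py produced.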

facebookresearch / SpanBERT, issue #70