Closed nicexw closed 3 years ago
Hi -- Danny (@daniel-perry) will be looking into this and get back to you on this thread.
Thanks,
-Adrian
@nicexw - Thanks for trying Bort and sorry you have run into this issue. While we work on addressing the issue, here is a temporary workaround: try running with --num_workers 1 on each file in turn.
For example, if you are running in bash, you can do something like:
i=0
mkdir output
for file in train/*txt*
do
  python create_pretraining_data.py --input_file ${file} --output_dir tmp --dupe_factor 1 --num_workers 1 --num_outputs 1
  mv tmp/part-000.npz output/part-${i}.npz
  i=$((i+1))
done
rm -r tmp
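Before running the loop, it may help to find which input file triggers the problem. A quick sanity check (a sketch; it assumes the BERT-style input convention in which blank lines separate documents within a file) could be:

```python
import glob

# Count documents per input file, assuming the BERT-style convention that
# blank lines separate documents. A file with fewer than two documents
# gives NSP nothing to draw a negative example from.
def count_documents(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return sum(1 for d in text.split("\n\n") if d.strip())

for path in sorted(glob.glob("train/*txt*")):
    n = count_documents(path)
    flag = "" if n >= 2 else "  <-- only one document"
    print(f"{path}: {n} documents{flag}")
```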
Thank you very much! The error above is caused by the input containing only one document: the NSP task needs negative examples drawn from a different document. When I put blank lines between different documents, there is no error.
This issue can be closed.
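The root cause can be reproduced in a few lines. `create_instances_from_document` picks a random other document for the NSP negative example via `randint(0, len(all_documents) - 2)`; with a single document this becomes `randint(0, -1)`, which is the empty range in the traceback. A minimal illustration (the blank-line document splitting is an assumption based on the BERT-style input format):

```python
import random

def split_documents(text):
    # Split raw text into documents on blank lines (assumed input format).
    return [d.strip().split("\n") for d in text.split("\n\n") if d.strip()]

one_doc = "sentence a\nsentence b\nsentence c"
two_docs = "sentence a\nsentence b\n\nsentence c\nsentence d"

# One document: randint(0, -1) raises ValueError (empty range).
try:
    random.randint(0, len(split_documents(one_doc)) - 2)
except ValueError as e:
    print("failed as in the issue:", e)

# Two documents: randint(0, 0) returns a valid index.
print(random.randint(0, len(split_documents(two_docs)) - 2))
```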
Super! Glad to know you got it solved.
I tried to use create_pretraining_data.py for Bort pretraining:
python create_pretraining_data.py --input_file ./train/train.txt0,./train/train.txt1,./train/train.txt2,./train/train.txt3,./train/train.txt4,./train/train.txt5,./train/train.txt6,./train/train.txt7,./train/train.txt8,./train/train.txt9 --output_dir output --dupe_factor 1
INFO:root:Namespace(dataset_name='openwebtext_ccnews_stories_books_cased', dupe_factor=1, input_file='./train/train.txt0,./train/train.txt1,./train/train.txt2,./train/train.txt3,./train/train.txt4,./train/train.txt5,./train/train.txt6,./train/train.txt7,./train/train.txt8,./train/train.txt9', masked_lm_prob=0.15, max_predictions_per_seq=80, max_seq_length=512, num_outputs=1, num_workers=8, output_dir='output', random_seed=12345, short_seq_prob=0.1, verbose=False, whole_word_mask=False)
INFO:root: ./train/train.txt0
INFO:root: ./train/train.txt1
INFO:root: ./train/train.txt2
INFO:root: ./train/train.txt3
INFO:root: ./train/train.txt4
INFO:root: ./train/train.txt5
INFO:root: ./train/train.txt6
INFO:root: ./train/train.txt7
INFO:root: ./train/train.txt8
INFO:root: ./train/train.txt9
INFO:root: Reading from 10 input files
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "create_pretraining_data.py", line 304, in create_training_instances
    vocab, tokenizer)))
  File "create_pretraining_data.py", line 385, in create_instances_from_document
    0, len(all_documents) - 2)
  File "/export/sdb/xiongwei/tfmxnet/lib64/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/export/sdb/xiongwei/tfmxnet/lib64/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,0, 0)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "create_pretraining_data.py", line 691, in <module>
    main()
  File "create_pretraining_data.py", line 597, in main
    pool.map(create_training_instances, process_args)
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: empty range for randrange() (0,0, 0)