...
Dataset has 156725280 samples
Determining packing recipe
Begin packing pass
Unpacked mean sequence length: 254.43
Found 22102 unique packing strategies.
Iteration: 0: sequences still to pack: 156725280
Traceback (most recent call last):
File "/sox/habana-intel/Model-References/MLPERF3.1/Training/benchmarks/bert/implementations/PyTorch/pack_pretraining_data_pytorch.py", line 467, in <module>
main()
File "/sox/habana-intel/Model-References/MLPERF3.1/Training/benchmarks/bert/implementations/PyTorch/pack_pretraining_data_pytorch.py", line 420, in main
strategy_set, mixture, padding, slicing = get_packing_recipe(args.output_dir, sequence_lengths, args.max_sequence_length, args.max_sequences_per_pack)
File "/sox/habana-intel/Model-References/MLPERF3.1/Training/benchmarks/bert/implementations/PyTorch/pack_pretraining_data_pytorch.py", line 111, in get_packing_recipe
partial_mixture, rnorm = optimize.nnls(np.expand_dims(w0, -1) * A, w0 * b)
File "/opt/python-llm/lib/python3.10/site-packages/scipy/optimize/_nnls.py", line 93, in nnls
raise RuntimeError("Maximum number of iterations reached.")
RuntimeError: Maximum number of iterations reached.
To reproduce, follow the instructions at https://github.com/HabanaAI/Model-References/tree/master/MLPERF3.1/Training/benchmarks
and run this command:
python3 pack_pretraining_data_pytorch.py --input_dir=$PYTORCH_BERT_DATA/hdf5/training-4320/hdf5_4320_shards_uncompressed --output_dir=$PYTORCH_BERT_DATA/packed --max_predictions_per_seq=76
scipy version: 1.13.0
Training Data Packing
![image](https://github.com/HabanaAI/Model-References/assets/10793075/3c87b555-2be4-4f88-b511-a01188342d0e)
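
A possible workaround, not verified against this dataset: the traceback shows `optimize.nnls` stopping at its iteration cap, and `nnls` accepts a `maxiter` argument. The sketch below retries with a larger cap each time that error is raised. `nnls_with_retry`, `max_attempts`, and `growth` are hypothetical names introduced here, and the small matrix is a stand-in, not the real packing problem.

```python
import numpy as np
from scipy import optimize


def nnls_with_retry(A, b, max_attempts=4, growth=10):
    """Call scipy.optimize.nnls, raising maxiter after each iteration-limit failure.

    Hypothetical helper for illustration; not part of the packing script.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    maxiter = 3 * A.shape[1]  # SciPy's documented default cap (3 * n)
    for attempt in range(max_attempts):
        try:
            return optimize.nnls(A, b, maxiter=maxiter)
        except RuntimeError as err:
            # Retry only the iteration-limit error; re-raise anything else,
            # and give up after the last attempt.
            if "iterations" not in str(err).lower() or attempt == max_attempts - 1:
                raise
            maxiter *= growth


# Tiny stand-in problem (assumption: NOT the real strategy matrix).
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([2.0, 3.0, 1.0])
x, rnorm = nnls_with_retry(A, b)
print(x, rnorm)
```

In `get_packing_recipe`, the call on line 111 would then become `nnls_with_retry(np.expand_dims(w0, -1) * A, w0 * b)`. Whether a larger cap is enough for 22102 strategies is an open question; this only addresses the iteration limit, not any underlying convergence problem.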