Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/mpt/Megatron-LLM/Megatron-LLM/tools/preprocess_data.py", line 71, in encode
text = data[key]
KeyError: 'text'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mpt/Megatron-LLM/Megatron-LLM/tools/preprocess_data.py", line 201, in <module>
main()
Special tokens: {'': 32000, '': 32001, '': 32002, '': 32003, '': 32004, '': 1, '': 2}
padded vocab (size: 32005) with 123 dummy tokens (new size: 32128)
File "/mpt/Megatron-LLM/Megatron-LLM/tools/preprocess_data.py", line 179, in main
for i, (doc, bytes_processed) in enumerate(encoded_docs, start=1):
File "/usr/lib/python3.10/multiprocessing/pool.py", line 423, in <genexpr>
return (item for chunk in result for item in chunk)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
KeyError: 'text'
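
The failing line is `text = data[key]` in `encode`, so at least one input record apparently has no `text` field. A minimal sketch for locating such records, assuming the input is a JSON Lines file with one object per line (the traceback does not confirm this); `find_records_missing_key` is a hypothetical helper, not part of Megatron-LLM:

```python
import json
from typing import Iterable, List, Tuple


def find_records_missing_key(
    lines: Iterable[str], key: str = "text"
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that would fail a data[key] lookup.

    Hypothetical helper: assumes one JSON object per line (JSON Lines).
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
        elif key not in record:
            # Listing the keys that are present usually shows which field holds the text.
            problems.append((lineno, f"missing key {key!r}; found {sorted(record)}"))
    return problems
```

Calling it on an open file handle, for example `find_records_missing_key(open(path, encoding="utf-8"))`, returns the line numbers to inspect. If every record uses a different field name, it may be simpler to rename that field to `text` or to point the preprocessing script at the field that actually exists.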