Closed igotthisnow closed 1 year ago
Hey @igotthisnow
It seems to be due to multiprocessing which can be tricky in some environments, especially when working on notebooks. Here are 3 suggestions you can try
Adjusting Multiprocessing: Adjusting the multiprocessing start method to 'fork' may already solve it. Add this at the beginning
import multiprocessing
multiprocessing.set_start_method('fork')
Use .py
script instead of .ipynb
notebook:
You can copy the notebook code into a .py
script and try to run it there, it may work. But if you really want to run it in a notebook you should go for 3.rd approach
Simple For-Loop:
If not, Instead of using ProcessPoolExecutor
, you can replace the parallel processing logic with a straightforward for-loop.
Replace
with ProcessPoolExecutor(max_workers=max_workers) as executor:
executor.map(tokenize_chunk_fn, tokenize_chunk_paths)
With
for chunk_path in tokenize_chunk_paths:
tokenize_chunk_fn(chunk_path)
It's more direct and can eliminate the multiprocessing complications, though it might not be as fast. Hoping this helps
Suggestions 1 and 2 together solved the problem. Thanks!
@cindysridykhan could you please help with this error?
`Traceback (most recent call last): File "", line 1, in
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
prepare(preparation_data)
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 291, in run_path
File "", line 98, in _run_module_code
File "", line 88, in _run_code
File "/Users/igotthisnow/Documents/code/instruct_storyteller_tinyllama2/notebooks/prepare_instruct_dataset.py", line 147, in
tokenize_all_chunks(
File "/Users/igotthisnow/Documents/code/instruct_storyteller_tinyllama2/notebooks/prepare_instruct_dataset.py", line 134, in tokenize_all_chunks
executor.map(tokenize_chunk_fn, tokenize_chunk_paths)
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 820, in map
results = super().map(partial(_process_chunk, fn),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 608, in map
fs = [self.submit(fn, args) for args in zip(iterables)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 608, in
fs = [self.submit(fn, args) for args in zip(iterables)]
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 791, in submit
self._adjust_process_count()
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 750, in _adjust_process_count
self._spawn_process()
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 768, in _spawn_process
p.start()
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
`