torchrun --standalone --nproc_per_node=4 pretrain.py OR python -m torch.distributed.launch --nproc_per_node=1 pretrain.py
[2024-03-07 17:54:29,710] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-03-07 17:54:29,722] torch.distributed.run: [WARNING]
[2024-03-07 17:54:29,722] torch.distributed.run: [WARNING]
[2024-03-07 17:54:29,722] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-03-07 17:54:29,722] torch.distributed.run: [WARNING]
tokens per iteration will be: 32,768
breaks down as: 1 grad accum steps * 4 processes * 16 batch size * 512 max seq len
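The tokens-per-iteration figure is just the product of those four factors; a quick check of the arithmetic (values copied from the log above):

```python
# Values reported in the launch log above.
grad_accum_steps = 1   # gradient accumulation steps
num_processes = 4      # ranks launched with --nproc_per_node=4
batch_size = 16        # per-process batch size
max_seq_len = 512      # maximum sequence length

tokens_per_iter = grad_accum_steps * num_processes * batch_size * max_seq_len
print(f"{tokens_per_iter:,}")  # 32,768
```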
memmap:True train data.shape:(6936803, 512)
downloading finished.....
Initializing a new model from scratch
Traceback (most recent call last):
File "pretrain.py", line 239, in
torch.cuda.set_device(device)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\cuda__init.py", line 408, in set_device
Traceback (most recent call last):
File "pretrain.py", line 239, in
Traceback (most recent call last):
torch._C._cuda_setDevice(device)
File "pretrain.py", line 239, in
torch.cuda.set_device(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\cuda\init__.py", line 408, in set_device
torch.cuda.set_device(device)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\cuda__init__.py", line 408, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
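The `invalid device ordinal` errors above are what you get when the launcher spawns more ranks than there are visible GPUs: with `--nproc_per_node=4`, torchrun sets `LOCAL_RANK` to 0-3 for the workers, and any rank whose local rank is not a valid CUDA device index fails inside `torch.cuda.set_device`. A minimal sketch of the check (the helper name and the `LOCAL_RANK` wiring are illustrative, not taken from pretrain.py):

```python
def valid_local_rank(local_rank: int, n_visible_gpus: int) -> bool:
    """True iff torch.cuda.set_device(local_rank) can succeed."""
    return 0 <= local_rank < n_visible_gpus

# In the real script the inputs would come from:
#   local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun per worker
#   n_visible_gpus = torch.cuda.device_count()
# With --nproc_per_node=4 on a single-GPU machine, ranks 1-3 fail:
print([valid_local_rank(r, 1) for r in range(4)])  # [True, False, False, False]
```

Matching `--nproc_per_node` to the machine's actual `torch.cuda.device_count()` (here, presumably 1) should make the ordinal errors go away.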
num decayed parameter tensors: 57, with 58,470,912 parameters
num non-decayed parameter tensors: 17, with 8,704 parameters
using fused AdamW: True
[2024-03-07 17:54:34,906] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2124 closing signal CTRL_C_EVENT
[2024-03-07 17:55:04,909] torch.distributed.elastic.agent.server.api: [WARNING] Received Signals.SIGINT death signal, shutting down workers
[2024-03-07 17:55:04,909] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2124 closing signal SIGINT
[2024-03-07 17:55:04,909] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2124 closing signal SIGTERM
Traceback (most recent call last):
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 727, in run
result = self._invoke_run(role)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 869, in _invoke_run
run_result = self._monitor_workers(self._worker_group)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 123, in wrapper
result = f(*args, **kwargs)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\agent\server\local_elastic_agent.py", line 329, in _monit
or_workers
result = self._pcontext.wait(0)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 277, in wait
return self._poll()
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 661, in _poll
self.close() # terminate all running procs
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 318, in close
self._close(death_sig=death_sig, timeout=timeout)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 706, in _close
handler.proc.wait(time_to_wait)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1079, in wait
return self._wait(timeout=timeout)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1357, in _wait
result = _winapi.WaitForSingleObject(self._handle,
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 62, in _terminate_process_h
andler
raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1860 got signal: 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\Scripts\torchrun.exe__main.py", line 7, in
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\errors__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\run.py", line 812, in main
run(args)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\run.py", line 803, in run
elastic_launch(
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\launcher\api.py", line 135, in call__
return launch_agent(self._config, self._entrypoint, list(args))
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\launcher\api.py", line 259, in launch_agent
result = agent.run()
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 123, in wrapper
result = f(*args, **kwargs)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 734, in run
self._shutdown(e.sigval)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\agent\server\local_elastic_agent.py", line 311, in _shutd
own
self._pcontext.close(death_sig)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 318, in close
self._close(death_sig=death_sig, timeout=timeout)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 699, in _close
handler.close(death_sig=death_sig)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\distributed\elastic\multiprocessing\api.py", line 582, in close
self.proc.send_signal(death_sig)
File "C:\Users\123\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1434, in send_signal
raise ValueError("Unsupported signal: {}".format(sig))
ValueError: Unsupported signal: 2
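The trailing `ValueError: Unsupported signal: 2` is a Windows-specific side effect of the shutdown, not the root cause: on Windows, `subprocess.Popen.send_signal` accepts only `SIGTERM`, `CTRL_C_EVENT`, and `CTRL_BREAK_EVENT`, and the elastic agent here tries to forward SIGINT to the workers. A quick illustration of where the 2 comes from:

```python
import signal

# SIGINT is what the elastic agent receives when you press Ctrl+C;
# its numeric value, 2, is the number in the ValueError above.
print(int(signal.SIGINT))  # 2
```

So the failure worth fixing is the CUDA device-ordinal error; the signal traceback is just the launcher tearing itself down on Windows afterward.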