Set the torch.multiprocessing start method to 'spawn' (a minimal sketch is shown after the traceback below). Otherwise, the following error will be raised:
```
Megatron-LM/megatron/core/extensions/transformer_engine.py", line 957, in get_cpu_offload_context
    context, sync_func = _get_cpu_offload_context(
  File "/opt/conda/lib/python3.8/site-packages/transformer_engine/pytorch/cpu_offload.py", line 502, in get_cpu_offload_context
    cpu_offload_handler = AsyncDoubleBufferGroupOffloadHandler(
  File "/opt/conda/lib/python3.8/site-packages/transformer_engine/pytorch/cpu_offload.py", line 312, in __init__
    self.d2h_stream = torch.cuda.Stream()
  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/streams.py", line 35, in __new__
    return super().__new__(cls, priority=priority, **kwargs)
RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
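This error typically appears when worker processes are created with the default 'fork' start method and inherit the parent's CUDA context, which cannot be re-initialized in the child; 'spawn' starts each worker with a fresh interpreter so CUDA can initialize cleanly. A minimal sketch of setting the start method, assuming your launcher script has a `main()` entry point (a placeholder here, not part of Megatron-LM):

```python
import torch.multiprocessing as mp


def main():
    # Placeholder: launch Megatron-LM / training code from here.
    pass


if __name__ == "__main__":
    # Must be called once in the entry script, before any CUDA work or
    # process creation, so children start fresh instead of inheriting a
    # forked CUDA context.
    mp.set_start_method("spawn", force=True)
    main()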