Closed Smu-Tan closed 1 year ago
I have no idea. Did you find out which rank was stuck?
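One way to find out which rank is stuck is to have every process dump its Python stack if it is still running after a timeout. The following is a minimal sketch using only the standard-library `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are made up for illustration, and reading the rank from the `RANK` environment variable assumes the job was launched with torchrun.

```python
import faulthandler
import os
import sys


def arm_hang_watchdog(timeout_s: float, file=sys.stderr) -> None:
    """Dump the stack of every thread if the process is still alive after timeout_s.

    `file` must be a real file object (it needs a file descriptor).
    """
    # torchrun exports RANK for each worker; fall back to "0" when it is absent.
    rank = os.environ.get("RANK", "0")
    print(f"[rank {rank}] watchdog armed ({timeout_s}s)", file=file, flush=True)
    # exit=False: only print the traceback, do not kill the process.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)


def disarm_hang_watchdog() -> None:
    """Cancel the pending dump once the suspicious call has returned."""
    faulthandler.cancel_dump_traceback_later()
```

Arming the watchdog right before the call that hangs and disarming it right after means that only the ranks that are really blocked print a traceback, and the traceback shows the line they are blocked on.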
I used two cards and both of them were stuck. Below are some warnings.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:30123 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:30123 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/synchronize.py:15: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
nccl.allReduce(barrier.storage(), barrier.storage(), 'sum', config['comm'])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/synchronize.py:15: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
nccl.allReduce(barrier.storage(), barrier.storage(), 'sum', config['comm'])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:109: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.is_cuda and dst.is_cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:111: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendbuff = src.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:112: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
recvbuff = dst.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
count = src.size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:117: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.size() == dst.size(), "Buffer size not aligned"
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:109: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.is_cuda and dst.is_cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:111: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendbuff = src.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:112: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
recvbuff = dst.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:113: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
count = src.size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:117: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.size() == dst.size(), "Buffer size not aligned"
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:46: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
cuda_storage = cuda_tensor.storage_type()(cuda_storage_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:46: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
cuda_storage = cuda_tensor.storage_type()(cuda_storage_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/torch/storage.py:959: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
if self.device.type not in ['cpu', 'cuda']:
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/torch/storage.py:962: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
module = torch if self.device.type == 'cpu' else torch.cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/torch/storage.py:959: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
if self.device.type not in ['cpu', 'cuda']:
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/torch/storage.py:962: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
module = torch if self.device.type == 'cpu' else torch.cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:333: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_type = storage_type_cuda(param.storage_type())
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:333: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_type = storage_type_cuda(param.storage_type())
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:364: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_param_buffer = storage_type(partition_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:367: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
device = storage_param_buffer.device
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:364: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_param_buffer = storage_type(partition_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:367: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
device = storage_param_buffer.device
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:95: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
partition_size = value.storage().size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = value.storage_type()(global_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:95: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
partition_size = value.storage().size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:101: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
value.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:218: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.is_cuda and dst.is_cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = value.storage_type()(global_size)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:220: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendbuff = src.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:221: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
recvbuff = dst.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:222: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendcount = src.size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:225: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert dst.size() % sendcount == 0, "Buffer size not aligned"
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/parameter.py:101: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
value.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:218: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert src.is_cuda and dst.is_cuda
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:220: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendbuff = src.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:221: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
recvbuff = dst.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:222: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
sendcount = src.size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:152: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
recvbuff = dst.data_ptr()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:153: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
count = src.size()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/nccl/__init__.py:156: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
assert dst.size() == src.size(), "Buffer size not aligned"
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:88: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
byte_tensor.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:89: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
byte_tensor.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:104: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
byte_tensor.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:105: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
byte_tensor.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:133: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
tmp_shape.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:134: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
tmp_shape.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:133: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
tmp_shape.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:134: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
tmp_shape.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:160: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
output_param.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:161: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
output_param.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:153: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
input_param.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/store.py:154: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
output_param.storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:508: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
torch.tensor([], dtype=d_dtype, device=d_device).set_(contiguous_param.storage(), offset_st, (offset_end - offset_st,))[:]
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:507: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
torch.tensor([], dtype=d_dtype, device=d_device).set_(self._storage_params[kw_name].storage(), to_offset_st, (to_offset_end - to_offset_st,))[:] = \
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:508: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
torch.tensor([], dtype=d_dtype, device=d_device).set_(contiguous_param.storage(), offset_st, (offset_end - offset_st,))[:]
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:507: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
torch.tensor([], dtype=d_dtype, device=d_device).set_(self._storage_params[kw_name].storage(), to_offset_st, (to_offset_end - to_offset_st,))[:] = \
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:151: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_type = local_param.storage_type()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:153: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self._param_buffer[kw] = storage_type(val["partition_size"] * config["world_size"])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:151: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage_type = local_param.storage_type()
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:153: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self._param_buffer[kw] = storage_type(val["partition_size"] * config["world_size"])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:154: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self._param_tensor[kw] = torch.tensor([], dtype=self._param_buffer[kw].dtype, device=self._param_buffer[kw].device).set_(self._param_buffer[kw])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:154: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self._param_tensor[kw] = torch.tensor([], dtype=self._param_buffer[kw].dtype, device=self._param_buffer[kw].device).set_(self._param_buffer[kw])
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:163: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self.block._storage_params[kw].storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:163: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
self.block._storage_params[kw].storage(),
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:187: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
device = self._param_buffer[kw_name].device
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:187: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
device = self._param_buffer[kw_name].device
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:257: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
param["parameter"].data = torch.tensor([], dtype=dtype, device=device).set_(self.block._storage_params[kw_name].storage(), begin, end)
/home/stan1/anaconda3/envs/pruning/lib/python3.9/site-packages/bmtrain/block_layer.py:257: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
param["parameter"].data = torch.tensor([], dtype=dtype, device=device).set_(self.block._storage_params[kw_name].storage(), begin, end)
[W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:30123 (errno: 98 - Address already in use).
[W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:30123 (errno: 98 - Address already in use).
[E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
It looks like torchrun didn't start successfully.
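The `errno: 98 - Address already in use` lines say that something else already holds port 30123. A quick way to confirm this, and to pick a different port, is sketched below with the standard-library `socket` module. The function names are made up for illustration.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE (errno 98 on Linux), as in the log above.
            return False
    return True


def pick_free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print("30123 free:", port_is_free(30123))
    print("suggested port:", pick_free_port())
```

The suggested port can then be passed to torchrun with `--master_port`. Another process could still grab the port between the check and the launch, so this narrows the problem down rather than guaranteeing a clean start.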
Could be. However, I can run PyTorch DDP code based on dist.init_process_group() successfully,
even though it prints similar errors (see below). Here's a toy example.
/var/spool/slurm/job564246/slurm_script: line 15: conda: command not found
[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115.ivi_ilps.local]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115.ivi_ilps.local]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115-d.ivi_ilps.data]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115-d.ivi_ilps.data]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [::ffff:192.168.70.215]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [::ffff:192.168.70.215]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115.ivi_ilps.local]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115-d.ivi_ilps.data]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [::ffff:192.168.70.215]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115.ivi_ilps.local]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [ilps-cn115-d.ivi_ilps.data]:3456 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [::ffff:192.168.70.215]:3456 (errno: 97 - Address family not supported by protocol).
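The `errno: 97 - Address family not supported by protocol` lines above are warnings (`[W ...]`), and they all refer to IPv6-style addresses. That would be consistent with IPv6 being unavailable on the node, which is a different problem from the `errno: 98` port conflict. A minimal sketch for checking this on the compute node, using only the standard library (the function name is made up for illustration):

```python
import socket


def ipv6_socket_available() -> bool:
    """Return True if this host can create an IPv6 TCP socket."""
    if not socket.has_ipv6:
        # Python itself was built without IPv6 support.
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM):
            return True
    except OSError:
        # For example EAFNOSUPPORT (errno 97 on Linux) when IPv6 is disabled.
        return False


if __name__ == "__main__":
    print("IPv6 sockets available:", ipv6_socket_available())
```

If this prints `False` on the node, the errno 97 warnings are expected there and are probably unrelated to the hang.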
Hi,
When using BMCOOK with BMTrain, I ran into a bug: the second bmtrain.synchronize() call always hangs. Do you have any idea why?
Below is the code: