[2024-11-20 00:29:31,165][INFO] - workflow: [('train', 1)], max: 24 epochs
[2024-11-20 00:29:31,165][INFO] - Checkpoints will be saved to /dssg/home/acct-meyl/meyl-1/XAZ/projects/Occupancy/SparseOcc/outputs/SparseOcc/test_11_20 by HardDiskBackend.
[2024-11-20 00:29:43,765][INFO] - Epoch [1/24][1/3517] loss: 438.24, eta: 12 days, 7:02:32, time: 12.58s, data: 7520ms, mem: 20194M
Traceback (most recent call last):
  File "train.py", line 181, in <module>
    main()
  File "train.py", line 177, in main
    runner.run([train_loader], [('train', 1)])
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1356, in _next_data
    return self._process_data(data)
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/dssg/home/acct-meyl/meyl-1/XAZ/mmlabs/mmdet3d_1.0.0rc6/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 435, in __getitem__
    data = self.prepare_train_data(idx)
  File "/dssg/home/acct-meyl/meyl-1/XAZ/mmlabs/mmdet3d_1.0.0rc6/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 229, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/dssg/home/acct-meyl/meyl-1/XAZ/mmlabs/mmdet3d_1.0.0rc6/mmdetection3d/mmdet3d/datasets/pipelines/compose.py", line 49, in __call__
    data = t(data)
  File "/dssg/home/acct-meyl/meyl-1/XAZ/projects/Occupancy/SparseOcc/loaders/pipelines/transforms.py", line 229, in __call__
    img = Image.fromarray(np.uint8(results['img'][i]))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
[2024-11-20 00:29:46,034][INFO] - Epoch [1/24][2/3517] loss: 237.76, eta: 2 days, 5:14:18, time: 2.27s, data: 16ms, mem: 21039M
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2968278 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2968279 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2968281 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 2968280) of binary: /dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/bin/python
Traceback (most recent call last):
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.1', 'console_scripts', 'torchrun')())
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/dssg/home/acct-meyl/meyl-1/.conda/envs/mm3d/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
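
If I read the worker traceback correctly, the direct cause is that results['img'][i] is None when transforms.py line 229 calls np.uint8 on it, so Image.fromarray never even runs. Below is a minimal sketch of the same failure mode, plus a guard I'm considering adding at that line; the 'sample_idx' and 'img_filename' keys are my assumptions about what the results dict carries, so they may need adjusting to the actual pipeline:

```python
import numpy as np
from PIL import Image

# np.uint8(None) reproduces the exact TypeError from the worker log:
#   TypeError: int() argument must be a string, a bytes-like object or
#   a number, not 'NoneType'
# so the real question is why results['img'][i] is None at this point.

def to_pil_images(results):
    """Hypothetical guard for the loop around transforms.py:229.
    'sample_idx' / 'img_filename' are guesses at the results keys."""
    imgs = []
    for i, arr in enumerate(results['img']):
        if arr is None:
            # Fail loudly with sample info instead of the opaque np.uint8 error.
            raise ValueError(
                f"results['img'][{i}] is None for sample "
                f"{results.get('sample_idx', '?')} "
                f"(files: {results.get('img_filename', '?')})"
            )
        imgs.append(Image.fromarray(np.uint8(arr)))
    return imgs
```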
Could anyone tell me what is causing this error, i.e. why the image would end up as None at that point?
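
For reference, here is a rough sanity check I'm planning to run outside of training to find which samples break. The config path is a placeholder, and I'm assuming the project's custom transforms are already importable/registered before build_dataset is called:

```python
from mmcv import Config
from mmdet3d.datasets import build_dataset

# Placeholder config path; SparseOcc's custom modules must be registered
# (e.g. via the project's plugin imports) before building the dataset.
cfg = Config.fromfile('configs/sparseocc.py')
dataset = build_dataset(cfg.data.train)

bad = []
for idx in range(len(dataset)):
    try:
        dataset[idx]  # runs the full training pipeline for this sample
    except TypeError as err:
        bad.append((idx, str(err)))

print(f'{len(bad)} / {len(dataset)} samples failed')
print(bad[:10])
```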