Maelic / SGG-Benchmark

A New Benchmark for Scene Graph Generation, targeting real-world applications
MIT License

Error during training validation: AttributeError: 'NoneType' object has no attribute 'item' #36

Open Young-Loser opened 15 hours ago

Young-Loser commented 15 hours ago

Dear author, thank you very much for your excellent work on this project. When I train my own SGDet model, I encounter two errors during the validation phase. No.1 is as follows:

Traceback (most recent call last):
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 994, in <module>
    main()
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 973, in main
    model, best_checkpoint = train(
                             ^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 704, in train
    run_val(cfg, model, val_data_loaders, args['distributed'], logger, device=device)
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 843, in run_val
    if len(dataset_result) == 1:
       ^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'float' has no len()

Traceback (most recent call last):
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 994, in <module>
    main()
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 973, in main
    model, best_checkpoint = train(
                             ^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 704, in train
    run_val(cfg, model, val_data_loaders, args['distributed'], logger, device=device)
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 848, in run_val
    dataset_result[k1][k2] = torch.distributed.all_reduce(torch.tensor(np.mean(v2)).to(device).unsqueeze(0)).item() / torch.distributed.get_world_size()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'item'

No.2 is as follows:

Traceback (most recent call last):
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1514, in <module>
    main()
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1493, in main
    model, best_checkpoint = train(
                             ^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1253, in train
    val_result = run_val(cfg, model, val_data_loaders, args['distributed'], logger)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1363, in run_val
    if len(dataset_result) == 1:
       ^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'float' has no len()

Traceback (most recent call last):
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1514, in <module>
    main()
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1493, in main
    model, best_checkpoint = train(
                             ^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1253, in train
    val_result = run_val(cfg, model, val_data_loaders, args['distributed'], logger)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Dpan/wyc/realtime_rwsg/SGG-Benchmark/tools/relation_train_net.py", line 1368, in run_val
    dataset_result[k1][k2] = torch.distributed.all_reduce(torch.tensor(np.mean(v2)).to(device).unsqueeze(0)).item() / torch.distributed.get_world_size()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Dpan/wyc/anaconda3/envs/rtrw_sg/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Dpan/wyc/anaconda3/envs/rtrw_sg/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1992, in all_reduce
    work = group.allreduce([tensor], opts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: No backend type associated with device type cpu

Could you tell me how to solve them? Thank you very much!

Maelic commented 9 hours ago

Yes, there may be some issues when training with multiple GPUs. I will investigate that another day. In the meantime, try running the training on a single GPU; it should work.
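
For anyone hitting the same tracebacks, two details of `torch.distributed.all_reduce` seem relevant. The call reduces its tensor in place and returns `None` when `async_op` is left at its default of `False`, so chaining `.item()` onto the call raises the AttributeError above. The RuntimeError usually means the tensor is on the CPU while the process group only has a GPU backend such as NCCL. Below is a minimal sketch of how the averaging step could be written instead. The helper name `reduce_mean_scalar` and its `device` parameter are hypothetical and not part of SGG-Benchmark, and this is not the maintainer's fix. It also does not address the `TypeError` from `len(dataset_result)`.

```python
import torch
import torch.distributed as dist


def reduce_mean_scalar(values, device="cpu"):
    """Average `values` locally, then across processes when distributed.

    Hypothetical helper for illustration; not part of SGG-Benchmark.
    With an NCCL process group, `device` must be a CUDA device.
    """
    # Local mean as a 1-element tensor on the requested device.
    t = torch.tensor(values, dtype=torch.float32, device=device).mean().unsqueeze(0)

    if dist.is_available() and dist.is_initialized():
        # all_reduce sums in place and returns None (async_op=False),
        # so keep a reference to the tensor instead of chaining .item().
        dist.all_reduce(t)
        t /= dist.get_world_size()

    return t.item()
```

In a single-process run the distributed branch is skipped and the function simply returns the local mean, which matches the single-GPU workaround suggested above.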