mansooreh1 closed this issue 6 months ago.
I'm running it on Google Colab.
Hi @mansooreh1,
Based on the error log, it seems that port 1234 was not available on your device. You'll need to find out which local ports are available on Google Colab.
Fred.
Hi, thanks for your reply. Does that mean your code can't be run on Google Colab?
I don't know how the communication ports on Google Colab are set up. But if you find an available port, I don't see why you couldn't run it on Google Colab.
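One common way to find an unused local port is to let the operating system choose one by binding a socket to port 0. The sketch below uses only Python's standard `socket` module; `find_free_port` is a hypothetical helper, not part of this repository. The chosen value would then be passed to the training script's port setting (the logged Namespace shows `port='1234'`, but the exact command-line flag name is an assumption).

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for an unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 makes the kernel assign a currently free port.
        s.bind((host, 0))
        # getsockname() returns (host, port); keep only the port number.
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The port is released when the socket closes, so another process could in principle claim it before training starts. On a single Colab VM this is unlikely, but not impossible.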
Hello, thanks for your nice code. When I run the following command for training:

! DETR=base python main.py --pretrained checkpoints/detr-r50-hicodet.pth \
    --output-dir outputs/pvic-detr-r50-hicodet

I get the following error:

/content/drive/MyDrive/pvic
Namespace(backbone='resnet50', dilation=False, position_embedding='sine', hidden_dim=256, enc_layers=6, dec_layers=6, dim_feedforward=2048, dropout=0.1, nheads=8, num_queries=100, pre_norm=False, lr_head=0.0001, lr_drop=20, lr_drop_factor=0.2, epochs=30, batch_size=16, weight_decay=0.0001, clip_max_norm=0.1, aux_loss=True, set_cost_class=1, set_cost_bbox=5, set_cost_giou=2, bbox_loss_coef=5, giou_loss_coef=2, eos_coef=0.1, device='cuda', dataset='hicodet', partitions=['train2015', 'test2015'], num_workers=2, data_root='./hicodet', output_dir='outputs/pvic-detr-r50-hicodet', pretrained='checkpoints/detr-r50-hicodet.pth', print_interval=100, detector='base', raw_lambda=2.8, kv_src='C5', repr_dim=384, triplet_enc_layers=1, triplet_dec_layers=2, alpha=0.5, gamma=0.1, box_score_thresh=0.05, min_instances=3, max_instances=15, resume='', use_wandb=False, port='1234', seed=140, world_size=8, eval=False, cache=False, sanity=False)
[W socket.cpp:697] [c10d] The client socket has failed to connect to [localhost]:1234 (errno: 99 - Cannot assign requested address).
[W socket.cpp:697] [c10d] The client socket has failed to connect to [localhost]:1234 (errno: 99 - Cannot assign requested address).
[W socket.cpp:697] [c10d] The client socket has failed to connect to [localhost]:1234 (errno: 99 - Cannot assign requested address).
[W socket.cpp:697] [c10d] The client socket has failed to connect to [localhost]:1234 (errno: 99 - Cannot assign requested address).
Traceback (most recent call last):
  File "/content/drive/MyDrive/pvic/main.py", line 193, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(args,))
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 241, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 158, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
    fn(i, *args)
  File "/content/drive/MyDrive/pvic/main.py", line 43, in main
    torch.cuda.set_device(rank)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 408, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
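Separately from the port warnings, the traceback ends in `RuntimeError: CUDA error: invalid device ordinal`, raised by `torch.cuda.set_device(rank)`, while the logged arguments include `world_size=8`. Because `mp.spawn` starts `nprocs=args.world_size` processes and each rank selects the GPU with its own index, any rank greater than or equal to the number of visible GPUs fails this way (process 2 is the one reported here). A typical Colab runtime exposes a single GPU, but that is an assumption about this session. The helpers below are hypothetical (not part of the repository) and only illustrate the arithmetic; on a real runtime `available_gpus` would come from `torch.cuda.device_count()`.

```python
def invalid_ranks(world_size: int, available_gpus: int) -> list[int]:
    """Hypothetical helper: ranks whose torch.cuda.set_device(rank) call would fail."""
    # Valid CUDA device indices are 0 .. available_gpus - 1.
    return [rank for rank in range(world_size) if rank >= available_gpus]


def fit_world_size(requested: int, available_gpus: int) -> int:
    """Hypothetical helper: cap the number of spawned processes at the visible GPU count."""
    if available_gpus < 1:
        raise RuntimeError("No CUDA devices are visible to this process")
    if requested < 1:
        raise ValueError("world_size must be at least 1")
    return min(requested, available_gpus)


if __name__ == "__main__":
    # On a real runtime: available = torch.cuda.device_count()
    available = 1  # assumption: single-GPU Colab session
    print(invalid_ranks(8, available))   # [1, 2, 3, 4, 5, 6, 7]
    print(fit_world_size(8, available))  # 1
```

Running fewer processes may also change the effective batch size if `batch_size` is applied per process, so it is worth checking how `main.py` uses it before comparing results with an 8-GPU run.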