WongKinYiu / yolor

Implementation of the paper "You Only Learn One Representation: Unified Network for Multiple Tasks" (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

can't convert cuda:0 device type tensor to numpy when using distributed gpu with large batch size #112

Open tesorrells opened 2 years ago

tesorrells commented 2 years ago
Traceback (most recent call last):
  File "train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 336, in train
    results, maps, times = test.test(opt.data,
  File "/home/lambda/SOUTHCOM/yolor-main/test.py", line 226, in test
    plot_images(img, output_to_target(output, width, height), paths, f, names)  # predictions
  File "/home/lambda/SOUTHCOM/yolor-main/utils/plots.py", line 108, in output_to_target
    return np.array(targets)
  File "/home/lambda/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 630, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Traceback (most recent call last):
  File "/home/lambda/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lambda/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lambda/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/lambda/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/lambda/anaconda3/bin/python', '-u', 'train.py', '--local_rank=1', '--batch-size', '16', '--img', '512', '512', '--data', 'coco.yaml', '--cfg', 'cfg/yolor_p6.cfg', '--weights', '', '--device', '0,1', '--sync-bn', '--name', 'yolor_p6', '--hyp', 'hyp.scratch.1280.yaml', '--epochs', '30']' returned non-zero exit status 1.

Not sure if anyone else has run into this, but when using the distributed GPU option with a batch size over 30, I get the error above.
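For reference, the root cause is that np.array() in output_to_target is handed tensors that still live on the GPU; NumPy can only read host memory, so each tensor needs a .cpu() copy first. A minimal sketch of the failure and the fix (assuming a CUDA device is available):

import numpy as np
import torch

t = torch.zeros(6, device="cuda")   # prediction row living in GPU memory
# np.array([t])                     # raises the TypeError from the traceback above
arr = np.array([t.cpu().numpy()])   # copy to host memory first, then convert
print(arr.shape)                    # (1, 6)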

cristian-corches commented 2 years ago

You could try adding this:

if isinstance(o, torch.Tensor):
    o = o.cpu().numpy() 

to output_to_target in yolor/utils/plots.py.

So the final function will look like this:

def output_to_target(output, width, height):
    # Convert model output to target format [batch_id, class_id, x, y, w, h, conf]
    if isinstance(output, torch.Tensor):
        output = output.cpu().numpy()  # move the whole batch to host memory if it is one tensor

    targets = []
    for i, o in enumerate(output):  # o holds the detections for image i
        if o is not None:
            if isinstance(o, torch.Tensor):
                o = o.cpu().numpy()  # per-image tensors may still live on the GPU
            for pred in o:  # pred: [x1, y1, x2, y2, conf, cls] in pixels
                box = pred[:4]
                w = (box[2] - box[0]) / width   # normalized width
                h = (box[3] - box[1]) / height  # normalized height
                x = box[0] / width + w / 2      # normalized box center x
                y = box[1] / height + h / 2     # normalized box center y
                conf = pred[4]
                cls = int(pred[5])

                targets.append([i, cls, x, y, w, h, conf])

    return np.array(targets)
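A quick sanity check of the patched function, using hypothetical mock detections (each row is [x1, y1, x2, y2, conf, cls] in pixels), works whether the per-image tensors sit on the CPU or the GPU:

import torch

# one image with two mock detections
output = [torch.tensor([[10., 20., 50., 80., 0.9, 0.],
                        [ 5., 15., 25., 45., 0.7, 1.]])]
targets = output_to_target(output, width=512, height=512)
print(targets.shape)  # (2, 7): [batch_id, class_id, x, y, w, h, conf]

# after the patch, the same call also succeeds with GPU tensors:
# output = [o.cuda() for o in output]
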
xiaxialin commented 2 years ago

I found that this fix is already in the source code, but mine still reports this error.

aliencaocao commented 2 years ago

Look closely: there are two places that require the change, and only one of them has been corrected. It is working for me on the paper branch.

EmmanuelOBUjunior commented 2 years ago

@aliencaocao which other place requires this change?

aliencaocao commented 2 years ago

I found it via the stack trace; I cannot remember exactly where now, but it should be obvious enough from the stack trace.
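For anyone tracing it down, a small guard like this hypothetical helper (not part of the repo) can be dropped in at whichever call site the stack trace points to, before the tensor reaches NumPy:

import torch

def to_numpy(x):
    # Copy a (possibly CUDA) tensor back to host memory so NumPy can read it;
    # anything that is not a tensor passes through unchanged.
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return x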