facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Issue on DETR colab example with a picture from coco 2017 val5k #546


JeanPhilippeMonteuuis commented 1 year ago

If you do not know the root cause of the problem, and wish someone to help you, please post according to this template:

I replaced the example picture with another picture from COCO 2017 val5k, and I was unable to complete two of the DETR colabs advertised in this repo.

Instructions To Reproduce the Issue:

1. what changes you made (`git diff`) or what code you wrote:

```python
# url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
url = 'http://images.cocodataset.org/val2017/000000007888.jpg'
im = Image.open(requests.get(url, stream=True).raw)
scores, boxes = detect(im, detr, transform)
```
2. what exact command you ran:

I tested this picture on two colabs advertised in this github repo:
- [Standalone DETR](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_demo.ipynb)
- [Hands-on DETR](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_attention.ipynb)

3. what you observed (including __full logs__):

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 # mean-std normalize the input image (batch-size: 1)
----> 2 img = transform(im).unsqueeze(0)
      3
      4 # propagate through the model
      5 outputs = model(img)

4 frames
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py in __call__(self, img)
     92     def __call__(self, img):
     93         for t in self.transforms:
---> 94             img = t(img)
     95         return img
     96

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py in forward(self, tensor)
    267             Tensor: Normalized Tensor image.
    268         """
--> 269         return F.normalize(tensor, self.mean, self.std, self.inplace)
    270
    271     def __repr__(self) -> str:

/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
    358         raise TypeError(f"img should be Tensor Image. Got {type(tensor)}")
    359
--> 360     return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
    361
    362

/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py in normalize(tensor, mean, std, inplace)
    957     if std.ndim == 1:
    958         std = std.view(-1, 1, 1)
--> 959     tensor.sub_(mean).div_(std)
    960     return tensor
    961

RuntimeError: output with shape [1, 802, 800] doesn't match the broadcast shape [3, 802, 800]
```


4. please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.

The same two colabs linked above reproduce the problem; no private data is needed.
- I used this [picture](http://images.cocodataset.org/val2017/000000007888.jpg) from COCO 2017 val5k

## Expected behavior:

I was expecting to see the bounding boxes and their respective labels drawn on the picture.

## Environment:

Google Colab (Python 3.7, with the preinstalled torch/torchvision, as shown in the traceback paths above), running the two notebooks linked above.
Rouhi-Amirreza commented 1 year ago

Hi. To solve this problem, you have to convert 1-channel (grayscale) images to 3-channel images before the transform. You can simply wrap the call in a try/except like this:

```python
import cv2
import numpy as np
from PIL import Image

try:
    img_t = transform(image).unsqueeze(0)
except RuntimeError:  # grayscale input: 1 channel instead of 3
    img = cv2.imread(file)                        # `file` is the image path
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # collapse to one channel
    img2 = np.zeros_like(img)
    img2[:, :, 0] = gray                          # replicate it into B, G, R
    img2[:, :, 1] = gray
    img2[:, :, 2] = gray
    cv2.imwrite('xx.jpg', img2)                   # write a 3-channel copy
    image = Image.open('xx.jpg')
```

Now you can use "image" as input to your model.
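
An equivalent fix without the OpenCV round-trip through a temporary file is to let PIL do the conversion directly; a sketch, where `ensure_rgb` is a hypothetical helper name:

```python
from PIL import Image

def ensure_rgb(im: Image.Image) -> Image.Image:
    """Return `im` as a 3-channel RGB image; a grayscale channel is replicated."""
    return im if im.mode == "RGB" else im.convert("RGB")

# Example: a 1-channel image becomes RGB before the DETR transform.
gray = Image.new("L", (32, 32))
print(ensure_rgb(gray).mode)  # RGB
```

Calling this on every downloaded image up front avoids the try/except entirely.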