facebookresearch / detr

End-to-End Object Detection with Transformers

Is padding being done to convert all image tensors to the same size before passing them through the model? #525

Closed D10752002 closed 1 year ago

D10752002 commented 1 year ago

❓ How to do something using DETR

I'm trying to train DETR on my custom dataset. When I print the dimensions of the nested_tensor before it is passed through the backbone, it looks like the images are not being padded to one fixed size: the tensors come out with varying heights and widths (probably because of transforms like random_resize). Is the code intentionally written this way?
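For reference, this is my rough understanding of the collate step (a minimal sketch, not the exact code in util/misc.py; the function name and details are simplified): each batch seems to be padded only up to the largest height/width within that batch, together with a mask marking the padded pixels.

```python
import torch

def pad_to_batch_max(images):
    # Sketch (assumed, simplified from what nested_tensor_from_tensor_list in
    # util/misc.py appears to do): pad every image in the batch up to the
    # largest height/width found in *this* batch, and build a boolean mask
    # that is True on the padded pixels.
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)

    batch = torch.zeros(len(images), images[0].shape[0], max_h, max_w)
    mask = torch.ones(len(images), max_h, max_w, dtype=torch.bool)
    for i, img in enumerate(images):
        c, h, w = img.shape
        batch[i, :c, :h, :w] = img
        mask[i, :h, :w] = False  # real (non-padded) pixels
    return batch, mask

# Two differently sized images (placeholder sizes) end up in one padded tensor:
imgs = [torch.rand(3, 800, 1066), torch.rand(3, 750, 1333)]
padded, mask = pad_to_batch_max(imgs)
print(padded.size())  # torch.Size([2, 3, 800, 1333])
print(mask.size())    # torch.Size([2, 800, 1333])
```

If something like this is what happens, the padded size would depend only on the largest image in each batch after random_resize, which would explain why the sizes I see change from batch to batch.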

I printed tensor_list.tensors.size() to get the dimensions before the nested tensor is passed to the ResNet backbone, and src.size() to get the dimensions after the ResNet, just before the tensor is passed to the transformer.

The batch size is 2, the number of input channels is 3, and the number of output channels after the ResNet-50 backbone is 2048, as expected.
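Roughly what I would expect the two prints to show, assuming the standard ResNet-50 overall stride of 32 (the 800x1333 size below is just a placeholder):

```python
import math
import torch

# Placeholder padded batch, matching what tensor_list.tensors.size() might print:
tensors = torch.rand(2, 3, 800, 1333)
print(tensors.size())                   # torch.Size([2, 3, 800, 1333])

# After the ResNet-50 backbone (stride 32), src.size() should be roughly:
h, w = math.ceil(800 / 32), math.ceil(1333 / 32)
print((2, 2048, h, w))                  # (2, 2048, 25, 42)
```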

The output is as follows: [screenshot of the printed tensor sizes]

Are these seemingly random tensor dimensions intentional?