Visual-Behavior / detr-tensorflow

TensorFlow implementation of DETR: Object Detection with Transformers
MIT License

We expect only one image here for now ... #22

Open myopengit opened 3 years ago

myopengit commented 3 years ago

Does the dataset only support a batch size of 1? Is there any plan to fix this?

def retrieve_outputs(augmented_images, augmented_bbox):

    outputs_dict = {}
    image_shape = None

    # We expect only one image here for now
    image = augmented_images[0].astype(np.float32)
    augmented_bbox = augmented_bbox[0]

    bbox, t_class = imgaug_bbox_to_xcyc_wh(augmented_bbox, image.shape[0], image.shape[1])

    bbox = np.array(bbox)
    t_class = np.array(t_class)

    return image, bbox, t_class

thibo73800 commented 3 years ago

That comment is somewhat misleading. Internally, we work with sequential image data of shape (batch, sequence_size, h, w, 3). The comment only means that the sequence length is expected to be 1; it does not make sense here because this repository is not meant to handle sequences.

So the dataset does support batch sizes > 1.
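
To illustrate the distinction between the sequence dimension and the batch dimension, here is a minimal NumPy sketch. It is not the repository's actual pipeline: the helpers `take_single_frame` and `make_batch` are hypothetical names, and it assumes every image has the same shape. Indexing `[0]` only selects the single frame of a length-1 sequence; batching happens afterwards, across samples.

```python
import numpy as np

def take_single_frame(augmented_images):
    """Return the only frame of a length-1 sequence as float32 (hypothetical helper)."""
    if len(augmented_images) != 1:
        raise ValueError("expected a sequence of exactly one image")
    return np.asarray(augmented_images[0], dtype=np.float32)

def make_batch(samples):
    """Stack per-sample frames of identical shape into (batch, h, w, 3)."""
    return np.stack([take_single_frame(s) for s in samples], axis=0)

# Three samples, each a sequence holding one 4x6 RGB image.
samples = [[np.zeros((4, 6, 3), dtype=np.uint8)] for _ in range(3)]
batch = make_batch(samples)
print(batch.shape)  # (3, 4, 6, 3)
```

So taking element 0 inside the per-sample function does not restrict the batch size; it only drops the unused sequence axis.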

myopengit commented 3 years ago

Thank you for the explanation. The code is very well written. What accuracy can you reach when training with train_coco.py?

I have tested your pre-trained model, and it reaches over 42 mAP. But according to the statement, the default model was not trained using the provided code.