SysCV / sam-hq

Segment Anything in High Quality [NeurIPS 2023]
https://arxiv.org/abs/2306.01567
Apache License 2.0

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 13 for tensor number 1 in the list. #123

Closed: stevezkw1998 closed this issue 7 months ago

stevezkw1998 commented 7 months ago
PyTorch version: 2.0.1
CUDA is available: True
Loading model..
<All keys matched successfully>
Loading model.. Done
Start Inferencing..
type(input_boxes):
<class 'numpy.ndarray'>
input_boxes:
[[ 308  298  243  601]
 [  13  327  216  632]
 [ 557  144  187 1552]
 [ 174  286  142  459]
 [ 670  405   73  829]
 [ 785  194   88  160]
 [   0  327   52  437]
 [ 754  218   43  131]
 [ 347  261   40   63]
 [  33  251   71   96]
 [ 105  259   85  119]
 [   0  194   48  141]
 [ 181  202   77   92]]
Traceback (most recent call last):
  File "/root/inference.py", line 106, in <module>
    output_results = {
  File "/root/inference.py", line 107, in <dictcomp>
    path: model.inference(path, predictions[path])
  File "/root/inference.py", line 38, in inference
    masks, scores, logits = self.predictor.predict(
  File "/root/sam-hq/segment_anything/predictor.py", line 157, in predict
    masks, iou_predictions, low_res_masks = self.predict_torch(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/sam-hq/segment_anything/predictor.py", line 227, in predict_torch
    sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/sam-hq/segment_anything/modeling/prompt_encoder.py", line 159, in forward
    sparse_embeddings = torch.cat([sparse_embeddings, box_embeddings], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 13 for tensor number 1 in the list.
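Context for the error: SamPredictor.predict expects a single length-4 box per call; as in upstream SAM, it wraps the box array in a leading batch dimension (box_torch[None, :]), so a (13, 4) array yields point embeddings with batch size 1 but box embeddings with batch size 13, and the torch.cat in the prompt encoder fails. The shape mismatch can be reproduced with a minimal sketch (the shapes below are illustrative; 256 is SAM's default prompt embedding width):

import torch

sparse_embeddings = torch.empty(1, 0, 256)  # point embeddings: batch size inferred as 1
box_embeddings = torch.empty(13, 2, 256)    # 13 boxes -> 13 corner-pair embeddings
torch.cat([sparse_embeddings, box_embeddings], dim=1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 1 but got size 13 for tensor number 1 in the list.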

My code is shown here:

import cv2
import numpy as np

image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
h, w, _ = image.shape  # cv2/numpy images are (height, width, channels)
bboxes = prediction["bboxes"]  # normalized x1,y1,x2,y2
bboxes = np.asarray([b[:4] for b in bboxes], dtype=np.float32)
bboxes[:, (0, 2)] *= w  # scale x coordinates to pixels
bboxes[:, (1, 3)] *= h  # scale y coordinates to pixels
bboxes[:, 0:2] = np.ceil(bboxes[:, 0:2])
bboxes[:, 2:4] = np.floor(bboxes[:, 2:4])
bboxes = bboxes.astype(int)
input_boxes = bboxes
print("type(input_boxes):")
print(type(input_boxes))
print("input_boxes:")
print(input_boxes)
# input_boxes = np.array([[4,13,1007,1023]])
input_point, input_label = None, None
self.predictor.set_image(image)
masks, scores, logits = self.predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_boxes,
    multimask_output=False,
    hq_token_only=False,
)
stevezkw1998 commented 7 months ago

I should use this code instead: https://github.com/SysCV/sam-hq/blob/main/demo/demo_hqsam.py#L129
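For anyone hitting the same error: predict() handles only one box per call, while the linked demo feeds a batch of boxes through predict_torch(). A rough sketch of that pattern, reusing self.predictor, bboxes, and image from the snippet above (see demo_hqsam.py for the exact call):

import torch

# Batched boxes go through predict_torch, not predict.
input_boxes = torch.tensor(bboxes, dtype=torch.float, device=self.predictor.device)
transformed_boxes = self.predictor.transform.apply_boxes_torch(
    input_boxes, image.shape[:2]
)
masks, scores, logits = self.predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=False,
    hq_token_only=False,
)
# Note: predict_torch returns torch tensors, with masks of shape
# (num_boxes, 1, H, W), rather than the numpy arrays predict() returns.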