IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

`device` argument doesn't seem to affect anything & illegal memory access #221

Closed gjamesgoenawan closed 1 year ago

gjamesgoenawan commented 1 year ago

Hi! Thank you for your amazing work. I am doing zero-shot inference on a custom dataset; however, I came across a problem.

I've noticed that you implemented a `device` argument in the `groundingdino.util.inference.load_model` function. Below is the code snippet:

def load_model(model_config_path: str, model_checkpoint_path: str, device: str = "cuda"):
    args = SLConfig.fromfile(model_config_path)
    args.device = device
    model = build_model(args)
    checkpoint = torch.load(model_checkpoint_path, map_location=device)
    model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
    model.eval()
    return model 

I understand that the device is passed to the build_model function via args. However, I cannot find any code that actually moves the model onto that device.
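
For illustration, here is a minimal workaround I would expect to work: wrap the existing loader and move the weights explicitly. The wrapper name is mine, not part of the repository:

import torch
from groundingdino.util.inference import load_model

def load_model_on_device(model_config_path: str, model_checkpoint_path: str, device: str = "cuda"):
    # Same as load_model, plus the explicit .to(device) that currently seems to be missing.
    model = load_model(model_config_path, model_checkpoint_path, device=device)
    return model.to(torch.device(device))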

I did an experiment where I created two instances of the model, one with .to('cuda:0'). I ran inference on one image with each and recorded the time taken. Here's the pseudocode I used:

import time
import torch
from groundingdino.util.inference import load_model

device = torch.device("cuda:0")
model = load_model("/path/to/config", "/path/to/checkpoint", device=device)
model_cuda = load_model("/path/to/config", "/path/to/checkpoint", device=device).to(device)

# images and caption are prepared beforehand

# without .to(device)
for i in range(0, 10):  # warmup
    _ = model(images, captions=caption)
t0 = time.time()
_ = model(images, captions=caption)
t1 = time.time()

# with .to(device)
for i in range(0, 10):  # warmup
    _ = model_cuda(images.to(device), captions=caption)
t2 = time.time()
_ = model_cuda(images.to(device), captions=caption)
t3 = time.time()

The time taken for model to finish was 3.7 s, while model_cuda took only 0.4 s.
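
(As an aside, CUDA kernels run asynchronously, so a stricter way to time the GPU model is to synchronize before reading the clock. A rough sketch of that, not what I used above:)

import time
import torch

def timed_forward(model, images, caption, device):
    # Wait for any pending GPU work before starting the clock.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.time()
    with torch.no_grad():
        outputs = model(images, captions=caption)
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # make sure the forward pass has actually finished
    return outputs, time.time() - start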

Furthermore, the following check also confirmed that none of the model's parameters had been transferred to CUDA (it prints every parameter that is still on the CPU):

print([i for i in model.parameters() if i.device.type == "cpu"])
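
A more compact check is to print the set of devices the parameters live on. With the plain load_model this should show only the CPU, while after an explicit .to(device) it should show only cuda:0:

print({p.device for p in model.parameters()})       # expected: {device(type='cpu')}
print({p.device for p in model_cuda.parameters()})  # expected: {device(type='cuda', index=0)}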

Additionally, I've tried moving the model to my second GPU by setting device = torch.device("cuda:1"). However, I encountered another error, this time stating that illegal memory was accessed. Here's the traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 outputs = model_cuda(images.to("cuda:1"), captions=captions)

File ~/miniconda3/envs/gdino/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~GroundingDINO/groundingdino/models/GroundingDINO/groundingdino.py:330, in GroundingDINO.forward(self, samples, targets, **kw)
    327         self.poss.append(pos_l)
    329 input_query_bbox = input_query_label = attn_mask = dn_meta = None
--> 330 hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
    331     srcs, masks, input_query_bbox, self.poss, input_query_label, attn_mask, text_dict
    332 )
    334 # deformable-detr-like anchor update
    335 outputs_coord_list = []

File ~/miniconda3/envs/gdino/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~GroundingDINO/groundingdino/models/GroundingDINO/transformer.py:363, in Transformer.forward(self, srcs, masks, refpoint_embed, pos_embeds, tgt, attn_mask, text_dict)
    353     raise NotImplementedError("unknown two_stage_type {}".format(self.two_stage_type))
    354 #########################################################
    355 # End preparing tgt
    356 # - tgt: bs, NQ, d_model
   (...)
    361 # Begin Decoder
    362 #########################################################
--> 363 hs, references = self.decoder(
    364     tgt=tgt.transpose(0, 1),
    365     memory=memory.transpose(0, 1),
    366     memory_key_padding_mask=mask_flatten,
    367     pos=lvl_pos_embed_flatten.transpose(0, 1),
    368     refpoints_unsigmoid=refpoint_embed.transpose(0, 1),
    369     level_start_index=level_start_index,
    370     spatial_shapes=spatial_shapes,
    371     valid_ratios=valid_ratios,
    372     tgt_mask=attn_mask,
    373     memory_text=text_dict["encoded_text"],
    374     text_attention_mask=~text_dict["text_token_mask"],
    375     # we ~ the mask . False means use the token; True means pad the token
    376 )
    377 #########################################################
    378 # End Decoder
    379 # hs: n_dec, bs, nq, d_model
   (...)
    384 # Begin postprocess
    385 #########################################################
    386 if self.two_stage_type == "standard":

File ~/miniconda3/envs/gdino/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~GroundingDINO/groundingdino/models/GroundingDINO/transformer.py:703, in TransformerDecoder.forward(self, tgt, memory, tgt_mask, memory_mask, tgt_key_padding_mask, memory_key_padding_mask, pos, refpoints_unsigmoid, level_start_index, spatial_shapes, valid_ratios, memory_text, text_attention_mask)
    682 # if os.environ.get("SHILONG_AMP_INFNAN_DEBUG") == '1':
    683 #     if query_pos.isnan().any() | query_pos.isinf().any():
    684 #         import ipdb; ipdb.set_trace()
    685 
    686 # main process
    687 output = layer(
    688     tgt=output,
    689     tgt_query_pos=query_pos,
   (...)
    701     cross_attn_mask=memory_mask,
    702 )
--> 703 if output.isnan().any() | output.isinf().any():
    704     print(f"output layer_id {layer_id} is nan")
    705     try:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

So, am I missing something, or is the model running on the CPU despite me passing the device argument? Furthermore, have you encountered this on a multi-GPU setup?

SlongLiu commented 1 year ago

Thanks for the question.

I would use the environment variable CUDA_VISIBLE_DEVICES=1 if I want to specify a GPU, rather than "cuda:id".
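
For example (illustrative only, paths are placeholders): launch the process with the second physical GPU as the only visible one, and then just use "cuda" inside the script:

# Launch with: CUDA_VISIBLE_DEVICES=1 python your_script.py
# Inside the process, physical GPU 1 is then exposed as the only device ("cuda:0").
import torch
from groundingdino.util.inference import load_model

device = torch.device("cuda")  # resolves to the single visible GPU
model = load_model("/path/to/config", "/path/to/checkpoint", device=device).to(device)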

tingxueronghua commented 8 months ago

It seems there are more problems related to GPU assignment.

gjamesgoenawan commented 8 months ago

@SlongLiu Thank you for your response! @tingxueronghua I ended up using torch.cuda.set_device(rank) to assign a model instance to the appropriate GPU.
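
For anyone hitting the same illegal-memory-access error, a rough sketch of what that looks like in my setup (one model per GPU; rank and the paths are placeholders):

import torch
from groundingdino.util.inference import load_model

def build_model_for_rank(rank: int):
    # Make `rank` the current CUDA device so that kernels launched by the model
    # run on the same GPU that holds its weights.
    torch.cuda.set_device(rank)
    device = torch.device(f"cuda:{rank}")
    return load_model("/path/to/config", "/path/to/checkpoint", device=device).to(device)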