IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0
2.1k stars 230 forks

How to solve the visualization problem to get a better understanding of the overall model architecture? #205

Closed andyoung009 closed 1 year ago

andyoung009 commented 1 year ago

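It is nice work. I am also trying to visualize the model architecture with TensorBoard. But when I pass the image and the 'TEXT_PROMPT' (after the preprocess_caption function) to add_graph, I get the error "RuntimeError: Type 'Tuple[Tensor, List[str]]' cannot be traced". The code block is as follows:
`import os
import torch
from torch.utils.tensorboard import SummaryWriter  # assumed; the original snippet omits its imports
from groundingdino.util.inference import load_model, load_image  # assumed import path

HOME = os.getcwd()  # assumed; HOME is not defined in the original snippet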

def preprocess_caption(caption: str) -> str:
    # Lowercase, strip whitespace, and make sure the caption ends with a period.
    result = caption.lower().strip()
    if result.endswith("."):
        return result
    return result + "."

CONFIG_PATH = os.path.join(HOME, "groundingdino/config/GroundingDINO_SwinT_OGC.py")
print(CONFIG_PATH, "; exist:", os.path.isfile(CONFIG_PATH))

WEIGHTS_NAME = "groundingdino_swint_ogc.pth"
WEIGHTS_PATH = os.path.join(HOME, "weights", WEIGHTS_NAME)
print(WEIGHTS_PATH, "; exist:", os.path.isfile(WEIGHTS_PATH))

IMAGE_NAME = "dog-3.jpeg"
IMAGE_PATH = os.path.join(HOME, "data", IMAGE_NAME)

TEXT_PROMPT = "chair"
TEXT_PROMPT = preprocess_caption(TEXT_PROMPT)
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

model = load_model(CONFIG_PATH, WEIGHTS_PATH)

image_source, image = load_image(IMAGE_PATH)
with torch.no_grad():
    outputs = model(image[None], captions=[TEXT_PROMPT])

log_dir = "./logs"
writer = SummaryWriter(log_dir)

# Attempt 1: pass the inputs as a dict.
writer.add_graph(model, input_to_model={"samples": image[None], "captions": TEXT_PROMPT})

# Attempt 2: pass the inputs as a tuple.
writer.add_graph(model, (image[None], [TEXT_PROMPT]))
writer.close()`


I also get the error "Only Tensors and (possibly nested) Lists, Dicts, and Tuples of Tensors can be traced". Because Grounding DINO feeds the text and the image through their own backbones at the very beginning of the forward pass, how can I pass both inputs to the add_graph function? I also tried netron.app with the .pth file, but it did not show the connections between the layers. Could you please give some advice on solving this visualization problem? Thanks!
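For reference, one common workaround for this kind of tracing error is to hide the non-tensor argument inside a small wrapper module, so that the tracer only ever sees tensors going in and coming out. The sketch below is only an illustration of that idea: `ToyTextImageModel` and `TensorOnlyWrapper` are hypothetical names, the toy model is a stand-in and not Grounding DINO, and the `"pred_boxes"` output key is an assumption. Whether the real model traces cleanly this way has not been verified here.

```python
from typing import Dict, List

import torch
from torch import nn


class ToyTextImageModel(nn.Module):
    """Hypothetical stand-in for a model whose forward takes images and captions."""

    def __init__(self) -> None:
        super().__init__()
        self.image_branch = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.text_branch = nn.Embedding(256, 8)
        self.head = nn.Linear(8, 4)

    def forward(self, samples: torch.Tensor, captions: List[str]) -> Dict[str, torch.Tensor]:
        # Crude "tokenizer" for illustration: UTF-8 byte values of the first caption.
        ids = torch.tensor([list(captions[0].encode("utf-8"))], dtype=torch.long)
        text = self.text_branch(ids).mean(dim=1)              # shape (1, 8)
        image = self.image_branch(samples).mean(dim=(2, 3))   # shape (B, 8)
        return {"pred_boxes": self.head(image + text)}        # shape (B, 4)


class TensorOnlyWrapper(nn.Module):
    """Fixes the captions at construction time so forward accepts only a tensor."""

    def __init__(self, model: nn.Module, captions: List[str]) -> None:
        super().__init__()
        self.model = model
        self.captions = captions  # plain Python list, baked into the trace as a constant

    def forward(self, samples: torch.Tensor) -> torch.Tensor:
        outputs = self.model(samples, captions=self.captions)
        # Return a tensor rather than a dict to keep the traced output simple.
        return outputs["pred_boxes"]


model = ToyTextImageModel().eval()
wrapper = TensorOnlyWrapper(model, ["chair."])
example = torch.rand(1, 3, 32, 32)

# The tracer now sees a single tensor input and a single tensor output.
traced = torch.jit.trace(wrapper, example)
print(traced(example).shape)
```

Since `SummaryWriter.add_graph` relies on tracing, the same wrapper and example tensor could then be passed as `writer.add_graph(wrapper, example)`. The trade-off is that the captions are frozen into the graph as constants, so the visualization reflects one fixed prompt only.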