autodistill / autodistill-grounding-dino

Grounding DINO module for use with Autodistill.
https://docs.autodistill.com
Apache License 2.0

Format of labels is xyxy #5

Open Mars-204 opened 10 months ago

Mars-204 commented 10 months ago

Hi,

I am using autodistill-grounding-dino to annotate images for YOLOv8 training. The YOLOv8 model expects labels in the format (class x_center y_center width height).

I found that the boxes are converted to xyxy format in post-processing. Does this affect the final labels that are generated?

def post_process_result(
    source_h: int,
    source_w: int,
    boxes: torch.Tensor,
    logits: torch.Tensor
) -> sv.Detections:
    boxes = boxes * torch.Tensor([source_w, source_h, source_w, source_h])
    xyxy = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()
    confidence = logits.numpy()
    return sv.Detections(xyxy=xyxy, confidence=confidence)
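
To make the arithmetic in the snippet above concrete, here is a minimal pure-Python sketch of the same conversion for a single box. It assumes the model emits normalized (cx, cy, w, h) values, as the `in_fmt="cxcywh"` argument and the scaling by image size suggest. The helper name `cxcywh_norm_to_xyxy_pixels` is hypothetical and is not part of Autodistill or Grounding DINO.

```python
def cxcywh_norm_to_xyxy_pixels(box, source_w, source_h):
    """Scale a normalized (cx, cy, w, h) box to pixels and return (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    # Scale horizontal values by the image width, vertical ones by the height.
    cx, w = cx * source_w, w * source_w
    cy, h = cy * source_h, h * source_h
    # Move from center/size to the top-left and bottom-right corners.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A box centered in a 640x480 image, half as wide and half as tall as the image.
print(cxcywh_norm_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), 640, 480))
# (160.0, 120.0, 480.0, 360.0)
```

The result is in pixel units, so it is not yet a YOLO label: a YOLO label needs center/size values divided by the image dimensions again.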

capjamesg commented 10 months ago

Hello! How are you saving the labels? Are you using the .label() function in Autodistill, or writing your own logic?

capjamesg commented 10 months ago

Reference: https://docs.autodistill.com/reference/base-models/detection/#autodistill.detection.detection_base_model.DetectionBaseModel.label

Mars-204 commented 10 months ago

I am using the .label() function in Autodistill.

capjamesg commented 9 months ago

> Is the format returned by autodistill-grounding-dino the same, or is it (xyxy width height)?

It should be converted to YOLOv8.

> Are the labels normalized? I get an out-of-bounds label error while training YOLOv8.

Base models return pixel coordinates rather than normalized values from 0-1.

Can you share your code so I can replicate your issue?
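
For illustration, going from pixel-space xyxy boxes to a normalized YOLO label line looks roughly like the sketch below. This is not Autodistill's own export code. The helper name `xyxy_pixels_to_yolo_line` is hypothetical, and clamping boxes to the image bounds is an assumption about how one might avoid out-of-bounds values.

```python
def xyxy_pixels_to_yolo_line(class_id, xyxy, image_w, image_h):
    """Format a pixel (x1, y1, x2, y2) box as 'class x_center y_center width height'."""
    x1, y1, x2, y2 = xyxy
    # Assumption: clamp to the image so no normalized value leaves [0, 1].
    x1 = min(max(x1, 0.0), image_w)
    x2 = min(max(x2, 0.0), image_w)
    y1 = min(max(y1, 0.0), image_h)
    y2 = min(max(y2, 0.0), image_h)
    # Normalize the center and size by the image dimensions.
    x_center = (x1 + x2) / 2 / image_w
    y_center = (y1 + y2) / 2 / image_h
    width = (x2 - x1) / image_w
    height = (y2 - y1) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(xyxy_pixels_to_yolo_line(0, (160, 120, 480, 360), 640, 480))
# 0 0.500000 0.500000 0.500000 0.500000
```

If label files contain values like 160 or 480 instead of values between 0 and 1, the normalization step was skipped somewhere between the detections and the files on disk.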

Mars-204 commented 9 months ago

I have checked the code, and it returns the (xyxy width height) format. To convert it to YOLOv8 format (x_center y_center width height), I have to modify the code.
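
One way to confirm which format actually ended up in the label files is to scan them for values outside [0, 1]. The sketch below is a hypothetical helper (`find_out_of_bounds` is not an Autodistill or Ultralytics function) and assumes detection labels only, with each line written as `class x_center y_center width height`.

```python
def find_out_of_bounds(label_text):
    """Return (line_number, line) pairs that are not valid normalized YOLO boxes."""
    problems = []
    for number, line in enumerate(label_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Everything after the class id should be four values in [0, 1].
        values = [float(p) for p in parts[1:]]
        if len(values) != 4 or any(v < 0.0 or v > 1.0 for v in values):
            problems.append((number, line))
    return problems


sample = "0 0.5 0.5 0.25 0.25\n0 160 120 480 360\n"
print(find_out_of_bounds(sample))
# [(2, '0 160 120 480 360')]
```

The second sample line holds pixel coordinates, which is the kind of content that would trigger an out-of-bounds error during training.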

-- Background: I am trying to label 'persons' in intensity images in .pgm format. I modified the source code to handle the .pgm images.

-- Main function to annotate folder data

import os
import shutil

from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM
from autodistill_grounding_dino import GroundingDINO


def annotator(root_folder, save_dir, sam=False, dino=True):
    # root_folder is expected to be a pathlib.Path
    if sam:
        base_model_sam = GroundedSAM(ontology=CaptionOntology({"all person": "person"}))
        base_model_sam.label(
            input_folder="./images",
            output_folder="./dataset_sam",
            extension=".pgm"
        )

    # Copy the intensity images into their own sub-folder
    folder_name = root_folder / f"{root_folder.name}_intensity"
    os.makedirs(folder_name, exist_ok=True)
    intensity_images = list(root_folder.glob("*inten.pgm"))

    for im in intensity_images:
        shutil.copy(im, folder_name)

    if dino:
        base_model_dino = GroundingDINO(
            ontology=CaptionOntology({"all person": "person"}),
            box_threshold=0.25
        )
        # Label every .pgm image that was copied into folder_name
        base_model_dino.label(
            input_folder=str(folder_name),
            output_folder=str(save_dir),
            extension=".pgm"
        )

-- Changes to handle the .pgm images

def predict(self, input: str) -> sv.Detections:
    # image = load_image(input, return_format="cv2")
    # load_image() from groundingdino.util.inference
    image_source, image = load_image(input)

capjamesg commented 9 months ago

It is a bit hard to interpret your message. Can you use the backtick character to format your code (`)? If you are proposing a complete solution, feel free to submit it as a PR to the package and I'll review!