HumanSignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats
https://labelstud.io/

feat: LSDV-4831: Export BrushLabels to COCO #175

Closed cdpath closed 5 months ago

hogepodge commented 1 year ago

@cdpath I wanted to check in on this and see if you've had a chance to work on the patch update.

cdpath commented 1 year ago

Kinda busy at work recently. Will update soon.

makseq commented 1 year ago

@cdpath hi! do you have any updates?

makseq commented 1 year ago

@hogepodge please keep tracking this PR.

hogepodge commented 1 year ago

@cdpath we're trying to make the process for merging community feature requests easier. One thing that would help me a lot in moving this forward would be what we call "acceptance criteria." Essentially, when we hand this off to QA to determine if we can merge it, what is the expected behavior that we can test?

This is a much-requested feature, and we're very grateful for the patch. I want to help us move this along as best as I can.

cdpath commented 1 year ago

@hogepodge Sorry for the delay. I just pushed a small update that adds a workaround for when pycocotools is not available.

codecov-commenter commented 1 year ago

Codecov Report

No coverage uploaded for pull request base (master@fc5eb78). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             master     #175   +/-   ##
=========================================
  Coverage          ?   45.93%
=========================================
  Files             ?       21
  Lines             ?     1822
  Branches          ?        0
=========================================
  Hits              ?      837
  Misses            ?      985
  Partials          ?        0
```

makseq commented 1 year ago

@cdpath Our QA team tried to set up pycocotools on Windows and ran into this problem: [image]

What are your thoughts here? Any ideas? Maybe we could make pycocotools an optional extra (something like pip install label-studio-converter[pycocotools])?

cdpath commented 1 year ago

Yeah, that's an option. Another approach would be to maintain a fork of pycocotools that ships wheels for Windows.

makseq commented 1 year ago

Do you know how to make it an extra package in pip? We have no bandwidth to support forks of pycocotools.
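
For reference, optional dependencies of this kind are usually declared through the setuptools extras_require argument and guarded at import time. A minimal sketch, assuming the extra is named pycocotools as suggested above; the pycocotools_imported flag mirrors the name used later in this thread, while EXTRAS and require_pycocotools are hypothetical names for illustration and not the PR's actual code.

```python
# setup.py side (hypothetical fragment): pass this mapping to
# setuptools.setup(..., extras_require=EXTRAS) so that
# `pip install label-studio-converter[pycocotools]` pulls in the extra.
EXTRAS = {"pycocotools": ["pycocotools"]}

# Library side: try the import once and remember whether it worked.
try:
    import pycocotools.mask  # noqa: F401  (optional dependency)
    pycocotools_imported = True
except ImportError:
    pycocotools_imported = False


def require_pycocotools():
    """Raise a readable error when a feature needs the optional extra."""
    if not pycocotools_imported:
        raise ImportError(
            "pycocotools is required for this export; install it with "
            "pip install label-studio-converter[pycocotools]"
        )
```

With this pattern the base install keeps working on platforms where pycocotools cannot be built, and only the code paths that call require_pycocotools() fail, with an actionable message.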

kriap139 commented 1 year ago

Any updates on this feature?

cdpath commented 1 year ago

@makseq I've added an extra package, but I'm not certain whether I've correctly updated _get_supported_formats.

hogepodge commented 1 year ago

I've looked through the patch, and assuming that we've resolved the Windows issue by making it an optional install, I'd like to move forward with merging this.

ODAncona commented 1 year ago

Hello,

I'm using the ML backend with the SAM integration, and I need to export to COCO with BrushLabels, RectangleLabels and KeypointLabels as well. So I took your PR code and made a few small changes. I only changed the files converter.py and brush.py.

This is working well for me. There is still a problem when an annotation has multiple label types, for instance when the user labels some regions with BrushLabels and others with PolygonLabels in the same task.

Here's my code:

converter.py

...
    def convert_to_coco(
        self, input_data, output_dir, output_image_dir=None, is_dir=True
    ):
        def add_image(images, width, height, image_id, image_path):
            images.append(
                {
                    'width': width,
                    'height': height,
                    'id': image_id,
                    'file_name': image_path,
                }
            )
            return images

        self._check_format(Format.COCO)
        ensure_dir(output_dir)
        output_file = os.path.join(output_dir, 'result.json')
        if output_image_dir is not None:
            ensure_dir(output_image_dir)
        else:
            output_image_dir = os.path.join(output_dir, 'images')
            os.makedirs(output_image_dir, exist_ok=True)
        images, categories, annotations = [], [], []
        categories, category_name_to_id = self._get_labels()
        data_key = self._data_keys[0]
        item_iterator = (
            self.iter_from_dir(input_data)
            if is_dir
            else self.iter_from_json_file(input_data)
        )
        for item_idx, item in enumerate(item_iterator):
            image_path = item['input'][data_key]
            image_id = len(images)
            width = None
            height = None
            # download all images of the dataset, including the ones without annotations
            if not os.path.exists(image_path):
                try:
                    image_path = download(
                        image_path,
                        output_image_dir,
                        project_dir=self.project_dir,
                        return_relative_path=True,
                        upload_dir=self.upload_dir,
                        download_resources=self.download_resources,
                    )
                except Exception:
                    logger.info(
                        'Unable to download {image_path}. The image of {item} will be skipped'.format(
                            image_path=image_path, item=item
                        ),
                        exc_info=True,
                    )
            # add image to final images list
            try:
                with Image.open(os.path.join(output_dir, image_path)) as img:
                    width, height = img.size
                images = add_image(images, width, height, image_id, image_path)
            except Exception:
                logger.info(
                    "Unable to open {image_path}, can't extract width and height for COCO export".format(
                        image_path=image_path, item=item
                    ),
                    exc_info=True,
                )

            # skip tasks without annotations
            if not item['output']:
                # image wasn't loaded and there are no labels
                if not width:
                    images = add_image(images, width, height, image_id, image_path)

                logger.warning('No annotations found for item #' + str(item_idx))
                continue

            # concatenate results over all tag names
            labels = []
            for key in item['output']:
                labels += item['output'][key]

            if len(labels) == 0:
                logger.debug(f'Empty bboxes for {item["output"]}')
                continue

            for label in labels:
                category_name = None
                for key in [
                    'rectanglelabels',
                    'polygonlabels',
                    'brushlabels',
                    'keypointlabels',
                    'labels',
                ]:
                    if key in label and len(label[key]) > 0:
                        category_name = label[key][0]
                        break

                if category_name is None:
                    logger.warning("Unknown label type or labels are empty")
                    continue

                if not height or not width:
                    if 'original_width' not in label or 'original_height' not in label:
                        logger.debug(
                            f'original_width or original_height not found in {image_path}'
                        )
                        continue

                    width, height = label['original_width'], label['original_height']
                    images = add_image(images, width, height, image_id, image_path)

                category_id = category_name_to_id[category_name]

                annotation_id = len(annotations)

                if "polygonlabels" in label:
                    if "points" not in label:
                        # without points the polygon cannot be exported; skip it
                        logger.warning(label)
                        continue
                    points_abs = [
                        (x / 100 * width, y / 100 * height) for x, y in label["points"]
                    ]
                    x, y = zip(*points_abs)

                    annotations.append(
                        {
                            'id': annotation_id,
                            'image_id': image_id,
                            'category_id': category_id,
                            'segmentation': [
                                [coord for point in points_abs for coord in point]
                            ],
                            'bbox': get_polygon_bounding_box(x, y),
                            'ignore': 0,
                            'iscrowd': 0,
                            'area': get_polygon_area(x, y),
                        }
                    )
                elif 'brushlabels' in label and brush.pycocotools_imported:
                    if "rle" not in label:
                        # without an RLE mask there is nothing to convert; skip it
                        logger.warning(label)
                        continue
                    coco_rle = brush.ls_rle_to_coco_rle(label["rle"], height, width)
                    segmentation = brush.ls_rle_to_polygon(label["rle"], height, width)
                    bbox = brush.get_cocomask_bounding_box(coco_rle)
                    area = brush.get_cocomask_area(coco_rle)
                    annotations.append(
                        {
                            "id": annotation_id,
                            "image_id": image_id,
                            "category_id": category_id,
                            "segmentation": segmentation,
                            "bbox": bbox,
                            'ignore': 0,
                            "iscrowd": 0,
                            "area": area,
                        }
                    )
                elif 'rectanglelabels' in label or 'keypointlabels' in label:
                    if "rle" not in label:
                        # this branch only handles results that carry an RLE mask
                        logger.warning(label)
                        continue
                    coco_rle = brush.ls_rle_to_coco_rle(label["rle"], height, width)
                    segmentation = brush.ls_rle_to_polygon(label["rle"], height, width)
                    bbox = brush.get_cocomask_bounding_box(coco_rle)
                    area = brush.get_cocomask_area(coco_rle)
                    annotations.append(
                        {
                            'id': annotation_id,
                            'image_id': image_id,
                            'category_id': category_id,
                            'segmentation': segmentation,
                            'bbox': bbox,
                            'ignore': 0,
                            'iscrowd': 0,
                            'area': area,
                        }
                    )
                elif 'keypointlabels' in label:
                    # NOTE: this branch is never reached, because the previous
                    # elif already matches 'keypointlabels'.
                    if "rle" not in label:
                        logger.warning(label)
                        continue
                    coco_rle = brush.ls_rle_to_coco_rle(label["rle"], height, width)
                    segmentation = brush.ls_rle_to_polygon(label["rle"], height, width)
                    bbox = brush.get_cocomask_bounding_box(coco_rle)
                    area = brush.get_cocomask_area(coco_rle)
                    annotations.append(
                        {
                            'id': annotation_id,
                            'image_id': image_id,
                            'category_id': category_id,
                            'segmentation': segmentation,
                            'bbox': bbox,
                            'ignore': 0,
                            'iscrowd': 0,
                            'area': area,
                        }
                    )
                else:
                    raise ValueError("Unknown label type")

                if os.getenv('LABEL_STUDIO_FORCE_ANNOTATOR_EXPORT'):
                    annotations[-1].update({'annotator': get_annotator(item)})

        with io.open(output_file, mode='w', encoding='utf8') as fout:
            json.dump(
                {
                    'images': images,
                    'categories': categories,
                    'annotations': annotations,
                    'info': {
                        'year': datetime.now().year,
                        'version': '1.0',
                        'description': '',
                        'contributor': 'Label Studio',
                        'url': '',
                        'date_created': str(datetime.now()),
                    },
                },
                fout,
                indent=2,
            )
...
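
The polygon branch above converts Label Studio's percentage coordinates to pixels and then derives a COCO bbox and area. A minimal self-contained sketch of that arithmetic, assuming the COCO [x, y, width, height] bbox convention and the shoelace formula for the area; the function names here are hypothetical and are not the converter's own get_polygon_bounding_box or get_polygon_area helpers.

```python
def percent_to_absolute(points, width, height):
    """Scale (x, y) pairs given in percent of the image size to pixels."""
    return [(x / 100 * width, y / 100 * height) for x, y in points]


def polygon_bbox(points):
    """Return a COCO-style bounding box [x_min, y_min, width, height]."""
    xs, ys = zip(*points)
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# A rectangle drawn on a 200 x 100 image, with corners given in percent.
pixels = percent_to_absolute([(10, 10), (60, 10), (60, 50), (10, 50)], 200, 100)
bbox = polygon_bbox(pixels)
area = polygon_area(pixels)
# The flattened [x1, y1, x2, y2, ...] list is what goes into 'segmentation'.
segmentation = [[coord for point in pixels for coord in point]]
```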

brush.py

...
def ls_rle_to_coco_rle(ls_rle, height, width):
    """from LS rle to compressed coco rle"""
    ls_mask = decode_rle(ls_rle)
    ls_mask = np.reshape(ls_mask, [height, width, 4])[:, :, 3]
    ls_mask = np.where(ls_mask > 0, 1, 0)
    binary_mask = np.asfortranarray(ls_mask)
    coco_rle = binary_mask_to_rle(binary_mask)
    result = pycocotools.mask.frPyObjects(coco_rle, *coco_rle.get('size'))
    result["counts"] = result["counts"].decode()
    return result

def ls_rle_to_polygon(ls_rle, height, width):
    """from LS rle to polygons"""
    ls_mask = decode_rle(ls_rle)
    ls_mask = np.reshape(ls_mask, [height, width, 4])[:, :, 3]
    ls_mask = np.where(ls_mask > 0, 1, 0)

    # Find contours from the binary mask
    # (assumes `from skimage import measure` at the top of the file)
    contours = measure.find_contours(ls_mask, 0.5)
    segmentation = []

    for contour in contours:
        # Flip dimensions then ravel and cast to list
        contour = np.flip(contour, axis=1)
        contour = contour.ravel().tolist()
        segmentation.append(contour)
    return segmentation
...
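
To make the intermediate format concrete: COCO's uncompressed RLE stores run lengths over the mask in column-major order, always starting with the count of zeros, which is why the code above converts the mask with np.asfortranarray first. A dependency-free sketch of that encoding, assuming a mask given as a list of rows; mask_to_uncompressed_rle and rle_area are hypothetical names and not the converter's binary_mask_to_rle.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows) as COCO-style uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    counts = []
    previous, run = 0, 0  # by convention the first run counts zeros
    for col in range(width):  # column-major (Fortran) order
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value != previous:
                counts.append(run)
                previous, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_area(rle):
    """Foreground pixel count: the sum of every second run (the runs of ones)."""
    return sum(rle["counts"][1::2])


mask = [
    [0, 1, 1],
    [0, 1, 0],
]
rle = mask_to_uncompressed_rle(mask)
```

Reading the example mask column by column gives 0, 0, 1, 1, 1, 0, so the runs are two zeros, three ones and one zero. A mask whose first pixel is set starts with a zero-length run.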

The issue with annotations that have multiple label types remains, and there is no direct way to find them using the filter section.

So I used:

Filter -> annotationResults contains {label}

to find the problematic annotations.
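
Such tasks could also be flagged programmatically before export. A minimal sketch, assuming each result is a dict keyed by label type as in the convert_to_coco loop above; LABEL_KEYS, label_types_used and has_mixed_label_types are hypothetical names and not part of the PR.

```python
# Same label-type keys, in the same order, as the loop in convert_to_coco.
LABEL_KEYS = (
    "rectanglelabels",
    "polygonlabels",
    "brushlabels",
    "keypointlabels",
    "labels",
)


def label_types_used(labels):
    """Collect every label-type key that appears, non-empty, in the results."""
    found = set()
    for label in labels:
        for key in LABEL_KEYS:
            if label.get(key):
                found.add(key)
    return found


def has_mixed_label_types(labels):
    """True when one task mixes, for example, brush and polygon results."""
    return len(label_types_used(labels)) > 1


# Synthetic results for illustration only.
results = [
    {"brushlabels": ["Cat"], "rle": [1, 2, 3]},
    {"polygonlabels": ["Cat"], "points": [[10, 10], [20, 10], [20, 20]]},
]
mixed = has_mixed_label_types(results)
```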

makseq commented 1 year ago

@hogepodge let's try to take into account the last comment: https://github.com/heartexlabs/label-studio-converter/pull/175#issuecomment-1614720231

Let's talk with @nehalecky about how we can add these changes and eventually deliver this PR.

Mat198 commented 6 months ago

Any updates on this feature?

makseq commented 5 months ago

After careful consideration, we’ve determined that this is more of an improvement than a critical bug. Additionally, it seems to be an outdated request and hasn’t garnered much interest from the community. For these reasons, we will be closing this issue. We will continue developing the converter library as a part of Label Studio SDK.

We appreciate your understanding and encourage you to submit your feedback, questions and suggestions here: https://github.com/HumanSignal/label-studio-sdk/issues