HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Cannot Export Tasks In Selected Formats Using API #5192

Closed: moss-xyz closed this issue 5 months ago

moss-xyz commented 11 months ago

Describe the bug

I cannot export tasks using the Label Studio API endpoint (/api/projects/{number}/exports); several different issues occur. I am specifically interested in exporting to the VOC format.

To Reproduce

Note that I am using the API endpoint because the usual "Export" workflow in the GUI hangs my computer (it has very limited CPU/RAM), and I was hoping the API would be more reliable.

Step 1: Create Export Snapshot

I used this endpoint to configure an export snapshot. This seemed to work, but two issues cropped up.

For reference, the JSON I passed to the data field was the following:

data = {
    "title":"Test Full Export 2023-12-12 v2",
    "converted_formats":[{"export_type":"VOC"}],
    "task_filter_options":{"view":4, "annotated":"only", "skipped":"exclude", "finished":"only"}
}
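For anyone reproducing Step 1 without the SDK, here is a minimal standard-library sketch of the same request with the body above. It assumes token authentication via an `Authorization: Token <key>` header; `build_create_snapshot_request`, the host, the key, and the project ID are hypothetical placeholders. The request is only built here, not sent.

```python
import json
import urllib.request


def build_create_snapshot_request(host, api_key, project_id, payload):
    """Build (but do not send) POST /api/projects/{id}/exports."""
    url = f"{host.rstrip('/')}/api/projects/{project_id}/exports"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",  # assumption: token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same body as in Step 1
payload = {
    "title": "Test Full Export 2023-12-12 v2",
    "converted_formats": [{"export_type": "VOC"}],
    "task_filter_options": {"view": 4, "annotated": "only",
                            "skipped": "exclude", "finished": "only"},
}
req = build_create_snapshot_request("http://localhost:8080", "YOUR_API_KEY", 1, payload)
# To actually send it: snapshot = json.load(urllib.request.urlopen(req))
```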

First, despite setting the converted_formats field, the export snapshot in the response I received did not list any formats; it just said 'converted_formats': [].

Second, despite setting task_filter_options as shown above, the snapshot included all 4000+ images in my dataset instead of only the ~1000 annotated ones.
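If the server-side filter is not applied, one workaround is to trim the downloaded JSON on the client side. This is a minimal sketch, assuming each exported task carries an `annotations` list and that skipped annotations are marked with `was_cancelled: true`, as in Label Studio's JSON export. `keep_annotated` is a hypothetical helper. It does not fix the filter itself; it only drops unannotated tasks from the file you already have.

```python
def keep_annotated(tasks):
    """Keep tasks that have at least one annotation that was not skipped."""
    kept = []
    for task in tasks:
        annotations = task.get("annotations") or []
        # Skipped annotations are assumed to carry "was_cancelled": True
        if any(not a.get("was_cancelled", False) for a in annotations):
            kept.append(task)
    return kept


tasks = [
    {"id": 1, "annotations": [{"id": 10, "was_cancelled": False, "result": []}]},
    {"id": 2, "annotations": []},
    {"id": 3, "annotations": [{"id": 11, "was_cancelled": True, "result": []}]},
]
print([t["id"] for t in keep_annotated(tasks)])  # [1]
```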

Step 2: Converting Export Snapshot

Since the snapshot did not include my desired export type, I used this endpoint to convert it from the default (JSON) format to VOC.

Here is the JSON I passed to the data field:

data = {"export_type":"VOC"}

This seemed to work! When I re-queried the endpoint, it did indeed say 'export_type': 'VOC'.
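The Step 2 call can be sketched the same way. The path below is the one the SDK script later in this thread posts to; the token header, `build_convert_request`, and the IDs are assumptions for illustration. That script also treats conversion as asynchronous and polls for completion before downloading, so a successful response here does not by itself mean the VOC file is ready.

```python
import json
import urllib.request


def build_convert_request(host, api_key, project_id, export_id, export_type):
    """Build (but do not send) POST .../exports/{export_id}/convert."""
    url = (f"{host.rstrip('/')}/api/projects/{project_id}"
           f"/exports/{export_id}/convert")
    return urllib.request.Request(
        url,
        data=json.dumps({"export_type": export_type}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",  # assumption: token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_convert_request("http://localhost:8080", "YOUR_API_KEY", 1, 7, "VOC")
```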

Step 3: Downloading the Export

Finally, I used this endpoint to save the actual export.

This "works" in the sense that my GET request goes through, but "fails" in the sense that it doesn't return the export in the format I requested (VOC); instead, I get the default JSON-formatted file, which I've attached here.
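One thing worth checking: the download endpoint appears to select the format from an `exportType` query parameter (this is what the 0.0.x SDK's `export_snapshot_download` sends). If the parameter is missing, the server may fall back to the JSON snapshot, which would match the behaviour described above. A minimal sketch, treating the parameter name as an assumption to verify against your server version; `build_download_request` and the IDs are hypothetical.

```python
import urllib.parse
import urllib.request


def build_download_request(host, api_key, project_id, export_id, export_type):
    """Build (but do not send) GET .../exports/{export_id}/download?exportType=..."""
    query = urllib.parse.urlencode({"exportType": export_type})
    url = (f"{host.rstrip('/')}/api/projects/{project_id}"
           f"/exports/{export_id}/download?{query}")
    # assumption: token auth, as in the other sketches
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {api_key}"}, method="GET"
    )


req = build_download_request("http://localhost:8080", "YOUR_API_KEY", 1, 7, "VOC")
print(req.full_url)
# http://localhost:8080/api/projects/1/exports/7/download?exportType=VOC
```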

Expected behavior

I don't actually know what format the VOC export is supposed to take, but I presume it is a zip file containing XML files and the images. If so, that didn't happen. Instead, I got a JSON-formatted file, which I've attached here (same as above).
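To tell quickly which of the two outcomes a download produced, the bytes can be inspected before saving them. This is a sketch, assuming only that a converted export arrives as a ZIP archive and the unconverted one as JSON; the archive layout in the usage example (`Annotations/`, `images/`) is hypothetical, and `describe_export` is an illustrative helper, not part of Label Studio.

```python
import io
import json
import zipfile


def describe_export(data: bytes):
    """Report whether downloaded bytes are a ZIP archive or a JSON document."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
        xml_files = [n for n in names if n.lower().endswith(".xml")]
        return {"kind": "zip", "files": len(names), "xml_files": len(xml_files)}
    json.loads(data)  # raises ValueError if the bytes are not JSON either
    return {"kind": "json"}


# Hypothetical archive layout, built in memory for illustration
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Annotations/img1.xml", "<annotation/>")
    zf.writestr("images/img1.jpg", b"")
print(describe_export(buf.getvalue()))  # {'kind': 'zip', 'files': 2, 'xml_files': 1}
print(describe_export(b'[{"id": 1}]'))  # {'kind': 'json'}
```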

Environment (please complete the following information):

Additional context

I was also asked to post my labeling config XML, but I can't right now: I am away from the computer running Label Studio and will be for the rest of the month. I do remember that it was a slight modification of the Object Detection with Bounding Boxes template, just with labels relevant to my project.

hogepodge commented 10 months ago

Hi @moss-xyz, this may be related to another issue. We're going to try and take a look into it along with the other.

AlexanderKozhevin commented 9 months ago

Confirmed, same issue with YOLO. Here is my config for .../api/projects/55/exports:

{
  "title": "My YOLO Export",
  "task_filter_options": {
    "finished": "only",
    "annotated": "only"
  },
  "annotation_filter_options": {
    "usual": true
  },
  "serialization_options": {
    "drafts": {
      "only_id": false
    }
  },
  "export_type": "YOLO"
}
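Both reports put the target format into the create-snapshot request, while the workaround later in this thread creates a plain snapshot and then requests the format through the /convert endpoint. The sketch below splits a combined config like the one above into those two request bodies. `split_export_config` is hypothetical, and the premise that a top-level `export_type` has no effect on snapshot creation is inferred from the behaviour reported here, not from documentation.

```python
import copy


def split_export_config(config):
    """Split one combined config into (snapshot_body, convert_body).

    Assumption: the create-snapshot endpoint does not act on "export_type",
    so the format is requested separately via /convert.
    """
    snapshot_body = copy.deepcopy(config)  # leave the caller's dict untouched
    export_type = snapshot_body.pop("export_type", None)
    convert_body = {"export_type": export_type} if export_type else None
    return snapshot_body, convert_body


config = {
    "title": "My YOLO Export",
    "task_filter_options": {"finished": "only", "annotated": "only"},
    "annotation_filter_options": {"usual": True},
    "serialization_options": {"drafts": {"only_id": False}},
    "export_type": "YOLO",
}
snapshot_body, convert_body = split_export_config(config)
print(convert_body)  # {'export_type': 'YOLO'}
```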

makseq commented 5 months ago

Try using this SDK script. Install the pinned SDK version first:

pip install label-studio-sdk==0.0.34

Then run:
import time
from label_studio_sdk import Client

class SnapshotExporter:
    def __init__(self, host, api_key):
        self.ls = Client(url=host, api_key=api_key)

    def export_json_snapshot(self, project_id):
        """ Export JSON snapshot """
        project = self.ls.get_project(project_id)
        export_result = project.export_snapshot_create(title='Export SDK Snapshot')
        export_id = export_result['id']

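        # Wait until the snapshot is ready. A new snapshot may still be in the
        # "created" state, so poll until it is completed or failed rather than
        # only while it is "in progress".
        while True:
            status = project.export_snapshot_status(export_id)
            if status.is_completed():
                break
            if status.is_failed():
                raise RuntimeError("Snapshot export failed")
            time.sleep(1.0)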

        return export_id

    def convert_snapshot(self, project_id, export_id, export_type):
        """ Convert JSON snapshot to a specific format (YOLO, VOC, COCO, CSV, TSV, BRUSH_PNG, etc.) """
        response = self.ls.make_request(
            method='POST',
            url=f'/api/projects/{project_id}/exports/{export_id}/convert',
            json={'export_type': export_type}
        )
        return response.json()['converted_format']  # return conversion id

    def wait_for_conversion(self, project_id, export_id, conversion_id):
        """ Wait until the conversion is completed """
        project = self.ls.get_project(project_id)
        while True:
            exports = project.export_snapshot_list()
            for export in exports:
                if export['id'] == export_id:
                    for converted_format in export['converted_formats']:
                        if converted_format['id'] == conversion_id:
                            if converted_format['status'] == 'completed':
                                return
                            elif converted_format['status'] == 'failed':
                                raise Exception("Conversion failed")
            time.sleep(1.0)

    def download_snapshot(self, project_id, export_id, export_type):
        """ Download the converted snapshot """
        project = self.ls.get_project(project_id)
        status, file_name = project.export_snapshot_download(export_id, export_type=export_type)
        if status == 200:
            return file_name
        else:
            raise Exception("Failed to download the snapshot")

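# Usage: replace these placeholder values with your own
host = 'http://localhost:8080'  # your Label Studio URL
api_key = 'YOUR_API_KEY'        # your API key
project_id = 1                  # your project ID
export_type = 'VOC'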

exporter = SnapshotExporter(host, api_key)

# Step 1: Export JSON snapshot
export_id = exporter.export_json_snapshot(project_id)
print(f"Exported JSON snapshot with ID: {export_id}")

# Step 2: Convert JSON snapshot to format
conversion_id = exporter.convert_snapshot(project_id, export_id, export_type)
print(f"Started conversion to {export_type} with ID: {conversion_id}")

# Step 3: Wait for conversion to complete
exporter.wait_for_conversion(project_id, export_id, conversion_id)
print("Conversion completed")

# Step 4: Download the converted snapshot
file_name = exporter.download_snapshot(project_id, export_id, export_type=export_type)
print(f"Downloaded {export_type} snapshot as: {file_name}")

Explanation:

  1. Export JSON Snapshot: The export_json_snapshot method creates a new export snapshot in JSON format and waits until the export is completed.
  2. Convert to Pascal VOC (or other): The convert_snapshot method uses the make_request method to manually call the conversion API endpoint.
  3. Wait for Conversion: The wait_for_conversion method polls the export status until the conversion is completed or failed.
  4. Download the Snapshot: The download_snapshot method downloads the converted snapshot in Pascal VOC format (or other).
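The loops in the script poll without an upper bound, so a snapshot or conversion that never leaves the pending state would hang it indefinitely. A small hypothetical helper that adds a timeout is sketched below; it takes any zero-argument function returning the current status string, treats "completed" and "failed" the way the script does, and treats every other value (for example "created" or "in_progress") as still pending.

```python
import time


def poll_until(check, timeout=300.0, interval=1.0):
    """Call check() until it returns "completed"; raise on "failed" or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = check()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("export or conversion failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        time.sleep(interval)


# Usage with a fake status source standing in for the API
statuses = iter(["created", "in_progress", "completed"])
poll_until(lambda: next(statuses), timeout=5.0, interval=0.01)
```

Each of the script's two waiting loops could be expressed this way by passing a function that looks up the current status string.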

Make sure to replace host, api_key, and project_id with your actual Label Studio host, API key, and project ID.

biggeR-data commented 2 months ago

This is nice, but it doesn't download the actual images. Is there a way to get the images as well?
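The exported JSON only references the images; it does not contain the files. This is a minimal sketch, assuming the images were uploaded to Label Studio so that each task's `data.image` holds a path served by the Label Studio host (the `image` key matches the `$image` variable of the Object Detection with Bounding Boxes template; check your own config) and that those paths accept the same token header. `image_download_requests` is a hypothetical helper. Cloud-storage URIs such as `s3://` are skipped because they need the storage provider's own client.

```python
import os
import urllib.parse
import urllib.request


def image_download_requests(tasks, host, api_key, field="image"):
    """Build (but do not send) one GET request per image referenced in a JSON export."""
    host_netloc = urllib.parse.urlparse(host).netloc
    pairs = []
    for task in tasks:
        ref = task.get("data", {}).get(field)
        if not ref:
            continue
        url = urllib.parse.urljoin(host.rstrip("/") + "/", ref)
        parsed = urllib.parse.urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # e.g. s3:// or gs:// URIs are out of scope here
        headers = {}
        if parsed.netloc == host_netloc:
            headers["Authorization"] = f"Token {api_key}"  # only send the token to Label Studio
        pairs.append((os.path.basename(parsed.path), urllib.request.Request(url, headers=headers)))
    return pairs


tasks = [
    {"id": 1, "data": {"image": "/data/upload/1/abc123-photo1.jpg"}},
    {"id": 2, "data": {"image": "s3://my-bucket/photo2.jpg"}},
]
reqs = image_download_requests(tasks, "http://localhost:8080", "YOUR_API_KEY")
for name, req in reqs:
    print(name, req.full_url)
# To save each file:
# with urllib.request.urlopen(req) as resp, open(name, "wb") as f:
#     f.write(resp.read())
```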