HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Not getting images in export via the python sdk #3868

Open JeremyMahieu opened 1 year ago

JeremyMahieu commented 1 year ago

Describe the bug When downloading a snapshot via the Python SDK, the retrieved zip file contains no images. I tried the 'YOLO' and 'COCO' formats. An export via the web interface does contain images. With the SDK, the image folder is there but empty.

To Reproduce Steps to reproduce the behavior:

  1. Create a project with some images, annotate them
  2. Get the zip file via the following python script
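    from label_studio_sdk import Client

    LABEL_STUDIO_URL = 'https://asdf.com/'
    API_KEY = 'asdfasdfasdf'

    # connect and load the project (this step was missing from the snippet; project ID 1 is assumed)
    ls = Client(url=LABEL_STUDIO_URL, api_key=API_KEY)
    project = ls.get_project(1)
    exportid = project.export_snapshot_create('test')['id']
    (status, filename) = project.export_snapshot_download(exportid, 'YOLO')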

Expected behavior Have the images in the zip file together with the annotations.

Environment (please complete the following information):

JeremyMahieu commented 1 year ago

This is a workaround for anyone doing this:

import requests
import datetime

url = LABEL_STUDIO_URL + 'api/projects/1/export?exportType=YOLO'
timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
filename = f"{timestamp}.zip"
headers = { 'Authorization': 'Token ' + API_KEY }
response = requests.get(url, headers=headers)
if response.status_code == 200:
    with open(filename, "wb") as f:
        f.write(response.content)
    print(f"File downloaded as {filename}")
else:
    print("Failed to download file")

mmoollllee commented 1 year ago

Problem still exists. The workaround won't work for me, as I need to use the filters of the current view for the export. See: https://github.com/HumanSignal/label-studio-sdk/blob/master/examples/export_snapshots.py

dceluis commented 1 year ago

Related to https://github.com/HumanSignal/label-studio-sdk/issues/128

As @mmoollllee mentioned, any workarounds using the Easy Export API endpoint are not equivalent.

It's not possible to pass the download_resources parameter to the SDK's export_snapshot_download method because this method delegates to the second Export API endpoint, which doesn't accept a download_resources parameter.
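For reference, a minimal sketch of how a request to the Easy Export endpoint with `download_resources` might be built, using only the Python standard library. The `download_resources` query parameter is taken from the comment above; `build_easy_export_url` and `download_easy_export` are hypothetical helper names, the URL, token, and project ID are placeholders, and the lowercase `true` value is an assumption. As noted, this endpoint does not apply a view's filters, so it is not equivalent to a snapshot export.

```python
import urllib.parse
import urllib.request


def build_easy_export_url(base_url, project_id, export_type="YOLO", download_resources=True):
    """Build the Easy Export URL (hypothetical helper, not part of the SDK)."""
    query = urllib.parse.urlencode({
        "exportType": export_type,
        # Assumption: the endpoint accepts lowercase "true"/"false" here.
        "download_resources": str(download_resources).lower(),
    })
    return f"{base_url.rstrip('/')}/api/projects/{project_id}/export?{query}"


def download_easy_export(url, api_key, filename):
    """Download the export archive to `filename` using token authentication."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {api_key}"})
    with urllib.request.urlopen(request) as response, open(filename, "wb") as f:
        f.write(response.read())


# Placeholder server and project ID, as in the snippets above.
url = build_easy_export_url("https://asdf.com/", 1)
print(url)
```

Calling `download_easy_export(url, API_KEY, "export.zip")` would then fetch the archive; whether it contains the images depends on the server honoring the parameter.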

Digital2Slave commented 11 months ago

same issue. How to fix it ?

mmoollllee commented 10 months ago

Don't remember how I solved it as I switched to use CVAT but maybe this could help?

import os
import time
from label_studio_sdk import Client
from label_studio_converter import Converter

LABEL_STUDIO_URL = os.getenv('LABEL_STUDIO_URL', default='http://localhost:8080')
API_KEY = "xxxxx"
PROJECT_ID = int("7")
VIEW_ID = False # or:int("18")

# connect to Label Studio
ls = Client(url=LABEL_STUDIO_URL, api_key=API_KEY)
ls.check_connection()

# get existing project
project = ls.get_project(PROJECT_ID)

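# get all tabs (views) of the project
views = project.get_views()

for view in views:
    # export only the selected view when VIEW_ID is set
    if VIEW_ID and VIEW_ID != view['id']:
        continue

    # restrict the snapshot to the tasks matched by this view's filters
    task_filter_options = {'view': view['id']}
    view_name = view["data"]["title"]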

    # create new export snapshot
    export_result = project.export_snapshot_create(
        title='Export SDK Snapshot', task_filter_options=task_filter_options
    )
    assert 'id' in export_result
    export_id = export_result['id']

    # wait until snapshot is ready
    while project.export_snapshot_status(export_id).is_in_progress():
        time.sleep(1.0)

    # download snapshot file
    status, file_name = project.export_snapshot_download(export_id)
    assert status == 200
    assert file_name is not None
    os.rename(file_name, view_name + ".json")
    print(f"Status of the export is {status}.\nFile name is {view_name}.json")

Run:

label-studio-converter export -i train.json --config config.xml -o "train" -f YOLO

Also from somewhere in my notes but don't remember what worked:

DEBUG=1 LOG_LEVEL=DEBUG label-studio export 7 YOLO --export-path="<PATH>"

&

label-studio-converter import yolo -i . --image-root-url /data/local-files/?d=Users/<USERNAME>/dev/label-studio/storage/<PROJECTNAME>/ -o train.json

Good luck :)
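One caveat with the script above: the `while ... is_in_progress()` loop waits forever if a snapshot never finishes. Below is a minimal sketch of a polling helper with a timeout. `wait_until` is a hypothetical helper, not part of the SDK, and the default timeout and interval are arbitrary.

```python
import time


def wait_until(is_done, timeout=300.0, interval=1.0):
    """Poll `is_done()` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Hypothetical usage with the SDK objects from the script above:
# ready = wait_until(
#     lambda: not project.export_snapshot_status(export_id).is_in_progress()
# )
```

If `ready` comes back `False`, the script could skip the download for that view instead of hanging.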

Digital2Slave commented 10 months ago

  1. export JSON file with the following function.

    import time
    from label_studio_sdk import Client

    # https://labelstud.io/guide/export
    # https://github.com/HumanSignal/label-studio-sdk/blob/master/examples/export_snapshots.py
    def ExportSnapshot(LABEL_STUDIO_URL, API_KEY, PROJECT_ID, SAVE_PATH):
        # connect to Label Studio
        ls = Client(url=LABEL_STUDIO_URL, api_key=API_KEY)
        ls.check_connection()

        # get existing project
        project = ls.get_project(PROJECT_ID)

        # get the first tab
        views = project.get_views()
        task_filter_options = {'view': views[0]['id']} if views else {}

        # create new export snapshot
        export_result = project.export_snapshot_create(
            title='Export SDK Snapshot', task_filter_options=task_filter_options
        )
        # assert 'id' in export_result
        export_id = export_result['id']

        # wait until snapshot is ready
        while project.export_snapshot_status(export_id).is_in_progress():
            time.sleep(1.0)

        # download snapshot file
        status, file_name = project.export_snapshot_download(export_id, export_type='JSON', path=SAVE_PATH)
        assert status == 200
        assert file_name is not None
        print(f"Status of the export is {status}.\nFile name is {file_name}")

  2. set `LS_UPLOAD_DIR` in `.zshrc` or `.bashrc`, then run `source ~/.zshrc` or `source ~/.bashrc`

    # label studio
    export LS_UPLOAD_DIR=/home/epbox/AI/data/media/upload


  3. save the **.xml** file of the exported project

![image](https://github.com/HumanSignal/label-studio/assets/7224107/307b2e79-0c8b-47f6-ade2-b4c6ef4bb790)

  4. use `label-studio-converter` to convert JSON to YOLO format

    pip install label-studio-converter
    label-studio-converter export -i .json --config .xml -o "train" -f YOLO


  5. check the converted YOLO **train** folder

    (label) ➜ train -h --filelimit=10 --dirsfirst train
    ├── [ 20K] images [208 entries exceeds filelimit, not opening dir]
    ├── [ 20K] labels [208 entries exceeds filelimit, not opening dir]
    ├── [ 124] classes.txt
    └── [ 840] notes.json

    2 directories, 2 files
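The check in step 5 can also be scripted. This is a minimal sketch, assuming the converter output has the `images` and `labels` subfolders shown in the listing above and that each label file shares its stem with its image. `check_yolo_export` is a hypothetical helper name. An empty `images` folder, the symptom reported in this issue, would show up as `"images": 0`.

```python
from pathlib import Path


def check_yolo_export(export_dir):
    """Compare file stems in images/ and labels/ under a YOLO export folder."""
    root = Path(export_dir)
    images = {p.stem for p in (root / "images").iterdir() if p.is_file()}
    labels = {p.stem for p in (root / "labels").glob("*.txt")}
    return {
        "images": len(images),
        "labels": len(labels),
        "images_without_labels": sorted(images - labels),
        "labels_without_images": sorted(labels - images),
    }


# Hypothetical usage, run from the directory that contains "train":
# print(check_yolo_export("train"))
```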

FallDN commented 5 months ago

> export LS_UPLOAD_DIR=/home/epbox/AI/data/media/upload

Could you please explain what this is? Which directory should I put there, and where should I set it?

Digital2Slave commented 5 months ago

> export LS_UPLOAD_DIR=/home/epbox/AI/data/media/upload
>
> Could you please explain what this is? Which directory should I put there, and where should I set it?

LS_UPLOAD_DIR is the root folder that contains all the label project folders.

Like:

➜  data pwd
/home/epbox/AI/data
➜  data tree . -h --filelimit=10 --dirsfirst
.
├── [ 20K]  export [234 entries exceeds filelimit, not opening dir]
├── [4.0K]  media
│   ├── [4.0K]  avatars
│   │   ├── [163K]  7ecd0045-xpanda-removebg-preview.png
│   │   ├── [ 15K]  8e98baa9-ai_artificial_intelligence_robot_chip_brain_technology_icon_179495.png
│   │   ├── [163K]  d1c72bac-7ecd0045-xpanda-removebg-preview.png
│   │   └── [ 31K]  d24650b0-筛选分析_图片1.png
│   ├── [4.0K]  export
│   │   ├── [4.0K]  14
│   │   │   └── [ 22M]  project-14-at-2024-06-04-06-15-ce79d12f.json
│   │   ├── [4.0K]  15
│   │   │   └── [ 20M]  project-15-at-2024-06-04-06-23-aaffc724.json
│   │   ├── [4.0K]  16
│   │   │   └── [4.0M]  project-16-at-2024-06-04-06-25-65ef4454.json
│   │   ├── [4.0K]  22
│   │   │   ├── [845K]  project-22-at-2024-01-09-02-43-e4b7f74d.json
│   │   │   ├── [845K]  project-22-at-2024-01-09-02-45-e4b7f74d.json
│   │   │   ├── [925K]  project-22-at-2024-01-09-07-52-f8373f31.json
│   │   │   └── [1.6M]  project-22-at-2024-05-09-06-51-6363cd9d.json
│   │   ├── [4.0K]  23
│   │   │   ├── [335K]  project-23-at-2024-01-04-07-50-46d2b580.json
│   │   │   ├── [335K]  project-23-at-2024-01-04-07-54-46d2b580.json
│   │   │   ├── [335K]  project-23-at-2024-01-04-08-20-46d2b580.json
│   │   │   ├── [335K]  project-23-at-2024-01-05-02-48-46d2b580.json
│   │   │   └── [1.3M]  project-23-at-2024-05-09-06-44-9c59c4fa.json
│   │   └── [4.0K]  24
│   │       ├── [ 46K]  project-24-at-2024-01-25-03-19-8669ea1e.json
│   │       └── [112K]  project-24-at-2024-05-09-02-32-21d14676.json
│   └── [4.0K]  upload [51 entries exceeds filelimit, not opening dir]
├── [4.0K]  test_data
├── [241M]  label_studio.sqlite3
└── [223M]  postgresql

12 directories, 20 files

FallDN commented 5 months ago

Thank you a lot! So, if I'm trying to download a YOLO dataset from Label Studio running on a server to my local PC, I should set LS_UPLOAD_DIR to the folder on the server, yes?

Digital2Slave commented 5 months ago

> Thank you a lot! So, if I'm trying to download a YOLO dataset from Label Studio running on a server to my local PC, I should set LS_UPLOAD_DIR to the folder on the server, yes?

Yes.
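To illustrate what the variable is for, here is a rough sketch of how a task's image reference could be mapped to a file under `LS_UPLOAD_DIR`. It assumes uploaded images are referenced as `/data/upload/<project>/<file>` and stored under the same relative path inside the upload directory. Both that URL shape and the `resolve_upload_path` helper are assumptions for illustration, not the converter's actual code, and `example.jpg` is a made-up file name.

```python
import os
from urllib.parse import unquote

UPLOAD_PREFIX = "/data/upload/"  # assumed URL prefix for uploaded files


def resolve_upload_path(image_url, upload_dir=None):
    """Map an uploaded-file URL to a local path under the upload directory.

    Returns None when the URL does not point at an uploaded file.
    """
    upload_dir = upload_dir or os.environ.get("LS_UPLOAD_DIR", "")
    if not image_url.startswith(UPLOAD_PREFIX):
        return None
    # Undo URL escaping (e.g. %20) before building the file system path.
    relative = unquote(image_url[len(UPLOAD_PREFIX):])
    return os.path.join(upload_dir, relative)


# Hypothetical task image URL, resolved against the directory from this thread.
print(resolve_upload_path("/data/upload/22/example.jpg", "/home/epbox/AI/data/media/upload"))
```

Under these assumptions, the conversion has to run where that directory is readable, which matches the answer above: point `LS_UPLOAD_DIR` at the folder on the server.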