cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

CVAT SDK - better documentation needed #7435

Open maecky opened 9 months ago

maecky commented 9 months ago

Steps to Reproduce

I have Azure Blob Storage connected to CVAT. All image files, together with the corresponding manifest, are located in that storage. I would like to automate task creation in a project using cvat_sdk. From the documentation alone, I have not been able to get this working.

First I tried it via the swagger_api documentation (and reverse engineered the browser calls). I was only able to create empty tasks there:

import json

import requests

session = requests.session()
login_resp = session.post(LOGIN_URL, json=login_data)
cookies = login_resp.cookies
csrftoken = session.cookies['csrftoken']
header = {
    "X-CSRFToken": csrftoken
}

payload = {
    "chunk_size": None,
    "size": 0,
    "image_quality": 70,
    "start_frame": 0,
    "stop_frame": 40,
    "frame_filter": "",
    "compressed_chunk_type": "imageset",
    "original_chunk_type": "imageset",
    "client_files": [],
    "server_files": [
        "manifest.jsonl"
    ],
    "remote_files": [],
    "use_zip_chunks": False,
    "server_files_exclude": [],
    "use_cache": False,
    "copy_data": False,
    "storage_method": "file_system",
    "storage": "local",
    "sorting_method": "lexicographical",
    "filename_pattern": None,
}

header_upload = {
    "Upload-Start": "true",
}
r = session.post(f"{BASE_URL}/tasks/{id}/data?org=testorg", json=json.dumps(payload), cookies=cookies, headers=header_upload)

This request always returns a 500 response. I tried various combinations of the Upload-Start and Upload-Finish headers, but I could not get it working.
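One detail in the request above may be worth checking, although it is not confirmed to be the cause of the 500: the `json=` argument of `requests` serializes whatever it is given, so passing `json=json.dumps(payload)` serializes the payload twice and the server receives a JSON string instead of a JSON object. The sketch below reproduces the effect with the standard library only; the shortened `payload` is just for illustration.

```python
import json

# Shortened stand-in for the payload in the request above.
payload = {"server_files": ["manifest.jsonl"], "use_cache": False}

# Equivalent to json=payload: the dict is serialized once.
encoded_once = json.dumps(payload)

# Equivalent to json=json.dumps(payload): the resulting string is serialized again.
encoded_twice = json.dumps(json.dumps(payload))

decoded_once = json.loads(encoded_once)
decoded_twice = json.loads(encoded_twice)

print(type(decoded_once).__name__)   # dict
print(type(decoded_twice).__name__)  # str
```

If this applies, passing the dictionary directly (`json=payload`) would send a JSON object. The `header` dictionary holding the CSRF token is also never passed to the request, which may matter as well.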

Then I switched to the cvat-sdk high-level API:

from cvat_sdk import make_client
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client(host=BASE_URL, credentials=(USER, PASSWORD)) as client:
    client.organization_slug = "testorg"

    task_spec = {
        "name": "api_test_10",
        "project_id": 20,
        "labels": [],
        "subset": "string",
        "target_storage": {
            "location": "cloud_storage",
            "cloud_storage_id": 21
        },
        "source_storage": {
            "location": "cloud_storage",
            "cloud_storage_id": 21
        },
        "server_files": [
            "manifest.jsonl"
        ],
        "sorting_method": "lexicographical",
    }

    task = client.tasks.create_from_data(
        spec=task_spec,
        # resource_type=ResourceType.REMOTE,
        resources=['various_3.jsonl'],
    )

The result I get is:

HTTP response body: b'{"state":"Failed","message":"A manifest file can only be used with the \'use cache\' option or when \'sorting_method\' is \'predefined\'","progress":0.0}'
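The error message points in one direction: a manifest is only accepted together with the `use_cache` option or with `sorting_method` set to `predefined`. The sketch below is untested against a live server and rests on several assumptions: that data-level options such as `use_cache`, `sorting_method` and `cloud_storage_id` belong in the `data_params` argument of `create_from_data` rather than in the task spec, that cloud storage files are passed as `ResourceType.SHARE` resources, and that the installed cvat-sdk version forwards `cloud_storage_id`. The helper names `build_cloud_task_args` and `create_cloud_task` are hypothetical, and whether the manifest alone is enough as a resource is not confirmed here.

```python
def build_cloud_task_args(name, project_id, cloud_storage_id, manifest, files=()):
    """Split the settings into task spec, resources and data parameters."""
    # Task-level fields only; data options are kept out of the spec (assumption).
    spec = {"name": name, "project_id": project_id}
    # The manifest goes first, followed by any explicitly listed files.
    resources = [manifest, *files]
    # Data-level options; use_cache=True is what the error message asks for.
    data_params = {
        "cloud_storage_id": cloud_storage_id,
        "use_cache": True,
        "sorting_method": "lexicographical",
    }
    return spec, resources, data_params


def create_cloud_task(host, credentials, org_slug, **task_kwargs):
    """Create the task on the server (needs cvat-sdk and a reachable CVAT)."""
    # Imported here so the helper above works without cvat-sdk installed.
    from cvat_sdk import make_client
    from cvat_sdk.core.proxies.tasks import ResourceType

    spec, resources, data_params = build_cloud_task_args(**task_kwargs)
    with make_client(host=host, credentials=credentials) as client:
        client.organization_slug = org_slug
        return client.tasks.create_from_data(
            spec=spec,
            resource_type=ResourceType.SHARE,  # assumption: maps to server_files
            resources=resources,
            data_params=data_params,
        )


if __name__ == "__main__":
    # IDs and names are taken from the snippet above.
    print(build_cloud_task_args("api_test_10", 20, 21, "manifest.jsonl"))
```

Calling `create_cloud_task(BASE_URL, (USER, PASSWORD), "testorg", name="api_test_10", project_id=20, cloud_storage_id=21, manifest="manifest.jsonl")` would then send the request, provided these assumptions hold for the server and SDK version in use.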

The SDK documentation is pretty hard to follow; it would be nice to have a few more examples in there. Any help would be appreciated. Thanks!

thmegy commented 4 months ago

I would also be very interested in an answer to this issue. From the documentation, it is not possible to understand how to load data from cloud storage.

code4days commented 1 month ago

I'm facing a similar issue. The documentation is extremely lacking, especially when it comes to cloud storage.