A summary of my use case:
I'm trying to upload data from S3 to a local CVAT instance running in Docker.
I'm using the CVAT CLI.
I've created and verified a manifest.jsonl file.
Question
In both cases below, I specify the S3 prefix path where the images are stored. However, the command only works if the manifest is stored in the same S3 location as the image data; if the manifest is elsewhere in S3, the upload fails. I've included examples of a successful and an unsuccessful upload to illustrate the problem.
Is this behaviour expected, or is there a way to upload to CVAT from S3 with the manifest file stored separately from the images? I really appreciate any help you can provide.
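To make the two layouts concrete, here is a minimal sketch of the distinction I mean. `same_s3_prefix` is a helper name I made up for illustration; it is not part of CVAT or cvat-cli. It only compares key strings and never contacts S3. The co-located manifest path in the first call is an assumed placeholder, while the second call uses the paths from the failing command.

```python
import posixpath


def same_s3_prefix(manifest_key: str, filename_pattern: str) -> bool:
    """Return True if the manifest key and the image pattern share one S3 prefix."""
    # S3 keys use "/" as the separator on every platform, so posixpath is used
    # here instead of os.path.
    manifest_prefix = posixpath.dirname(manifest_key)
    images_prefix = posixpath.dirname(filename_pattern)
    return manifest_prefix == images_prefix


# Layout that works for me: manifest next to the images (assumed placeholder path).
print(same_s3_prefix("path/to/images/on/s3/manifest.jsonl",
                     "path/to/images/on/s3/*.png"))

# Layout that fails for me: manifest under a different prefix.
print(same_s3_prefix("a/different/location/on/s3/manifest.jsonl",
                     "path/to/images/on/s3/*.png"))
```

The first call describes the layout where task creation succeeds, the second the layout that ends in "No media data found".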
manifest.jsonl in the same S3 location as images
When I run the command with the manifest.jsonl file stored in the same location in S3 as the images, the upload is successful:
manifest.jsonl in a different S3 location from images
However, when I run the command with the manifest.jsonl file stored in a different location in S3 from the images, the upload results in error:
# Command
cvat-cli --auth <cvat_username>:<cvat_password> \
--server-host http://localhost \
--server-port 8080 \
--organization <org_name> \
create "<task_name>" --use_cache \
--project_id <proj_id> \
--annotation_path "/path/to/local/annotations.json" \
--annotation_format "COCO 1.0" \
--cloud_storage_id <cloud_id> \
--filename_pattern "path/to/images/on/s3/*.png" \
share a/different/location/on/s3/manifest.jsonl
# Output (error)
[2024-06-25 15:44:54] INFO: Created task ID: 225 NAME: <task_name>
[2024-06-25 15:44:54] INFO: Awaiting for task 225 creation...
[2024-06-25 15:44:56] INFO: Task 225 creation status: Failed (message=Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
rv = job.perform()
File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
self._result = self._execute()
File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
result = self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/django/cvat/apps/engine/task.py", line 646, in _create_thread
media, task_mode = _validate_data(media, manifest_files)
File "/home/django/cvat/apps/engine/task.py", line 260, in _validate_data
raise ValueError('No media data found')
ValueError: No media data found)
[2024-06-25 15:44:56] CRITICAL: Status Code: 200
Reason: OK
HTTP response headers: HTTPHeaderDict({'Allow': 'GET, HEAD, OPTIONS', 'Content-Length': '846', 'Content-Type': 'application/vnd.cvat+json', 'Cross-Origin-Opener-Policy': 'same-origin', 'Date': 'Tue, 25 Jun 2024 05:44:56 GMT', 'Referrer-Policy': 'same-origin, strict-origin-when-cross-origin', 'Server': 'nginx', 'Vary': 'Accept, Accept-Encoding, Origin, Cookie', 'X-Content-Type-Options': 'nosniff, nosniff', 'X-Frame-Options': 'DENY, deny', 'X-Request-Id': 'c8ebf596-82bd-4bee-8f45-3583a247db8e'})
HTTP response body: b'{"state":"Failed","message":"Traceback (most recent call last):\\n File \\"/opt/venv/lib/python3.10/site-packages/rq/worker.py\\", line 1431, in perform_job\\n rv = job.perform()\\n File \\"/opt/venv/lib/python3.10/site-packages/rq/job.py\\", line 1280, in perform\\n self._result = self._execute()\\n File \\"/opt/venv/lib/python3.10/site-packages/rq/job.py\\", line 1317, in _execute\\n result = self.func(*self.args, **self.kwargs)\\n File \\"/usr/lib/python3.10/contextlib.py\\", line 79, in inner\\n return func(*args, **kwds)\\n File \\"/home/django/cvat/apps/engine/task.py\\", line 646, in _create_thread\\n media, task_mode = _validate_data(media, manifest_files)\\n File \\"/home/django/cvat/apps/engine/task.py\\", line 260, in _validate_data\\n raise ValueError(\'No media data found\')\\nValueError: No media data found","progress":0.0}'