Closed ganindu7 closed 11 months ago
Hi! CVAT uses a special protocol for data uploading, which includes several requests. If you don't have special requirements to how the requests are sent, please try the high-level API instead. Here you can find an example from tests.
from cvat_sdk import make_client, models

with make_client(...) as client:
    project = client.projects.create_from_dataset(
        spec=models.ProjectWriteRequest(name="project with data"),
        dataset_path="path/to/archive.zip",
        dataset_format="COCO 1.0",
    )
If you need to control requests, please check this example.
It might be an issue with the JSON file format. You could try a different file type that CVAT supports: https://opencv.github.io/cvat/docs/manual/advanced/formats/
Thanks for getting back to me. I will try the high-level API as you suggested.
I have two questions.
1) Can I use KITTI 1.0 as the dataset_format?
├── training_1
│ ├── image_2 (n images)
│ └── label_2 (n labels)
├── training_1.zip (a zip archive of the training_1 directory)
├── test.ipynb (notebook running the code I have listed below)
2) As I'm using the KITTI 1.0 format (and as I've shown above), can I use the local path to the .zip archive as dataset_path?
Cheers, Ganindu.
Can I use "KITTI 1.0" as the dataset_format?
Sure, please check the uploaded file uses the file layout described here.
As I'm using the KITTI 1.0 format (and as I've shown above) dataset_path can I use the local path to the .zip archive?
Yes, this is the default option in high-level SDK.
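For reference, a local archive like training_1.zip could be produced with a short script such as the one below. This is only a sketch: the helper name zip_directory is hypothetical, and it assumes that image_2/ and label_2/ should sit at the root of the archive. The thread does not confirm that, so check the layout against the format documentation before relying on it.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_directory(src_dir: str, archive_path: str) -> List[str]:
    """Pack every file under src_dir into archive_path.

    Paths inside the archive are stored relative to src_dir, so
    image_2/ and label_2/ end up at the archive root (an assumption
    about the expected layout, not something confirmed in this thread).
    Returns the member names that were written, in sorted order.
    """
    src = Path(src_dir)
    names: List[str] = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(src).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

The resulting path can then be passed as dataset_path. Keep the archive outside src_dir so it does not try to include itself.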
Here is my adapted code!
Add a progress bar (install ipywidgets if using a notebook)
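from tqdm import tqdm
from tqdm.notebook import tqdm as tqdm_notebook
from cvat_sdk.core.helpers import TqdmProgressReporter

def make_pbar(file, **kwargs):
    # plain-text progress bar for scripts and terminals
    return TqdmProgressReporter(tqdm(file=file, mininterval=0, **kwargs))

def make_notebook_pbar(file, **kwargs):
    # notebook progress bar; the body was missing here and is assumed by analogy with make_pbar
    return TqdmProgressReporter(tqdm_notebook(file=file, mininterval=0, **kwargs))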
upload dataset code
import io
from cvat_sdk import make_client, models

kitti_dataset_path = "training_3.zip"
pbar_out = io.StringIO()
pbar = make_notebook_pbar(file=pbar_out)

with make_client(
    host="http://cvat.lol",  # CVAT server location
    port='8080',
    credentials=("username", "password")  # , f"Token {token}"),  # is there a way to do token-based authentication here?
) as client:
    # projects = client.projects
    new_project = client.projects.create_from_dataset(
        spec=models.ProjectWriteRequest(name="fancy_project_name"),
        dataset_path=kitti_dataset_path,
        dataset_format='KITTI 1.0',
        pbar=pbar,
    )
This code (almost) works. A project gets created on the server, the progress bar goes up to 100%, and the dataset gets uploaded. However, an error is thrown at the end.
# traceback from my notebook cell
15 # projects = client.projects
---> 16 new_project = client.projects.create_from_dataset(
17 spec = models.ProjectWriteRequest(name="fancy_project_name"),
18 dataset_path = kitti_dataset_path,
19 dataset_format = 'KITTI 1.0',
20 pbar = pbar,
21 )
# traceback from File {my python}/site-packages/cvat_sdk/core/proxies/projects.py:174), in ProjectsRepo.create_from_dataset(self, spec, dataset_path, dataset_format, status_check_period, pbar)
171 self._client.logger.info("Created project ID: %s NAME: %s", project.id, project.name)
173 if dataset_path:
...
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Am I doing something wrong?
The code seems correct, could you please include the full traceback?
Thanks a lot for getting back to me!
This is the result of running a standalone Python script (please ignore the redundant imports). I also noticed that the progress bar is not visible when running the standalone script.
import requests
import json
from tabulate import tabulate
from typing import Tuple
import io
import textwrap
from pathlib import Path
from cvat_sdk import make_client, Client, models
from cvat_sdk.api_client import Configuration
from cvat_sdk.api_client import exceptions
from cvat_sdk.core.proxies.projects import Project
from cvat_sdk.core.helpers import TqdmProgressReporter
from util import make_pbar
from tqdm import tqdm

def make_pbar(file, **kwargs):
    return TqdmProgressReporter(tqdm(file=file, mininterval=0, **kwargs))

kitti_dataset_path = "training_6.zip"
pbar_out = io.StringIO()
pbar = make_pbar(file=pbar_out)

with make_client(
    host="http://cvat.app.gnet",
    port='8080',
    credentials=("username", "password")  # , f"Token {token}"),
) as client:
    # projects = client.projects
    new_project = client.projects.create_from_dataset(
        spec=models.ProjectWriteRequest(name="nozzlenet-data-7"),
        dataset_path=kitti_dataset_path,
        dataset_format='KITTI 1.0',
        pbar=pbar,
    )
Traceback (most recent call last):
File "/mnt/qnap_ganindu/ubuntu_backup/master_dataset/master_dataset/data_manager.py", line 35, in <module>
new_project = client.projects.create_from_dataset(
File "/home/g/.pyenv/versions/TAO310/lib/python3.10/site-packages/cvat_sdk/core/proxies/projects.py", line 174, in create_from_dataset
project.import_dataset(
File "/home/g/.pyenv/versions/TAO310/lib/python3.10/site-packages/cvat_sdk/core/proxies/projects.py", line 57, in import_dataset
DatasetUploader(self._client).upload_file_and_wait(
File "/home/g/.pyenv/versions/TAO310/lib/python3.10/site-packages/cvat_sdk/core/uploading.py", line 317, in upload_file_and_wait
rq_id = json.loads(response.data).get("rq_id")
File "/home/g/.pyenv/versions/3.10.11/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/g/.pyenv/versions/3.10.11/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/g/.pyenv/versions/3.10.11/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Please make sure the SDK version matches the server version. If the versions differ, use the SDK that matches the server version.
is there a way to do token based authentication here?
Here's how to log in with a token. The ApiClient instance for a Client is available at client.api_client.
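As a rough illustration of that idea, the sketch below attaches a token to the underlying API client. It assumes the object exposes set_default_header(name, value), as OpenAPI-generated Python clients usually do, and that the server accepts an "Authorization: Token <key>" header. The helper name use_token_auth is hypothetical and not part of the SDK.

```python
def use_token_auth(api_client, token: str) -> None:
    """Attach a token to every request sent through api_client.

    Assumption: api_client provides set_default_header(name, value),
    as OpenAPI-generated Python clients usually do.
    """
    api_client.set_default_header("Authorization", f"Token {token}")
```

Under those assumptions it would be called as use_token_auth(client.api_client, token) inside the with make_client(...) block, before any request that requires authentication.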
I also noticed that the progress-bar is not visible when running the standalone script
The progress bar output is redirected to an in-memory buffer by pbar_out = io.StringIO() in the code sample, so nothing is shown on screen.
Hi, thanks again for the reply!
I used the modified code below to import the dataset in chunks (is there a limit on the maximum size?)
# clear output (e.g. progress bar from previous run)
clear_output()

kitti_dataset_path = "training_2.zip"
pbar_out = io.StringIO()
pbar = make_notebook_pbar(file=pbar_out)
table_headers = ["Project ID", "Name"]

with make_client(**params) as client:  # config is passed in the params dictionary
    try:
        projects = client.projects.list()
        # print(tabulate([[project.id, project.name] for project in projects], headers=table_headers, tablefmt="grid"))

        # get the specific project
        my_project = client.projects.retrieve(3)

        # check that the KITTI archive exists at 'kitti_dataset_path' and has label_2 and image_2 subfolders
        if not Path(kitti_dataset_path).exists():
            raise ValueError(f"Dataset path {kitti_dataset_path} does not exist")
        if not check_subfolders_in_zip(kitti_dataset_path, {"label_2", "image_2"}):
            raise ValueError(f"Dataset path {kitti_dataset_path} does not contain label_2 and image_2 subfolders")

        # upload dataset to the project
        my_project.import_dataset(
            format_name="KITTI 1.0",
            filename=kitti_dataset_path,
            pbar=pbar,
            status_check_period=5,
        )
        my_project.fetch()
        pbar.finish()
    except exceptions.ApiException as e:
        print("Exception when calling ProjectsApi.create_dataset: %s\n" % e)
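The check_subfolders_in_zip helper used above is not shown in the thread. One possible implementation, offered purely as an assumption about what it does, is to look for the required names among the directory components of the archive members:

```python
import zipfile
from typing import Iterable, Set


def check_subfolders_in_zip(zip_path: str, required: Iterable[str]) -> bool:
    """Return True if every name in `required` appears as a directory
    component of at least one member of the archive.

    Hypothetical reconstruction; the original helper is not in the thread.
    """
    found: Set[str] = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # All components except the last are directories; for explicit
            # directory entries ("a/b/") the last component is empty.
            found.update(part for part in parts[:-1] if part)
    return set(required) <= found
```

This version accepts the folders at any depth, so it passes for both image_2/... and training_1/image_2/.... Whether the importer accepts both layouts is a separate question that this check does not answer.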
I get what I want now: the dataset gets uploaded. It would just be great to get rid of the error text.
Here is the error I get in the notebook
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
Cell In[6], line 29
26 raise ValueError(f"Dataset path {kitti_dataset_path} does not contain label_2 and image_2 subfolders")
28 # upload dataset to the project
---> 29 my_project.import_dataset(
30 format_name="KITTI 1.0",
31 filename=kitti_dataset_path,
32 pbar=pbar,
33 status_check_period=5,
34 )
35 my_project.fetch()
42 pass
File ~/.pyenv/versions/3.10.11/envs/TAO310/lib/python3.10/site-packages/cvat_sdk/core/proxies/projects.py:57, in Project.import_dataset(self, format_name, filename, status_check_period, pbar)
51 """
52 Import dataset for a project in the specified format (e.g. 'YOLO ZIP 1.0').
53 """
55 filename = Path(filename)
---> 57 DatasetUploader(self._client).upload_file_and_wait(
58 self.api.create_dataset_endpoint,
59 self.api.retrieve_dataset_endpoint,
60 filename,
61 format_name,
62 url_params={"id": self.id},
63 pbar=pbar,
64 status_check_period=status_check_period,
65 )
67 self._client.logger.info(f"Annotation file '{filename}' for project #{self.id} uploaded")
File ~/.pyenv/versions/3.10.11/envs/TAO310/lib/python3.10/site-packages/cvat_sdk/core/uploading.py:317, in DatasetUploader.upload_file_and_wait(self, upload_endpoint, retrieve_endpoint, filename, format_name, url_params, pbar, status_check_period)
313 params = {"format": format_name, "filename": filename.name}
314 response = self.upload_file(
315 url, filename, pbar=pbar, query_params=params, meta={"filename": params["filename"]}
316 )
--> 317 rq_id = json.loads(response.data).get("rq_id")
318 assert rq_id, "The rq_id was not found in the response"
320 url = self._client.api_map.make_endpoint_url(retrieve_endpoint.path, kwsub=url_params)
File ~/.pyenv/versions/3.10.11/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
341 s = s.decode(detect_encoding(s), 'surrogatepass')
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
348 cls = JSONDecoder
File ~/.pyenv/versions/3.10.11/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
332 def decode(self, s, _w=WHITESPACE.match):
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
File ~/.pyenv/versions/3.10.11/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
It looks like the response does not contain "rq_id", and the json.loads call at uploading.py:317 raises the JSONDecodeError:
response = self.upload_file(
315 url, filename, pbar=pbar, query_params=params, meta={"filename": params["filename"]}
316 )
--> 317 rq_id = json.loads(response.data).get("rq_id")
I added a print statement to check whether response.data carries any data (in cvat_sdk/core/uploading.py):
317 print(f"DEBUG: response data = {response.data}")
--> 318 rq_id = json.loads(response.data).get("rq_id")
I got
DEBUG: response data = b''
Am I packaging the request incorrectly?
Fixing this will help me upload a set of dataset segments as jobs into a single project (we do this to audit a dataset that was previously used in training).
So thanks a lot for the continued support!
If there is no rq_id in the reply and the status is 202, then the server is probably running an older version. The rq_id response was added in https://github.com/opencv/cvat/pull/5909.
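To tell this case apart from a genuinely malformed reply, the response body can be checked before decoding. The sketch below is not SDK code; extract_rq_id is a hypothetical helper that returns None for an empty body instead of letting json.loads raise.

```python
import json
from typing import Optional


def extract_rq_id(body: bytes) -> Optional[str]:
    """Return the rq_id from an upload response body.

    Returns None when the body is empty (as seen with the older server
    in this thread) or when the decoded JSON has no "rq_id" key.
    """
    if not body.strip():
        return None
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return None
    return payload.get("rq_id")
```

A caller could then report "the server returned no rq_id, check that the server and SDK versions match" rather than surfacing a bare JSONDecodeError.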
Cheers,
I updated the repo; it is now at 2896bec3d4f19b392d24e8119ff085793a550b34 (confirmed with git rev-parse HEAD).
Cleaned up the Docker images: docker images -q | xargs docker rmi
Set up the environment (export CVAT_HOST=my.local.cvat.hostpath) and docker-compose overrides to point to storage.
Rebuilt and launched with docker compose up -d --build
Set up some extra debugging:
312 url = self._client.api_map.make_endpoint_url(upload_endpoint.path, kwsub=url_params)
313 params = {"format": format_name, "filename": filename.name}
314 response = self.upload_file(
315 url, filename, pbar=pbar, query_params=params, meta={"filename": params["filename"]}
316 )
317 print(f"DEBUG: response status_code = {response.status}")
318 print(f"DEBUG: response reason = {response.reason}")
319 print(f"DEBUG: response headers = {response.headers}")
320 print(f"DEBUG: response data = {response.data}")
The output confirms that the issue is fixed!
DEBUG: response status_code = 202
DEBUG: response reason = Accepted
DEBUG: response headers = HTTPHeaderDict({'Allow': 'GET, POST, HEAD, OPTIONS', 'Content-Length': '47', 'Content-Type': 'application/vnd.cvat+json', 'Cross-Origin-Opener-Policy': 'same-origin', 'Date': 'Mon, 24 Jul 2023 12:06:11 GMT', 'Referrer-Policy': 'same-origin', 'Server': 'nginx/1.18.0 (Ubuntu)', 'Vary': 'Accept, Origin', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-Request-Id': '7bf391c6-6474-4599-80fa-5f4a05264607'})
DEBUG: response data = b'{"rq_id":"import:project-3-dataset-by-ganindu"}'
Thanks a lot Maxim!!
Best, Ganindu.
Can make_client use a token for authentication?
@liudaolunboluo, answered in #7439
I have run into a similar question.
How can I import a project dataset through the REST API? Has anyone managed to do this end to end?
Or does it have to be done through the SDK, as mentioned above?
I recommend implementing this with the SDK. Doing it through the REST API is very complicated, because the upload relies on an external upload component and the whole process is asynchronous, so you have to handle waiting for the result yourself.
@YaoJusheng, the uploading is available, it uses the TUS file uploading protocol. You can implement it manually or use CVAT SDK for uploading, as shown in https://github.com/cvat-ai/cvat/issues/6525#issuecomment-1643731268 .
Right, the issue does say that a special protocol is used. However, we have built our own scheduling and management logic on top of the REST API to interact with CVAT, so switching to the SDK for a single endpoint does not seem appropriate. I'll look into it further.
Ok, thanks for the reply, I will refer to it.
Your scenario is the same as ours: we also integrated CVAT into our own system through the REST API and need to handle upload and import from within that system. My solution was to write an adapter (or forwarder) in Python, since we also use Python for automatic annotation before uploading; you may find that approach useful as a reference. The official API documentation does not specifically cover import uploads, so I spent a long time studying the requests in the browser developer tools (F12). The upload uses the tus component. Also, CVAT's error handling when processing uploaded datasets is very poor: for XML formats, even a single extra space or newline makes the upload fail, and the real error message is not printed.
OK, thank you very much. I just had a look at TUS; combined with the CVAT requests, the flow seems quite simple: create the resource -> check the upload status -> upload in chunks -> handle resuming interrupted uploads.
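The chunking and resume part of the flow described above can be sketched with the generic TUS 1.0.0 headers. This is a minimal, assumption-laden sketch: the function names are hypothetical, no request is actually sent, and the CVAT-specific endpoints, query parameters, and any extra headers CVAT expects are not covered here.

```python
from typing import Dict, Iterator, Tuple

TUS_VERSION = "1.0.0"


def creation_headers(total_size: int) -> Dict[str, str]:
    """Headers for the request that creates the upload resource."""
    return {"Tus-Resumable": TUS_VERSION, "Upload-Length": str(total_size)}


def patch_headers(offset: int) -> Dict[str, str]:
    """Headers for a PATCH request that appends one chunk at `offset`."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }


def iter_chunks(
    data: bytes, chunk_size: int, start_offset: int = 0
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs, starting at start_offset.

    To resume an interrupted upload, pass the offset the server reports
    for the upload resource as start_offset.
    """
    offset = start_offset
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        yield offset, chunk
        offset += len(chunk)
```

Each yielded pair maps to one PATCH request carrying patch_headers(offset) and the chunk as the body. After a failure, the client asks the server for the current offset and restarts iter_chunks from there.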
My actions before raising this issue
I am trying to upload a dataset using the python API
My python version
3.10.11
CVAT version: 2.5
My directory structure looks like this
First I read the documentation at https://opencv.github.io/cvat/docs/api_sdk/sdk/reference/apis/projects-api/ to write the following code
I got back:
Then I also tried with rest API calls
created a session object
then I tried login
login worked (AFAIK)
Then I tried to upload the dataset again
I got another error
Expected Behaviour
Dataset being uploaded
Am I doing anything wrong here? I also tried creating a dataset by using https://opencv.github.io/cvat/docs/api_sdk/sdk/reference/apis/projects-api/#example but it failed too (maybe the example is outdated)
Cheers, Ganindu.