deltheil opened this issue 2 weeks ago
Hello,
python-magic is significantly slower. We used it in the past, but it was decided to work with extensions.
Additionally, it will not work with cloud storages as CVAT needs to download file content -> much much slower.
> python-magic is significantly slower. We used it in the past, but it was decided to work with extensions.
Right, that's a drawback.
> Additionally, it will not work with cloud storages as CVAT needs to download file content -> much much slower.
True. Perhaps the Content-Type HTTP header and/or HEAD requests could be leveraged here; I'm not sure how this is handled right now.
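To make the HEAD idea concrete, here is a minimal standard-library sketch. It assumes the file is reachable through a plain HTTP(S) URL, and the helper names are made up for this example. CVAT's cloud storage code may well go through provider SDKs instead. The header is also only as reliable as whatever was set at upload time: objects stored without an explicit content type may be served with a generic type such as application/octet-stream, so this would be a hint rather than a guarantee.

```python
import urllib.request
from typing import Optional


def parse_content_type(value: Optional[str]) -> Optional[str]:
    """Return the bare media type from a Content-Type header value, lower-cased."""
    if not value:
        return None
    # Drop parameters: "image/jpeg; charset=binary" -> "image/jpeg"
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type or None


def head_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send an HTTP HEAD request (no body is downloaded) and return the advertised media type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_type(response.headers.get("Content-Type"))
```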
For context: when using the FiftyOne built-in CVAT integration, this even turns into a bug, as _get_job_ids polls forever (and no job is ever returned).
Actions before raising this issue
Is your feature request related to a problem? Please describe.
Context
I am uploading image files via https://app.cvat.ai/api/docs/#tag/tasks/operation/tasks_create_data (using the client_files parameter).
In my case, my image files are stored on disk in a content-addressable manner, mimicking how git stores and names files. For example, a JPEG file could typically be stored as:
/var/misc/images/1f/ec4f5cee029f96c1e9eddd09821a51c0a9f80a
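For readers unfamiliar with this layout, here is a small sketch of how such a path can be derived: the first two hex characters of a SHA-1 digest become a directory and the remaining 38 become the file name. The function name is hypothetical. Whether a given store hashes the raw bytes or prepends git's "blob <size>\0" header is an assumption, so both variants are shown. The point that matters for this issue is that the resulting name is pure hex, with no extension at all.

```python
import hashlib
from pathlib import Path


def content_addressed_path(root: Path, content: bytes, git_compatible: bool = False) -> Path:
    """Return root/<2 hex chars>/<38 hex chars> derived from the SHA-1 of the content."""
    if git_compatible:
        # git hashes a "blob <size>\0" header followed by the raw bytes
        data = b"blob %d\x00" % len(content) + content
    else:
        data = content
    digest = hashlib.sha1(data).hexdigest()
    return root / digest[:2] / digest[2:]
```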
Problem
The problem is that the CVAT engine's MIME type detection is based on file extensions.
For example, is_image builds upon https://docs.python.org/3/library/mimetypes.html#mimetypes.guess_type.
tl;dr: in my case, all the uploaded image files get ignored.
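A quick way to see the effect with the standard library alone. looks_like_image below is a simplified, hypothetical stand-in for an extension-based check and is not CVAT's actual is_image. The point is that mimetypes.guess_type only looks at the extension, so the content-addressed path above yields nothing.

```python
import mimetypes


def looks_like_image(path: str) -> bool:
    """Simplified, hypothetical extension-based check (not CVAT's actual is_image)."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("image/")


cas_path = "/var/misc/images/1f/ec4f5cee029f96c1e9eddd09821a51c0a9f80a"
print(mimetypes.guess_type(cas_path))  # (None, None): no extension to look up
print(looks_like_image(cas_path))      # False
print(looks_like_image("/var/misc/images/photo.jpg"))  # True
```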
Describe the solution you'd like
I think it would be great if MIME type detection could be expanded to support magic detection (file headers), e.g. using https://github.com/ahupp/python-magic or an equivalent. In other words, detection should not be limited to file extensions (.jpg, etc.).
NB: I am talking about images, but of course the same could be done for other media types.
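Here is a minimal sketch of what header-based detection could look like without a libmagic dependency. The extension lookup still runs first, so files that have an extension cost nothing extra. Only extensionless files get their first 16 bytes read and compared against a handful of well-known signatures. The function names are hypothetical, the signature list is intentionally short, and this is not how CVAT is structured internally. If broader coverage is wanted, python-magic's magic.from_buffer(header, mime=True) could take the place of sniff_image_type. For cloud storages this would still need a (ranged) read of the first bytes, so it only partly addresses the speed concern raised above.

```python
import mimetypes
from typing import Optional

# A few well-known image signatures ("magic numbers"); deliberately not exhaustive.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
)


def sniff_image_type(header: bytes) -> Optional[str]:
    """Guess an image MIME type from the first bytes of a file, or return None."""
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    for signature, mime_type in _SIGNATURES:
        if header.startswith(signature):
            return mime_type
    return None


def guess_mime_type(path: str) -> Optional[str]:
    """Cheap extension lookup first; read only the first 16 bytes when that fails."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is not None:
        return mime_type
    with open(path, "rb") as f:
        return sniff_image_type(f.read(16))
```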
Describe alternatives you've considered
As a workaround, I am forced to rename the files (add an extension) at upload time.
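If the rename only matters for the name the server sees, it may not need to touch the files on disk. Here is a sketch that computes the upload file names up front from an already-detected MIME type. The helper, the extension mapping, and the "client_files[<index>]" form field naming are all assumptions made for this illustration.

```python
import os
from typing import List, Tuple

# Extension to append per detected MIME type (hypothetical mapping for this sketch).
_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/tiff": ".tif",
}


def upload_entries(files: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Map (path, mime_type) pairs to (form_field, upload_filename, path) triples.

    Only the file name sent in the multipart body gets an extension; nothing is renamed on disk.
    """
    entries = []
    for index, (path, mime_type) in enumerate(files):
        extension = _EXTENSIONS.get(mime_type, "")
        # "client_files[<index>]" is an assumption about the form field naming
        entries.append((f"client_files[{index}]", os.path.basename(path) + extension, path))
    return entries
```

With requests, each triple could then be turned into (field, (filename, open(path, "rb"), mime_type)) and passed via files=. Authentication and the rest of the tasks_create_data payload are left out here.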
Additional context
No response