cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

When an uploaded file has "_" in its filename, the <folder> tag of the PASCAL VOC format export is broken. #2444

Status: Open. kakibara opened this issue 4 years ago

kakibara commented 4 years ago

When an uploaded file has "_" in its filename, the <folder> tag of the PASCAL VOC format export is broken.

Reproduction procedure

  1. Create a task by uploading the image file "4-b-2_blue_1.png" (you can create such a file yourself).
  2. Dump the annotations from the menu. You get the following:
<annotation>
  <folder>4-b-2</folder>
  <filename>4-b-2_blue_1.png</filename>
  <source>
    <database>Unknown</database>
    <annotation>Unknown</annotation>
    <image>Unknown</image>
  </source>
  <size>
    <width>149</width>
    <height>187</height>
    <depth></depth>
  </size>
  <segmented>0</segmented>
</annotation>
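
The observed <folder> value can be reproduced by a rule that takes everything before the first underscore of the file name. This is a hypothetical sketch of the behavior seen in the output above, not CVAT's actual export code; the name folder_from_underscore is made up for illustration.

```python
def folder_from_underscore(filename: str) -> str:
    """Hypothetical rule (not CVAT's code): the text before the first '_' becomes <folder>."""
    prefix, sep, _rest = filename.partition("_")
    # Assumption: a name without any underscore gets an empty folder.
    return prefix if sep else ""


print(folder_from_underscore("4-b-2_blue_1.png"))  # 4-b-2
```

Under such a rule, any file name containing an underscore yields a folder, whether or not that directory exists in the export.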

zhiltsov-max commented 4 years ago

Could you highlight the problem you see? The file looks valid.

kakibara commented 4 years ago

Thank you for picking up the issue! The problem is the <folder> tag. It should be empty, because the XML and image files are saved in the same directory in the exported dataset. However, it looks like CVAT treats _ as a directory separator.


<annotation>
  <folder>4-b-2</folder>     <!-- this should be empty -->
  <filename>4-b-2_blue_1.png</filename>
  <source>
    <database>Unknown</database>
    <annotation>Unknown</annotation>
    <image>Unknown</image>
  </source>
  <size>
    <width>149</width>
    <height>187</height>
    <depth></depth>
  </size>
  <segmented>0</segmented>
</annotation>
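
The reporter's expectation can be sketched as deriving <folder> from the directory part of the image's path inside the export, rather than from its file name. This assumes the path is relative to the images directory and uses "/" separators; folder_from_path is a hypothetical helper, not part of CVAT.

```python
import posixpath


def folder_from_path(relative_path: str) -> str:
    """Hypothetical helper: use the image's real directory as <folder>.

    An image stored at the top level of the export has no directory part,
    so the folder comes out empty, regardless of underscores in its name.
    """
    return posixpath.dirname(relative_path)


print(repr(folder_from_path("4-b-2_blue_1.png")))  # ''
print(repr(folder_from_path("4-b-2/blue_1.png")))  # '4-b-2'
```
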
zhiltsov-max commented 4 years ago

It is treated as a directory separator because PASCAL VOC uses it as one: in the original dataset, you can find 2007 as the folder for 2007_xxxxxx.jpg. Do you think it is worth changing in some way?

kakibara commented 3 years ago

Sorry, I didn't know that. So I should avoid using _ in file names when I use the PASCAL VOC format, right?

In my case, there is no 4-b-2 directory, and the exported data doesn't contain one either, as the screenshot shows. These are my exported files, including the XML file above. The problem is that the output files are not consistent with each other. With respect to PASCAL VOC, creating a 4-b-2 sub-directory in JPEGImages could solve it. On my side, I can work around the problem by replacing '_' in the file names with other characters or by editing the folder tag.

[Screenshot, 2020-11-19 10:34:45: exported files for the XML above]
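
The second workaround, editing the folder tag, can be scripted. Below is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes every exported image sits in one flat directory, so an empty <folder> is correct for all files; clear_folder_tags is a hypothetical helper, and the directory passed to it would be the export's Annotations folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def clear_folder_tags(annotations_dir: Path) -> int:
    """Empty the <folder> tag in every PASCAL VOC XML file under annotations_dir.

    Assumes a flat image layout, so an empty folder is correct for every file.
    Returns the number of files that were rewritten.
    """
    changed = 0
    for xml_path in sorted(annotations_dir.rglob("*.xml")):
        tree = ET.parse(xml_path)
        folder = tree.getroot().find("folder")
        # Skip files without a <folder> tag or where it is already empty.
        if folder is not None and folder.text:
            folder.text = ""
            # short_empty_elements=False writes <folder></folder> instead of <folder />.
            tree.write(xml_path, encoding="utf-8", short_empty_elements=False)
            changed += 1
    return changed
```

For example, calling clear_folder_tags(Path("export/Annotations")) on a hypothetical export path rewrites the files in place and returns how many were changed. Running it a second time changes nothing, because the folders are already empty.
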
zhiltsov-max commented 3 years ago

Maybe we could add special logic to CVAT's exporter so that underscores are not used for this, but in general, what we see in PASCAL VOC is images with a year prefix, like JPEGImages/2007_12345.jpg (and the same for annotations).
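
One possible shape for such special logic is sketched below. It rests on an assumed rule that CVAT does not necessarily implement: keep the year-prefix convention described above, but treat the underscore as a separator only when the prefix is a four-digit year. The pattern and the name folder_for_export are hypothetical.

```python
import re

# Hypothetical rule: only a VOC-style year prefix such as "2007_" maps to a folder.
VOC_YEAR_PREFIX = re.compile(r"(\d{4})_")


def folder_for_export(filename: str) -> str:
    """Return the year prefix as <folder>, or an empty string for any other name."""
    match = VOC_YEAR_PREFIX.match(filename)  # match() only matches at the start
    return match.group(1) if match else ""


print(repr(folder_for_export("2007_12345.jpg")))    # '2007'
print(repr(folder_for_export("4-b-2_blue_1.png")))  # ''
```

A user file such as 2020_report.png would still be split, so a rule like this narrows the problem rather than removing it.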

nmanovic commented 2 years ago

@zhiltsov-max, @kirill-sizov, I believe we can avoid this limitation. Let's try to find a solution.

zhiltsov-max commented 2 years ago

Maybe, now that we preserve full file names in the ImageNet format (as opposed to the previous truncation of the class-name prefix), we should extend this to other formats too.