Arthur151 / Relative_Human

Relative Human dataset, CVPR 2022

Annotation quality issues #2

Open bkkm78 opened 1 year ago

bkkm78 commented 1 year ago

Thanks for the effort to create this dataset!

When inspecting the annotations, I found some quality issues. For some images, the annotations do not seem to be exhaustive: some clearly visible persons in the foreground are missing from the annotations, such as the one on the left in the following image (105520.jpg):

[image: 105520.jpg]

There are also cases where the full-body box annotation covers only part of the person, even though other parts are clearly visible, such as the old man riding a horse in this image (100134.jpg):

[image: 100134.jpg]

There also seems to be overlap between the training set and the eval/test set. For example, 109136.jpg in the validation set appears to be a resized version of 000154.jpg in the training set:

[image: duplicate]

Would the authors mind looking into these issues? Thanks!

bkkm78 commented 1 year ago

Here is a list of potential duplicate images I found in the dataset: https://gist.github.com/bkkm78/95fb4faf9ca8303005349a5c396af3c0
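For reference, here is a minimal sketch of how such near-duplicates (including resized copies) can be flagged with a simple average hash; the directory layout and the distance threshold are my own assumptions, not part of the dataset tooling:

```python
# Hypothetical sketch: flag likely resized duplicates between the train and
# val splits by comparing tiny grayscale "average hashes" of each image.
from pathlib import Path
from PIL import Image
import numpy as np

def average_hash(path, size=16):
    """Downscale to a small grayscale image and threshold against its mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming(a, b):
    """Number of differing hash bits."""
    return int(np.count_nonzero(a != b))

train_dir, val_dir = Path("images/train"), Path("images/val")  # assumed layout
train_hashes = {p.name: average_hash(p) for p in sorted(train_dir.glob("*.jpg"))}

for val_img in sorted(val_dir.glob("*.jpg")):
    h = average_hash(val_img)
    for train_name, th in train_hashes.items():
        if hamming(h, th) <= 10:  # small distance -> likely a resized duplicate
            print(f"possible duplicate: {val_img.name} (val) ~ {train_name} (train)")
```

Matches found this way still need a visual check, since a loose threshold can pick up images that are merely similar.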

Arthur151 commented 1 year ago

@bkkm78 Thanks a lot for reporting this. The reported issues have been added to my schedule and I need some time to fix them. About the issues:

  1. Duplicated images: These images were collected from some existing datasets, such as CrowdPose, keeping their original names. I will remove the duplicated images and update the evaluation results if necessary.
  2. Missing 2D poses & incomplete full-body bounding boxes: Similarly, the 2D poses and bounding boxes of some images are inherited from existing datasets. I have tried to manually fix some errors, but there may still be some errors out there. I will double-check the inherited annotations again. Feel free to explicitly name the images with errors.

Thanks a lot for your help. Let me know if you'd like to report anything further or discuss more.

Best, Yu

bkkm78 commented 1 year ago

@Arthur151 Thank you for your reply! Here is a list of images that may contain incomplete annotations. https://gist.github.com/bkkm78/e38d089a0cd833bf793c4fb2da7102c1

This list may not be complete, but it should be helpful as a starting point. (Being exhaustive is indeed difficult. :))

bkkm78 commented 1 year ago

It may also be helpful if you could release the metadata for each image, such as the source dataset from which it was collected.

Arthur151 commented 1 year ago

@bkkm78 Thanks for your efforts! The image list would be very helpful!

The image names from the different datasets are quite easy to tell apart. For example, CrowdPose image names are 6 digits starting with 1, like 1xxxxx.jpg. The names of images we collected from InterNet are 7 digits. OCHuman image names are 6 digits starting with 0. Something like that.
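A rough helper along these lines could guess the source dataset from the file name; this is only a sketch of the convention described above (my own snippet, not an official utility, and only as precise as that description):

```python
# Hypothetical helper: infer the source dataset from a Relative_Human image
# file name, based on the naming convention described in this thread.
def guess_source_dataset(image_name: str) -> str:
    stem = image_name.split(".")[0]
    if len(stem) == 7 and stem.isdigit():
        return "InterNet"          # 7-digit names
    if len(stem) == 6 and stem.isdigit():
        # 6-digit names: leading 1 -> CrowdPose, leading 0 -> OCHuman
        return "CrowdPose" if stem.startswith("1") else "OCHuman"
    return "unknown"

# Examples from this issue:
assert guess_source_dataset("105520.jpg") == "CrowdPose"
assert guess_source_dataset("000154.jpg") == "OCHuman"
```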