allenai / sherlock

Code, data, models for the Sherlock corpus
Apache License 2.0

sherlock dataset annotation BUG #6

Open RaisnnowLawrence opened 5 months ago

RaisnnowLawrence commented 5 months ago

Hello, I found that the annotated size of some images is inconsistent with the actual image size, which causes serious problems when using the bboxes to process the images. Can this problem be solved? For example (each line shows the instance_id, the image path, the actual image array shape, and the annotated height, width, and channels):

94e50aa85ac6bf99fe46206963c759be sherlock/Datasets/SherlockPack/vcr1images/movieclips_Eye_See_You/PzZHiGYjhss@23.jpg (720, 1280, 3) 480 854 3

6f87373795ef51c1efdeb193734c14a2 sherlock/Datasets/SherlockPack/vcr1images/movieclips_Eye_See_You/PzZHiGYjhss@23.jpg (720, 1280, 3) 480 854 3

e4ac2e273254069f8b7ff952dd394bef sherlock/Datasets/SherlockPack/vcr1images/movieclips_Eye_See_You/PzZHiGYjhss@23.jpg (720, 1280, 3) 480 854 3

thank you

jmhessel commented 5 months ago

Hi @RaisnnowLawrence --- thanks for raising this issue!

I don't have full context at the moment because it's been a while since we collected these annotations. Can you provide a bit more detail about where these image sizes are derived from? Is it the case that we provided one image size in our annotations while the raw images are a different size? Because the aspect ratios are consistent, I think a scaling operation should suffice to align them.
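
As a rough illustration (not something from the released code), a minimal sketch of such a scaling operation might look like the following, assuming the annotated size and the on-disk size share the same aspect ratio and using the `left`/`top`/`width`/`height` bbox fields from the annotations:

```python
def rescale_bbox(bbox, annotated_size, actual_size):
    """Hypothetical helper: map a bbox given relative to the annotated image
    size onto the actual (on-disk) image size via a uniform rescale."""
    ann_w, ann_h = annotated_size
    act_w, act_h = actual_size
    sx, sy = act_w / ann_w, act_h / ann_h  # equal when the aspect ratios match
    return {
        "left": bbox["left"] * sx,
        "top": bbox["top"] * sy,
        "width": bbox["width"] * sx,
        "height": bbox["height"] * sy,
    }

# e.g. map a box annotated against an 854x480 frame onto the actual 1280x720 frame
print(rescale_bbox({"left": 533, "top": 176, "width": 124, "height": 107},
                   (854, 480), (1280, 720)))
```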

RaisnnowLawrence commented 4 months ago

Hi @jmhessel -- thank you for your reply. I looked up a typical annotation (from the Sherlock training corpus v1.1 file):

```json
{
    "inputs": {
        "bboxes": [
            {
                "height": 107,
                "left": 533,
                "top": 176,
                "width": 124
            },
            {
                "height": 134,
                "left": 930,
                "top": 182,
                "width": 70
            }
        ],
        "clue": "police gear hanging up.",
        "confidence": 2.0,
        "image": {
            "height": 480,
            "url": "http://s3-us-west-2.amazonaws.com/ai2-rowanz/vcr1images/movieclips_Eye_See_You/PzZHiGYjhss@23.jpg",
            "width": 854
        },
        "obs_idx": 2
    },
    "instance_id": "e4ac2e273254069f8b7ff952dd394bef",
    "targets": {
        "inference": "this room is in a police headquaters."
    }
}
```

In the second bbox, `"left": 930` (plus `"width": 70`) already exceeds the annotated image width of `"width": 854`, and the actual image size is (720, 1280, 3). I don't know how to handle annotations like this, and there are quite a lot of them. Looking forward to your reply.

jmhessel commented 4 months ago

Hi @RaisnnowLawrence ,

Thanks for raising this issue! Getting back to you quickly ---

  1. The image height/width variables in the annotation files are not used by the training or evaluation code; thank you for pointing out this bug, and we will look into releasing a version with those unused variables removed. They seem to be the result of applying a resize/scaling operation, but they don't reflect the annotations themselves. Instead, the appropriate approach is to load the image with, e.g., PIL and use the size of the loaded image (a short sketch follows after this list), as done here: https://github.com/allenai/sherlock/blob/6802669760582d533dbb815eef1adbd83065ba7b/training_code/train_retrieval_clip.py#L126
  2. The bounding box locations can be viewed at URLs like this one, e.g., for the example you posted: https://storage.googleapis.com/ai2-jack-public/sherlock_mturk/images_with_bboxes/e4ac2e273254069f8b7ff952dd394bef.jpg . This specific example does seem to be off, and might be the result of an annotator moving quickly through our interface. To check whether there is a systematic issue with the corpus, my co-first-author @jenahwang and I went through 200 random validation examples, spanning 100 VG images and 100 VCR images. We rated them as "good", "meh", or "bad": 97% (194/200) were "good", 2.5% (5/200) were "meh", and 0.5% (1/200) were "bad" (had the sample you provided been in our set, it would have been rated "bad").
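
As a concrete illustration of point 1, here is a minimal sketch of the recommended way to get image sizes (the local path below is hypothetical; the linked training code is what actually does the loading):

```python
from PIL import Image

# Take the size from the image file itself, not from the height/width fields
# recorded in the annotation JSON (hypothetical local path).
with Image.open("vcr1images/movieclips_Eye_See_You/PzZHiGYjhss@23.jpg") as image:
    actual_width, actual_height = image.size  # e.g. (1280, 720) for this frame
```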

To summarize:

  1. We will look into it more, but the way we handle sizing is to simply use the size of the image file itself, rather than the size reported in the dataset, which appears to come from a scaled version. We will look into releasing a version of the corpus with these misleading fields removed; thank you for raising this!
  2. When using the recommended loading method, the rate of bad examples is quite low (1/200 in our random validation sample).
RaisnnowLawrence commented 4 months ago

Yes, most of the time the incorrect image size information has no impact on the experiments, but when I use the dataset, some bboxes go out of range, which makes those annotations unusable. Looking forward to your solution. Thank you for your reply.

jmhessel commented 4 months ago

Hey !

To clarify my points from above:

  1. Use the size of the image file itself, not the image size listed in the dataset release. This solves the out-of-range box issue for the example you mentioned, and is the recommended way of getting image sizes. We plan to update the JSONs with that unused field removed.
  2. Prompted by your example, we did an additional quality audit of the dataset and found that there are no highly frequent, systematic issues with the bbox placements.

Given those two responses, I'm not sure what else I can help you with. But do let me know and I'm happy to take a look!

RaisnnowLawrence commented 4 months ago

I'm trying to crop the annotated regions out of the images, so these incorrect annotations are causing me some trouble. Another problem arises if I use the size of the image itself, as shown below.

```json
{
    "inputs": {
        "bboxes": [
            {
                "height": 697,
                "left": 17,
                "top": 215,
                "width": 1903
            }
        ],
        "clue": "train going by near building",
        "confidence": 3.0,
        "image": {
            "height": 1080,
            "url": "http://s3-us-west-2.amazonaws.com/ai2-rowanz/vcr1images/movieclips_Hostel/NVB5kj4k6O4@1.jpg",
            "width": 1920
        },
        "obs_idx": 0
    },
    "instance_id": "fba83958dca20b1f6a9f7b86a4b4a34d",
    "targets": {
        "inference": "this is a train station"
    }
}
```

Here both the recorded image size and the bbox are based on a 1080x1920 frame, but the actual image size is 720x1280; since the bbox width is 1903, this bbox can only be interpreted relative to the annotated width. This creates a problem: for some instances it is unclear whether to use the actual size or the annotated size.

I went through all the data and found 2654 instances whose annotated size does not match the actual image size. As you said, the problematic data is only a small part, but I still hope this issue can be properly resolved. I have attached these statistics, hoping they will be of some help in your subsequent work. Thank you for your reply, and I wish you a happy life. height_and_width_wrong.txt
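
For reference, a rough sketch of how a mismatch count like this could be computed, assuming the annotation file is a JSON list of instances like the ones quoted above and the images are stored locally (the file names, paths, and URL parsing here are assumptions, not the official loading code):

```python
import json
import os

from PIL import Image

# Hypothetical paths/filenames; adjust to the local layout of the corpus.
image_root = "sherlock/Datasets/SherlockPack"
annotation_file = "sherlock_train_v1_1.json"

with open(annotation_file) as f:
    annotations = json.load(f)  # assumed to be a list of annotation instances

size_mismatches = 0
out_of_range_instances = 0
for ann in annotations:
    img_info = ann["inputs"]["image"]
    # Recover the relative path (e.g. "vcr1images/...") from the annotation URL.
    rel_path = img_info["url"].split("ai2-rowanz/")[-1]
    with Image.open(os.path.join(image_root, rel_path)) as im:
        actual_w, actual_h = im.size
    # Annotated size disagrees with the on-disk size.
    if (actual_w, actual_h) != (img_info["width"], img_info["height"]):
        size_mismatches += 1
    # At least one bbox extends beyond the on-disk image.
    if any(b["left"] + b["width"] > actual_w or b["top"] + b["height"] > actual_h
           for b in ann["inputs"]["bboxes"]):
        out_of_range_instances += 1

print(f"size mismatches: {size_mismatches}, out-of-range instances: {out_of_range_instances}")
```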

jmhessel commented 4 months ago

Thank you! This makes sense --- there seem to be ~1% of cases where the bounding box is potentially out of bounds, even when ignoring the height/width in the dataset files we distribute and using the original images. Let me run these cases by @jenahwang and get back to you.