Problems in single object tracking data

songtianhui commented 1 year ago

Hi, I ran into some problems with the data for the single object tracking task.

In some videos, the number of frames in the metadata does not match what cv2 reports. For example, for video_2935 the metadata gives num_frames: 467, but cv2 reports:
```
>>> import cv2
>>> cap = cv2.VideoCapture('test_videos/video_2935.mp4')
>>> cap.get(cv2.CAP_PROP_FRAME_COUNT)
524
```
This mismatch is not rare: in the test split, dozens of videos raise this error, such as video_10363 and video_2790. Some of them differ by only 1 or 2 frames, but some differ by hundreds of frames. How can I compensate for it?

In some video metadata, the frame_ids look wrong. Staying with video_2935, here is the whole annotation:
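One way to cross-check the value from `cap.get(cv2.CAP_PROP_FRAME_COUNT)` is to decode the video and count the frames that are actually returned, since the property may come from container metadata rather than a full decode. The following is a minimal sketch, not part of the perception_test code: the helper name `count_decoded_frames` is hypothetical, and it only assumes an object whose `read()` returns an `(ok, frame)` pair, as `cv2.VideoCapture.read()` does.

```python
def count_decoded_frames(cap):
    """Count frames by calling read() until it reports failure.

    `cap` is assumed to behave like cv2.VideoCapture, i.e. read()
    returns a pair (ok, frame). This helper is illustrative only.
    """
    count = 0
    while True:
        ok, _frame = cap.read()
        if not ok:
            break
        count += 1
    return count
```

With OpenCV installed, this could be called as `count_decoded_frames(cv2.VideoCapture('test_videos/video_2935.mp4'))` and the result compared with both `num_frames` and `CAP_PROP_FRAME_COUNT`.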

{'metadata': {'split': 'test', 'video_id': 'video_2935', 'frame_rate': 29.841567993164062, 'num_frames': 467, 'resolution': [1080, 1920], 'audio_samples': 1592320, 'audio_sample_rate': 48000.0},
 'object_tracking': [
  {'id': 0, 'bounding_boxes': [[0.7182049751281738, 0.40383899211883545, 0.9262850284576416, 0.8379570245742798]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 1, 'bounding_boxes': [[0.7834759950637817, 0.32430300116539, 1.0, 0.9599249958992004]], 'frame_ids': [60], 'timestamps': [2010618]},
  {'id': 2, 'bounding_boxes': [[0.42989298701286316, 0.3875510096549988, 0.6064180135726929, 0.8197540044784546]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 3, 'bounding_boxes': [[0.12319699674844742, 0.39775100350379944, 0.32039299607276917, 0.8364700078964233]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 4, 'bounding_boxes': [[0.5184490084648132, 0.2289160043001175, 0.6666979789733887, 0.5572289824485779]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 5, 'bounding_boxes': [[0.0, 0.11295200139284134, 0.8073229789733887, 0.31927698850631714]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 6, 'bounding_boxes': [[0.04828700050711632, 0.1463879942893982, 0.2830660045146942, 0.38102400302886963]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 7, 'bounding_boxes': [[0.0, 0.09780599921941757, 1.0, 1.0]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 8, 'bounding_boxes': [[0.0, 0.07981900125741959, 1.0, 1.0]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 9, 'bounding_boxes': [[0.16519199311733246, 0.0015059999423101544, 0.9091839790344238, 0.135684996843338]], 'frame_ids': [0], 'timestamps': [0]},
  {'id': 10, 'bounding_boxes': [[0.0, 0.0, 0.08232299983501434, 0.09523600339889526]], 'frame_ids': [240], 'timestamps': [8042473]},
  {'id': 11, 'bounding_boxes': [[0.0, 0.003938000183552504, 0.03435099869966507, 0.19018900394439697]], 'frame_ids': [150], 'timestamps': [5026546]},
  {'id': 12, 'bounding_boxes': [[0.0, 0.005714000202715397, 0.038405001163482666, 0.13807900249958038]], 'frame_ids': [660], 'timestamps': [22116800]},
  {'id': 13, 'bounding_boxes': [[0.48952099680900574, 0.6927310228347778, 0.52395099401474, 0.7784489989280701]], 'frame_ids': [60], 'timestamps': [2010618]}]}

The number of frames is 467, yet the track with id 12 has frame_ids [660], which should be impossible.

I am confused by these issues, and I think they will affect the pipeline and the results. Could you please check them?
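A check like this can be automated across all annotations. The sketch below is illustrative only: the function name `find_out_of_range_tracks` is hypothetical, and it assumes frame_ids are 0-indexed (the annotation contains frame id 0), so valid values lie in the range 0 to num_frames - 1.

```python
def find_out_of_range_tracks(annotation):
    """Return (track_id, invalid_frame_ids) for tracks whose frame ids
    fall outside [0, num_frames). Assumes 0-indexed frame ids."""
    num_frames = annotation["metadata"]["num_frames"]
    bad = []
    for track in annotation["object_tracking"]:
        invalid = [f for f in track["frame_ids"] if not 0 <= f < num_frames]
        if invalid:
            bad.append((track["id"], invalid))
    return bad


# Trimmed-down version of the video_2935 annotation (bounding boxes omitted).
sample = {
    "metadata": {"video_id": "video_2935", "num_frames": 467},
    "object_tracking": [
        {"id": 11, "frame_ids": [150]},
        {"id": 12, "frame_ids": [660]},
    ],
}
print(find_out_of_range_tracks(sample))  # → [(12, [660])]
```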

songtianhui commented 1 year ago

I found that it is not just a few dozen videos with an inconsistent number of frames. A large number of test set videos raise this error: video_811, video_10172, video_11151, video_7808, video_10182, video_11071, video_5387, video_6036, ... This worries me, because I can't tell whether it is a data problem or a problem with my software.

ptchallenge-workshop commented 1 year ago

Hi,

Sorry, this is an issue on the data side. Some of the test videos are cut at certain points, and there was a problem with the annotations: we had incorrect values for 'num_frames' in the metadata and didn't cut some of the later tracks.

I've uploaded a new version of the annotations: https://storage.googleapis.com/dm-perception-test/zip_data/sot_test_annotations_challenge2023.zip

This should be correct for all videos and tracks. Thank you for raising this!

songtianhui commented 1 year ago

Everything works now! Thanks very much for the correction!

songtianhui commented 1 year ago

Sorry, I still see a frame count mismatch for some videos. There are not many; here are some of them: video_2458, video_476, video_11146, video_4763, video_5060, video_8811, video_2926, video_5277, video_6365, video_6483, video_2919. Most of them differ by only one or two frames, so it is possible that the cv2 library is causing the error. I can compensate, for example by padding the video with its last frame. But I need to confirm which frame count to use in my reported results: should it match the metadata or the frames read from the video?

ptchallenge-workshop commented 1 year ago

I think this might be an issue with OpenCV. I've looked into it further and re-uploaded the file to the same location (https://storage.googleapis.com/dm-perception-test/zip_data/sot_test_annotations_challenge2023.zip). Please let me know whether the videos now line up with the metadata, if you can!

Otherwise, for slight differences, use the metadata value and not the number of frames read from the video. Thanks.
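For small mismatches, one possible way to make the decoded frames agree with the metadata value is sketched below. It pads with the last frame when the video is short, as suggested earlier in the thread, and drops trailing frames when it is long. The truncation step and the name `align_to_metadata` are assumptions for illustration, not an official recipe.

```python
def align_to_metadata(frames, num_frames):
    """Return a list of exactly num_frames frames.

    Extra trailing frames are dropped (an assumption); a short video is
    padded by repeating its last decoded frame.
    """
    if not frames:
        raise ValueError("no decoded frames to align")
    if len(frames) >= num_frames:
        return frames[:num_frames]
    return frames + [frames[-1]] * (num_frames - len(frames))
```

This is only sensible for differences of a frame or two; a gap of hundreds of frames more likely points to stale annotations.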

songtianhui commented 1 year ago

Ok. Thank you.

google-deepmind / perception_test

Problems in single object tracking data #4