Error when reading videos with given Dataset

jiachenlei commented 2 years ago

Hi. Thanks for your great contribution to the community. An attribute error below will be raised when reading videos with the given Dataset - StateChangeDetectionAndKeyframeLocalisation.

AttributeError: 'av.codec.codec.Codec' object has no attribute 'seek

When frames of clips are not prepared, the Dataset will call the function _get_frames() which is implemented in utils/trim.py to read clip frames from videos. However, code at line 165 in utils/trim.py will raise an error. According to the documentation of pyav, av.container.streams.StreamContainer.video does not have attribute seek() but av.container.InputContainer do.

https://github.com/EGO4D/hands-and-objects/blob/c207485ebb501b5e1088f1ab457b258e61985d54/state-change-localization-classification/i3d-resnet50/utils/trim.py#L137

https://github.com/EGO4D/hands-and-objects/blob/c207485ebb501b5e1088f1ab457b258e61985d54/state-change-localization-classification/i3d-resnet50/utils/trim.py#L165

(lines around 165 try to seek to the point we need in the video and audio separately)

And I simply change code at line 165 to container.seek(...., stream=container.streams.video[0])

But this fails to set the frame to the expected position. I comment lines from 164 to 171 and it only works properly for several videos. Only frames of several clips are extracted while the others are left empty.

To reproduce the error raised by line 165, you can use the code snippet below:

import av
container = av.open("video.mp4")
videostream = container.streams.video[0]
videostream.seek(0)

Do you have any solutions for this? I'm unfamiliar with Pyav so please forgive me if I made any mistakes.

jiachenlei commented 2 years ago

version of packages: python 3.9.12 pyav 9.2.0

abrhamkg commented 2 years ago

@jasonrayshd It seems the seek function was removed from the latest versions of pyav Try downgrading pyav to 8.0.3 and it should work.

jiachenlei commented 2 years ago

Hi thank you for your reply @abrhamkg (1) after I downgraded pyav to 8.0.3, I got a new error: MethodDeprecationWarning: VideoStream.seek is deprecated. And still, no frames are saved now. I solved it by rewriting with opencv module

(2) Follow: "I comment lines from 164 to 171 and it only works properly for several videos. Only frames of several clips are extracted while the others are left empty."

I found out only clips whose clip_start_frame equals parent_start_frame are properly handled. clip_start_frame and parent_start_frame are meta information provided in fho_oscc-pnr_train.json which can be downloaded via python -m ego4d.cli.cli --output_directory="~/ego4d_data" --datasets annotations

And it turns out codes start from line 55: https://github.com/EGO4D/hands-and-objects/blob/c207485ebb501b5e1088f1ab457b258e61985d54/state-change-localization-classification/i3d-resnet50/datasets/StateChangeDetectionAndKeyframeLocalisation.py#L55

It use meta info starts with "parent", so I changed codes to use info start with "clip" and everything works fine now.

FrankFcc commented 2 years ago

jasonrayshd

Hi @jasonrayshd , I'm sorry to interrupt since I faced the same issues. Could you have any chances sharing your code about how exactly you modify in point (1) and (2)?

jiachenlei commented 2 years ago

@FrankFcc hi (1) To use the official dataset code, the version of pyav should be no greater than 6.0.0 but after some experiments I found it might cause memory leakage. That is , with more and more clips/videos are processed, the memory increases linearly without any release Thus I suggest use pyav==8.0.3 and modify code like

video_stream.seek(seek_pts)

to

container.seek(seek_pts, video_stream)

(2) In fact, the official dataset code in this repo is used to read from full scale videos (namely parent video) instead of clips, thus you have to download videos by specifying --datasets full_scale in download command. parent_start_frame in annotation file indicates the starting frame of the clip in the full scale video.

Hope these will answer your questions

FrankFcc commented 2 years ago

@jasonrayshd Thanks for your quick answer. I did actually fix the issue for container issue and I used the full-scale videos instead of clips so that it successfully notify the Number of clips for train: 41085 and the Number of clips for val: 28348. However, it then failed with a bug with "av/error.pyx" showing

"File "av/error.pyx", line 78, in av.error.FFmpegError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)"

Could I get your requirement.txt to ensure that every libraries I installed is correct since I am currently using torch 1.8.1 for this project because detectron2 seemed to stop supporting torch 1.7.x.

jiachenlei commented 2 years ago

@FrankFcc hi This error is possibly caused by corrupted videos, no permission for reading videos or missing videos. It has nothing to do with package compatibility. You might need to traverse all videos in full_scale first. E,g by using code below:


import json
import av
from tqdm import tqdm

path = "path/to/ego4d/v1"
anno_path = path + "/annotations/fho_oscc-pnr_val.json"
video_path = path + "/full_scale"
clips = json.load(open(anno_path, "r"))["clips"]
for clip in tqdm(clips):
    video = video_path + "/"+ str(clip["video_uid"]) + ".mp4"
    container = av.open(video)

FrankFcc commented 2 years ago

@jasonrayshd Thanks for your idea and it definitely worth a try. I will firstly traverse all videos to see if there are any problems or not with the dataset that I've downloaded.

FrankFcc commented 2 years ago

@jasonrayshd It seemed that I ran all of my datasets which resulted in no corrupted videos so I might look for other issues...

jiachenlei commented 2 years ago

@FrankFcc Did you check if any videos are missing?

EGO4D / hands-and-objects

Error when reading videos with given Dataset #12