artest08 / LateTemporalModeling3DCNN


Frame extraction and train-val split #3

Open afonso-sousa opened 4 years ago

afonso-sousa commented 4 years ago

Hello. First of all, congratulations on your work.

I want to reproduce your experiments, but I am stuck on the data preprocessing phase. The original dataset comes as ".avi" videos; how do you extract the RGB and optical flow frames? How many frames? And how do you convert the original file-per-class split files into the structure you have in the "datasets/settings/hmdb51/" folder?

Disclaimer: I am just getting started with action recognition research; this might be standard procedure for any implementation using the HMDB51 dataset, but I am not yet familiar with it.

I appreciate any help you can provide.

artest08 commented 4 years ago

Hello, thank you for your interest.

Basically, for the RGB modality, the avi format is converted to jpg images. This can be implemented with OpenCV, for instance.
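Something like this minimal OpenCV sketch would work (not necessarily the exact script I used; the img_00001.jpg naming follows the convention described below):

    import os
    import cv2

    def extract_rgb_frames(video_path, out_dir):
        # Decode an .avi file and write frames as img_00001.jpg, img_00002.jpg, ...
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            idx += 1
            cv2.imwrite(os.path.join(out_dir, "img_{:05d}.jpg".format(idx)), frame)
        cap.release()
        return idx  # number of frames written, useful for the setting file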

However, you don't need to convert the avi files to jpg yourself. There is a GitHub repository where you can find the RGB and optical flow frames of both HMDB51 and UCF101: https://github.com/feichtenhofer/twostreamfusion. That repository also links to a TV-L1 optical flow extraction tool, and there is another GitHub repository for TV-L1 extraction: https://github.com/bryanyzhu/two-stream-pytorch/tree/master/datasets. However, I suggest you use the pre-extracted optical flow frames in the beginning, because extracting them yourself is not an easy process.
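If you do want to extract TV-L1 flow yourself, here is a rough sketch of the idea using opencv-contrib-python (the linked tools are what people normally use and are much faster; the [-20, 20] clipping bound is a common convention, not something fixed by this repo):

    import cv2
    import numpy as np

    def flow_to_uint8(component, bound=20.0):
        # Map flow values from [-bound, bound] to [0, 255] for jpg storage.
        component = np.clip(component, -bound, bound)
        return ((component + bound) * (255.0 / (2 * bound))).astype(np.uint8)

    cap = cv2.VideoCapture("video.avi")
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()  # needs opencv-contrib-python
    ok, prev = cap.read()
    assert ok, "could not read first frame"
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    idx = 0
    while True:
        ok, curr = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        flow = tvl1.calc(prev, curr, None)  # H x W x 2 float32
        idx += 1
        cv2.imwrite("flow_x_{:05d}.jpg".format(idx), flow_to_uint8(flow[..., 0]))
        cv2.imwrite("flow_y_{:05d}.jpg".format(idx), flow_to_uint8(flow[..., 1]))
        prev = curr
    cap.release()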

For the setting file, let me explain with an example:

    50_FIRST_DATES_kick_f_cm_np1_ba_med_19 48 20

In this line, the first part is the name of the video, the second part is the number of frames, and the third part is the label of the class.

As I mention in the repo, a folder named hmdb51_frames should be inside the datasets folder. For the example above, there needs to be a folder named 50_FIRST_DATES_kick_f_cm_np1_ba_med_19 inside hmdb51_frames. The image names should be img_00001.jpg, img_00002.jpg, ... for the RGB frames, and flow_x_00001.jpg, flow_x_00002.jpg, ... plus flow_y_00001.jpg, flow_y_00002.jpg, ... for the optical flow frames.
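As an illustration only (this helper is not part of the repo), a setting file in this format could be generated by counting the extracted frames, assuming you already have a mapping from video name to class index:

    import glob
    import os

    def write_settings(frames_root, video_labels, out_file):
        # video_labels: dict mapping video folder name -> class index
        with open(out_file, "w") as f:
            for name, label in sorted(video_labels.items()):
                n = len(glob.glob(os.path.join(frames_root, name, "img_*.jpg")))
                f.write("{} {} {}\n".format(name, n, label))

    # write_settings("datasets/hmdb51_frames", video_labels,
    #                "datasets/settings/hmdb51/train_split1.txt")  # example path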

Additionally, if you want to use my repo directly, I can send the frame folder to you. However, you need to provide me with a cloud link so that I can upload the folder.

I hope this makes it clear.

fasa1395 commented 4 years ago

Hello. Congratulations on your work. I'm trying to reproduce your experiments, but I have some problems setting up the dataset. I am just getting started with action recognition research, and this is the first time I have used this dataset. Could you provide me with the frame folder? I appreciate your help and your courtesy.

wjtan99 commented 4 years ago

Hi, I downloaded the UCF101 dataset at https://github.com/feichtenhofer/twostreamfusion. I found that they simply extract every frame of every video, so the number of frames per video varies, with a minimum of 29 and a maximum of 1799. In your paper and code, it seems you use a fixed number of 64 frames for every video. How did you get these 64 frames from videos of different lengths? Thanks.
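To clarify what I'm asking: did you do something like evenly spaced sampling? Just a guess at the approach, for example:

    def sample_indices(n_frames, n_samples=64):
        # Pick n_samples evenly spaced, 1-based frame indices; short videos
        # repeat some frames, long videos skip most of them.
        step = n_frames / float(n_samples)
        return [int(step * i) + 1 for i in range(n_samples)]

    # sample_indices(29)   -> 64 indices in [1, 29], with repeats
    # sample_indices(1799) -> 64 indices spread across the whole video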

wjtan99 commented 4 years ago

@artest08 Hi, here is my cloud address https://drive.google.com/drive/folders/1uPevf5Xz3FLDoOoTxGwn61dUG1wcb91Q?usp=sharing. Can you upload your folder? Thanks a lot.

wjtan99 commented 4 years ago

Never mind. I got everything running now.
Thanks for sharing your great work.

hushunda commented 3 years ago

> Never mind. I got everything running now. Thanks for sharing your great work.

Can you share your solution? I met the same problem.

wjtan99 commented 3 years ago

@hushunda I downloaded the UCF101 dataset from the UCF website, then used ffmpeg to extract every frame of every video and put the frames in the folder structure described above. Then change the image naming in dataset/ucf101.py if necessary.
Here is an example of the ffmpeg command I used:

    import os

    # video_path: input .avi file; video_type_path: output root directory;
    # video_id: video name without extension (its folder must already exist).
    command = ['ffmpeg', '-i', video_path,
               '-y', '-f', 'image2', '-c:v', 'mjpeg',
               video_type_path + '{0}/{0}_%4d.jpg'.format(video_id)]

    os.system(' '.join(command))

Change video_path (the input) and video_type_path + '{0}/{0}_%4d.jpg'.format(video_id) (the output) to suit your setup.
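For completeness, a hypothetical driver around this command (the paths are placeholders, and it assumes the .avi files sit in one flat folder; here the output is named img_%05d.jpg to match the repo's convention directly, so dataset/ucf101.py would not need changing):

    import os
    import subprocess

    src_root = "UCF-101"                  # downloaded videos, one .avi per clip
    dst_root = "datasets/ucf101_frames"   # one folder of frames per video

    for fname in sorted(os.listdir(src_root)):
        if not fname.endswith(".avi"):
            continue
        video_id = os.path.splitext(fname)[0]
        out_dir = os.path.join(dst_root, video_id)
        os.makedirs(out_dir, exist_ok=True)
        subprocess.run(["ffmpeg", "-i", os.path.join(src_root, fname),
                        "-y", "-f", "image2", "-c:v", "mjpeg",
                        os.path.join(out_dir, "img_%05d.jpg")],
                       check=True)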

RobertLee0522 commented 3 years ago

> @hushunda I downloaded the UCF101 dataset from the UCF website, then used ffmpeg to extract every frame of every video and put the frames in the folder structure described above. [...]

> However, you don't need to convert the avi files to jpg yourself. There is a GitHub repository where you can find the RGB and optical flow frames of both HMDB51 and UCF101: https://github.com/feichtenhofer/twostreamfusion [...]

I downloaded the HMDB51 dataset from https://github.com/feichtenhofer/twostreamfusion and placed it in the folder datasets/hmdb51_frames. Do I need to split the images into img_*.jpg and flow_*.jpg myself to run the program, or does the program split them automatically?

If you can provide the relevant information on how the data should be organized, it would be a great help to me. Thank you for sharing any information.

cbiras commented 3 years ago

Hi! I used this script, https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/util_scripts/generate_video_jpgs.py, which basically uses ffmpeg as described above, to split the videos into .jpg frames. The folders are laid out as described by @artest08: datasets/hmdb51_frames contains one folder per video, and the image names in these folders are in the right format, e.g. img_00002.jpg. The problem is, though, that at run time it throws: 'Could not load file ./datasets/hmdb51_frames/TheLastManOnearth_run_f_cm_np1_ba_med_45/img_00001.jpg'. The file that can't be loaded is different every time I run the training script. Did any of you come across this problem? I would love any kind of help. Thank you!

yassinesamet commented 3 years ago

> The problem is, though, that at run time it throws: 'Could not load file ./datasets/hmdb51_frames/TheLastManOnearth_run_f_cm_np1_ba_med_45/img_00001.jpg'. The file that can't be loaded is different every time I run the training script. [...]

Have you solved the problem, please?

4nuragk commented 3 years ago

> The file that can't be loaded is different every time I run the training script. Did any of you come across this problem? [...]

> Have you solved the problem, please?

I faced the same issue. I think the problem is the quote characters in some of the folder names.
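Something like this can be used to find and strip the offending characters (a sketch, not a tested fix; remember to apply the same renaming in the settings files, or the names will no longer match):

    import os

    root = "datasets/hmdb51_frames"
    for name in os.listdir(root):
        clean = name.replace("'", "").replace('"', "")
        if clean != name:
            print("renaming", name, "->", clean)
            os.rename(os.path.join(root, name), os.path.join(root, clean))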

scxiaowu commented 3 years ago

Dear sir, could you please share the data format in datasets/hmdb51_frames? Thank you!