LossNAN / I3D-Tensorflow

Train I3D model on ucf101 or hmdb51 by tensorflow
Apache License 2.0
112 stars 28 forks

('i','x','y' folder) #1

Open panna19951227 opened 6 years ago

panna19951227 commented 6 years ago

What does this data look like? (the 'i', 'x', 'y' folders)

LossNAN commented 6 years ago

@panna19951227 It means that you should preprocess your data like this:

- ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/i for all rgb frames in this package
- ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/x for all x_flow frames in this package
- ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/y for all y_flow frames in this package
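As a quick sanity check, that layout can be verified with a short script (a sketch; check_clip_layout is a hypothetical helper, not part of this repo):

```python
import os

def check_clip_layout(clip_dir):
    """Return the names of any missing 'i'/'x'/'y' subfolders
    for one preprocessed clip directory (hypothetical helper)."""
    expected = ('i', 'x', 'y')
    return [s for s in expected
            if not os.path.isdir(os.path.join(clip_dir, s))]
```

An empty list means the clip folder contains all three subfolders.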

oyvindM-B commented 5 years ago

Thank you so much for sharing your scripts, @LossNAN .

I'm a bit confused by the "i, x and y" as well. According to the UCF flow and rgb training scripts, they both read from the same "train_flow.list" during training. This suggests that both flow and rgb frames are in the same folder. If so, what does the "i" in your answer above represent: a tensor of all the images, or do you have "i_1, i_2, i_3...i_k" files, where k is the number of images in the video?

Would really appreciate if you could help me here.

LossNAN commented 5 years ago

@kiakia2 Please refer to this function in input_data.py:

def get_frames_data(filename, num_frames_per_clip, sample_rate, add_flow):
    ''' Given a directory containing extracted frames, return a video clip of
    (num_frames_per_clip) consecutive frames as a list of np arrays
    (assumes import os and import numpy as np at module level) '''
    filename_i = os.path.join(filename, 'i')
    rgb_ret_arr, s_index = get_data(filename, num_frames_per_clip, sample_rate)
    if not add_flow:
        return rgb_ret_arr, [], s_index
    # flow frames reuse the rgb start index (s_index) so rgb and flow stay aligned
    filename_x = os.path.join(filename, 'x')
    flow_x, _ = get_data(filename_x, num_frames_per_clip, sample_rate, s_index)
    flow_x = np.expand_dims(flow_x, axis=-1)
    filename_y = os.path.join(filename, 'y')
    flow_y, _ = get_data(filename_y, num_frames_per_clip, sample_rate, s_index)
    flow_y = np.expand_dims(flow_y, axis=-1)
    # stack x and y flow into a single 2-channel array
    flow_ret_arr = np.concatenate((flow_x, flow_y), axis=-1)
    return rgb_ret_arr, flow_ret_arr, s_index

This function answers your first question. All data for a clip live in the same folder, ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01, with:

- rgb images: ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/i
- x_flow images: ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/x
- y_flow images: ~PATH/UCF-101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01/y
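In other words, the three paths are derived from the clip folder by os.path.join; a minimal sketch of that mapping (clip_subdirs is a hypothetical name):

```python
import os

def clip_subdirs(clip_dir):
    """Mirror the os.path.join calls in get_frames_data: return the
    rgb ('i'), x_flow ('x') and y_flow ('y') frame directories."""
    return (os.path.join(clip_dir, 'i'),
            os.path.join(clip_dir, 'x'),
            os.path.join(clip_dir, 'y'))
```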

oyvindM-B commented 5 years ago

Thank you so much for your prompt reply, @LossNAN! I'm sorry, but this part is still a bit unclear to me. If "i", "x" and "y" are not subfolders of, for example, "v_ApplyEyeMakeup_g01_c01", how do you separate the individual "i", "x" and "y" files for each image in the folder?

Should the individual files be named something like "i_1, i_2, i_3...i_k", "x_1, x_2, x_3...x_k", "y_1, y_2, y_3...y_k", where k is the number of images in the video clip?

Again sorry for being ignorant here :)

I also have another question. It is not about "i", "x" and "y", so I apologize if I should have created a new thread for it, but here it goes:

During training, the I3D authors used an input size of 64 frames, correct? When I look at input_data.py, it looks like you downsample the input to 16 frames in sample_data (a sample_rate factor of 4):

def sample_data(ori_arr, num_frames_per_clip, sample_rate):
    # keep every sample_rate-th frame, yielding
    # num_frames_per_clip / sample_rate frames in total
    ret_arr = []
    for i in range(int(num_frames_per_clip/sample_rate)):
        ret_arr.append(ori_arr[int(i*sample_rate)])
    return ret_arr
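To make the sampling concrete, here is a runnable check (the function is repeated so the snippet is self-contained; the 64-frame input is just a stand-in for decoded frames):

```python
def sample_data(ori_arr, num_frames_per_clip, sample_rate):
    # keep every sample_rate-th frame (same logic as the repo function)
    ret_arr = []
    for i in range(int(num_frames_per_clip / sample_rate)):
        ret_arr.append(ori_arr[int(i * sample_rate)])
    return ret_arr

frames = list(range(64))            # stand-in for 64 decoded frames
clip = sample_data(frames, 64, 4)   # sample_rate=4 -> 16 frames kept
print(len(clip), clip[:4])          # 16 [0, 4, 8, 12]
```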

Could you please explain the reason for this? Doesn't using a different input size than the one the I3D authors used in pretraining affect the benefits of pretraining?

Thanks again for your help.

ilkarman commented 5 years ago

Regarding the folder-structure, imagine you have a video:

/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2.avi

You want to have:

/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/i/frame0001.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/i/frame0002.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/i/frame0003.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/x/frame0001.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/x/frame0002.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/x/frame0003.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/y/frame0001.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/y/frame0002.jpg
/largedata/i3d/videos/brush_hair/brushing_hair_brush_hair_f_nm_np2_ba_goo_2/y/frame0003.jpg
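Presumably the frames inside each subfolder are then read back in sorted filename order, which is why zero-padded names like frame0001.jpg keep rgb and flow frames aligned (a sketch of that assumption; ordered_frames is a hypothetical helper):

```python
import os

def ordered_frames(frame_dir):
    """List frame images in one clip subfolder (e.g. .../i) sorted by
    name; zero-padded names keep this order correct (assumption)."""
    return sorted(f for f in os.listdir(frame_dir)
                  if f.lower().endswith(('.jpg', '.png')))
```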

Then I think you just need to change this line:

rgb_ret_arr, s_index = get_data(filename, num_frames_per_clip, sample_rate)

To:

rgb_ret_arr, s_index = get_data(filename_i, num_frames_per_clip, sample_rate)