DachunKai / EvTexture

[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
https://dachunkai.github.io/evtexture.github.io/
Apache License 2.0

DataPreparation understanding #16

Open eduardatmadenn opened 2 months ago

eduardatmadenn commented 2 months ago

Hi, first off, congratulations on your work. The use of event data for this is fascinating.

I'm trying to reproduce your work using simulated data, and I have a couple of questions.

  1. What FPS do you recommend for frame interpolation? Since you use B=5, is it safe to assume 5 extra frames between every two frames?

  2. Am I understanding this correctly? For calendar.h5, the data looks like this:

    dict_keys(['images/000000', 'images/000001', 'images/000002', 'images/000003', 'images/000004', 'images/000005', 'images/000006', 'images/000007', 'images/000008', 'images/000009', 'images/000010', 'images/000011', 'images/000012', 'images/000013', 'images/000014', 'images/000015', 'images/000016', 'images/000017', 'images/000018', 'images/000019', 'images/000020', 'images/000021', 'images/000022', 'images/000023', 'images/000024', 'images/000025', 'images/000026', 'images/000027', 'images/000028', 'images/000029', 'images/000030', 'images/000031', 'images/000032', 'images/000033', 'images/000034', 'images/000035', 'images/000036', 'images/000037', 'images/000038', 'images/000039', 'images/000040', 'voxels_b/000000', 'voxels_b/000001', 'voxels_b/000002', 'voxels_b/000003', 'voxels_b/000004', 'voxels_b/000005', 'voxels_b/000006', 'voxels_b/000007', 'voxels_b/000008', 'voxels_b/000009', 'voxels_b/000010', 'voxels_b/000011', 'voxels_b/000012', 'voxels_b/000013', 'voxels_b/000014', 'voxels_b/000015', 'voxels_b/000016', 'voxels_b/000017', 'voxels_b/000018', 'voxels_b/000019', 'voxels_b/000020', 'voxels_b/000021', 'voxels_b/000022', 'voxels_b/000023', 'voxels_b/000024', 'voxels_b/000025', 'voxels_b/000026', 'voxels_b/000027', 'voxels_b/000028', 'voxels_b/000029', 'voxels_b/000030', 'voxels_b/000031', 'voxels_b/000032', 'voxels_b/000033', 'voxels_b/000034', 'voxels_b/000035', 'voxels_b/000036', 'voxels_b/000037', 'voxels_b/000038', 'voxels_b/000039', 'voxels_f/000000', 'voxels_f/000001', 'voxels_f/000002', 'voxels_f/000003', 'voxels_f/000004', 'voxels_f/000005', 'voxels_f/000006', 'voxels_f/000007', 'voxels_f/000008', 'voxels_f/000009', 'voxels_f/000010', 'voxels_f/000011', 'voxels_f/000012', 'voxels_f/000013', 'voxels_f/000014', 'voxels_f/000015', 'voxels_f/000016', 'voxels_f/000017', 'voxels_f/000018', 'voxels_f/000019', 'voxels_f/000020', 'voxels_f/000021', 'voxels_f/000022', 'voxels_f/000023', 'voxels_f/000024', 'voxels_f/000025', 'voxels_f/000026', 
'voxels_f/000027', 'voxels_f/000028', 'voxels_f/000029', 'voxels_f/000030', 'voxels_f/000031', 'voxels_f/000032', 'voxels_f/000033', 'voxels_f/000034', 'voxels_f/000035', 'voxels_f/000036', 'voxels_f/000037', 'voxels_f/000038', 'voxels_f/000039']) 

where the shape of each individual voxel tensor is [B, H, W].

So, in order to replicate this, should I use events_to_voxel_torch on the event data of each real frame individually? (By real frame I mean the original frame plus all the interpolated frames between t and t+1.)

  3. Also, 'images/000000' looks to be just the LR image, saved as h5. Would it be enough to open the image as an np array and save it without any other processing? Maybe I missed this detail, but I don't see any processing of the LR images in this DataPreparation step.
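Regarding question 2 above: here is a minimal NumPy sketch of what a routine like events_to_voxel_torch typically computes, i.e. standard bilinear-in-time accumulation of event polarities into B temporal bins, yielding a [B, H, W] tensor (the repo's torch implementation may differ in details such as normalization, so treat this only as an illustration):

```python
import numpy as np

def events_to_voxel(xs, ys, ts, ps, B, H, W):
    """Accumulate events (x, y, t, p) into a [B, H, W] voxel grid.

    Each event's polarity is split between its two nearest temporal
    bins (bilinear interpolation in time), a common voxelization scheme.
    """
    voxel = np.zeros((B, H, W), dtype=np.float32)
    # Normalize the timestamps of this event slice to [0, B - 1].
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (B - 1)
    left = np.floor(t_norm).astype(int)
    right = np.clip(left + 1, 0, B - 1)
    w_right = t_norm - left
    # Scatter-add each event's weighted polarity into its two bins.
    np.add.at(voxel, (left, ys, xs), ps * (1.0 - w_right))
    np.add.at(voxel, (right, ys, xs), ps * w_right)
    return voxel
```

Called once per pair of consecutive original frames (the forward event slice for voxels_f, and presumably the time-reversed slice for voxels_b), this yields one [B, H, W] entry per frame interval, matching the key layout above.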

Thank you

DachunKai commented 2 months ago
  1. For the video's meta information, use the original FPS if you know it; otherwise, 25 or 30 FPS is fine. For intuition, B=5 means there are 5 extra frames between two frames. However, these extra frames come from event signals, and they are still different in form from frame signals.
  2. The images in dict_keys are the frames from the original video and do not include the interpolated frames. The interpolated frames are only used to simulate better event signals and are not packaged into the h5 file. So you should use events_to_voxel_torch to convert the events between two original frames into voxels.
  3. The LR image is downsampled from the GT image. We use a MATLAB script to downsample the GT image. You can refer to generate_bicubic_img.m.
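If MATLAB is not available, a rough Python stand-in for the bicubic downsampling could look like the sketch below. Caveat: this uses Pillow's BICUBIC filter, which is not bit-identical to MATLAB's imresize, so evaluation metrics may shift slightly; generate_bicubic_img.m remains the reference.

```python
import numpy as np
from PIL import Image

def bicubic_downsample(gt, scale=4):
    """Bicubic-downsample a GT frame (H, W[, C] uint8 array) by `scale`.

    Approximates the MATLAB imresize step used to produce LR frames.
    """
    img = Image.fromarray(gt)
    lr = img.resize((img.width // scale, img.height // scale), Image.BICUBIC)
    return np.asarray(lr)
```

Applied to every GT frame, the results would then be stored under the images/NNNNNN keys of the h5 file.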
eduardatmadenn commented 2 months ago

Hi, thanks for your answer. Just to clarify:

  1. I am asking how many interpolated frames you actually use between two (real) frames.
  2. In the event signal data, do the timestamps (the first column below) reset to 0 at every real frame, or are they continuous?
    674 571 274 0
    687 632 314 0
    706 632 313 0
    707 639 336 0
    710 639 329 0
    716 639 328 0
    718 0 294 0
    718 639 332 0
    720 639 335 0
    724 639 330 0
    724 639 331 0
    729 550 163 0
    735 550 164 0
DachunKai commented 2 months ago
  1. We interpolate 7 frames between two (real) frames for the Vimeo90k dataset, and interpolate 3 frames between two (real) frames for the REDS and Vid4 datasets using the RIFE interpolation model.
  2. I don't quite understand your second question. Event data consists of x, y, t, and p, where p is +1 or -1 (the event polarity), and t is continuous with a very small delay. Thanks!
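On question 2, a quick empirical check is to load the event file and test whether the timestamp column ever decreases; if it never does, the clock is continuous rather than resetting per frame. A sketch, assuming whitespace-separated rows with the timestamp in the first column as in the snippet above:

```python
import numpy as np

def timestamps_continuous(path_or_buf):
    """Return True if the first column (timestamps) never decreases,
    i.e. the clock does not reset between frames."""
    t = np.loadtxt(path_or_buf)[:, 0]
    return bool(np.all(np.diff(t) >= 0))
```

Running this over a whole per-sequence event file (rather than per-frame slices) answers the reset-vs-continuous question directly.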