DachunKai / EvTexture

[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
https://dachunkai.github.io/evtexture.github.io/
Apache License 2.0

Questions about Event Data #10

Closed zhaohm14 closed 5 months ago

zhaohm14 commented 5 months ago

Thanks for your wonderful work! I have some questions about the event data:

  1. Source of Event Data for Training: Where does the event data used to train the model come from?
  2. Inference with Specific Videos: How can I run inference on my own videos, which have no event data?
  3. Physical Method for Event Generation: Is there a physical way to generate events from a conventional video, for example by computing the change in pixel intensity between frames, given that most existing videos have no event data?

I appreciate your guidance on these queries as I am looking to better understand the processes involved in working with event data. Thanks again! PS: Is there any WeChat group or Discord channel where I can discuss these topics further?

DachunKai commented 5 months ago

Source of Event Data for Training:

We trained our model on three datasets. For Vimeo-90K and REDS, the event data was generated by inputting the videos into the event simulator vid2e. For the CED dataset, the event data was captured using a physical event camera, DAVIS346. You can refer to Section 5.1 of our paper for more details.

Inference with Specific Videos:

To test the model on your own videos, you need to convert the video into event data using an event camera simulator, such as esim, vid2e, or v2e. After generating the event data, process the events and the original video together for testing.
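
For example, here is a minimal sketch for decoding a video into the numbered frames these simulators expect (illustration only: it assumes OpenCV is installed, and the function name and paths are placeholders):

import os
import cv2  # assumption: OpenCV is available for decoding the video

def dump_frames(video_path, frame_dir):
    """Decode a video into numbered PNG frames, the input format most event simulators expect."""
    os.makedirs(frame_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(frame_dir, f"{idx:08d}.png"), frame)
        idx += 1
    cap.release()

The simulator also needs a per-frame timestamp file; see the discussion of timestamps_file later in this thread.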

Physical Method for Event Generation:

For event generation, I recommend using existing event simulators. You can refer to the papers on these simulators for more details. Simply calculating the change in pixel intensity between frames of a conventional video is not sufficient to fully simulate events.

Therefore, even if most of the existing videos do not have event data, it is entirely possible to generate events using these simulators and then feed both the event data and the video into our model. We haven't had the time to package this process recently, but it is completely feasible.

Thank you for your interest!

zhaohm14 commented 5 months ago

Thank you for your help! I am new to working with event data and there are still some aspects that confuse me.

I have successfully used vid2e to convert my video into .npz files, which contain four dimensions: x, y, t, and p. Could you guide me on how to convert these .npz files into voxels_b and voxels_f?

Additionally, I noticed that the dimensions in the voxels are 5x144x180. Could you explain what the "5" channels represent in this context?

DachunKai commented 5 months ago

Congratulations on successfully generating events! Regarding your first question, you can use the function events_to_voxel_torch to convert events (x, y, t, p) into voxels. Called as-is, it produces forward voxels; for backward voxels, the code would look something like this:

import torch

# xs, ys, ts, ps: 1-D tensors of event x/y coordinates, timestamps, and polarities (+1/-1)
if backward:
    # reverse the event stream in time and flip the polarities
    xs = torch.flip(xs, dims=[0])
    ys = torch.flip(ys, dims=[0])
    ts = torch.flip(t_end - ts + t_start, dims=[0])  # t_end and t_start are the timestamp range of the events being flipped, typically the timestamps of two consecutive frames
    ps = torch.flip(-ps, dims=[0])
voxel = events_to_voxel_torch(xs, ys, ts, ps, bins, device=None, sensor_size=sensor_size)
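
For reference, here is a minimal end-to-end sketch that slices the events between two consecutive frame timestamps and builds both voxels_f and voxels_b. The .npz key names (x, y, t, p), the import path of events_to_voxel_torch, and the sensor size are assumptions you should adapt to your own data:

import numpy as np
import torch
from event_utils import events_to_voxel_torch  # adjust this import to wherever the function lives in your setup

def make_voxels(npz_path, t_start, t_end, bins=5, sensor_size=(144, 180)):
    """Build forward (voxels_f) and backward (voxels_b) grids from the events in [t_start, t_end).
    t_start and t_end must be in the same time units as the stored timestamps."""
    data = np.load(npz_path)
    x, y, t, p = data['x'], data['y'], data['t'], data['p']  # assumed key names

    mask = (t >= t_start) & (t < t_end)  # keep only events between the two frames
    xs = torch.from_numpy(x[mask]).float()
    ys = torch.from_numpy(y[mask]).float()
    ts = torch.from_numpy(t[mask]).float()
    ps = torch.from_numpy(p[mask]).float()  # if polarities are stored as {0, 1}, map them to {-1, +1} first

    # forward voxel: events in chronological order
    voxels_f = events_to_voxel_torch(xs, ys, ts, ps, bins, device=None, sensor_size=sensor_size)

    # backward voxel: reverse time and flip polarity, as in the snippet above
    xs_b = torch.flip(xs, dims=[0])
    ys_b = torch.flip(ys, dims=[0])
    ts_b = torch.flip(t_end - ts + t_start, dims=[0])
    ps_b = torch.flip(-ps, dims=[0])
    voxels_b = events_to_voxel_torch(xs_b, ys_b, ts_b, ps_b, bins, device=None, sensor_size=sensor_size)

    return voxels_f, voxels_b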

As for your second question, the "5" is the number of temporal bins in the output voxel grid. You can refer to Section 3.1 of our paper to understand the meaning of this parameter in detail.

zhaohm14 commented 5 months ago

Thanks! I've generated voxels; however, my results differ from the datasets you provided. I'm trying to understand the workflow used to generate the event data.

  1. Workflow: Was the event data generated using any of the following processes?

    • HR images -> HR events -> LR events
    • LR images -> LR events
    • LR images -> upsampled LR images -> upsampled LR events -> LR events
    • Or any other process?
  2. Parameters used: Could you provide the specific values used for --contrast_threshold_neg, --contrast_threshold_pos, and --refractory_period_ns when generating the event data?

  3. Normalization implementation: I used the following code snippet to normalize the voxels. Could you check whether anything is incorrect in this implementation?

    eta = torch.quantile(voxels, 0.98)
    voxels = torch.clamp(voxels, max=eta, min=-eta) / eta
  4. Visual comparisons: Below is a visual comparison between the event heatmap from your dataset (Vid4:city:voxels_b:000011) (LEFT) and my generation using --contrast_threshold_neg=0.005, --contrast_threshold_pos=0.005, and --refractory_period_ns=0 (RIGHT).

DachunKai commented 5 months ago

Q1: Workflow. A1: Our event data generation follows the workflow HR images -> HR events -> LR events. This approach is consistent with previous event-based VSR works.
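
Purely for illustration, one simple way to map HR events to LR events is to rescale the event coordinates by the downsampling factor; this is not necessarily the exact procedure used for the released data:

import numpy as np

def downsample_events(x, y, t, p, scale=4):
    """Naively map HR event coordinates to LR coordinates by integer division.
    Illustration only: duplicate events that land on the same LR pixel are simply kept."""
    x_lr = (x // scale).astype(np.int64)
    y_lr = (y // scale).astype(np.int64)
    return x_lr, y_lr, t, p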

Q2: Parameters used. A2: The parameters used to generate events are designed to simulate real events. In real scenes, an event camera's contrast threshold (CT) is not constant but normally distributed. We randomly sample the contrast threshold at each simulation step according to $\mathcal{N}(C, \sigma_C)$, where $\sigma_C$ controls the amount of noise. More specifically, the positive and negative thresholds, $C_p$ and $C_n$, are sampled as follows:

import random
import esim_py

# clip, image_folder, and timestamps_file come from the surrounding data pipeline:
# clip is the video being simulated, image_folder holds its (interpolated) frames,
# and timestamps_file lists one timestamp per frame (see the note below).
config = {
    'refractory_period': 1e-4,
    'CT_range': [0.05, 0.5],
    'max_CT': 0.5,
    'min_CT': 0.02,
    'mu': 1,
    'sigma': 0.1,
    'H': clip.height,
    'W': clip.width,
    'log_eps': 1e-3,
    'use_log': True,
}

# Positive threshold: uniform in CT_range; negative threshold: Gaussian-perturbed copy of Cp;
# both are clipped to [min_CT, max_CT].
Cp = random.uniform(config['CT_range'][0], config['CT_range'][1])
Cn = random.gauss(config['mu'], config['sigma']) * Cp
Cp = min(max(Cp, config['min_CT']), config['max_CT'])
Cn = min(max(Cn, config['min_CT']), config['max_CT'])

esim = esim_py.EventSimulator(
    Cp,
    Cn,
    config['refractory_period'],
    config['log_eps'],
    config['use_log']
)

events = esim.generateFromFolder(image_folder, timestamps_file)  # Generate events with shape [N, 4]

Here, timestamps_file is user-defined. For videos with a known frame rate, the file contains [0, 1.0/fps, 2.0/fps, ...]; for an unknown frame rate, we assume fps = 25. Note that image_folder contains the frames after frame interpolation (if you are familiar with vid2e, you will know what I mean). Our frame interpolation model is RIFE, whereas vid2e uses the SuperSloMo or FILM interpolation models; our interpolation rate is x2, inserting three frames between each pair of frames.
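
For instance, a minimal sketch for writing such a timestamps file (the layout follows the description above; the .png filter and the function name are only illustrative):

import os

def write_timestamps(image_folder, timestamps_file, fps=25.0):
    """Write one timestamp (in seconds) per frame in image_folder, spaced by 1/fps.
    fps=25 is the fallback used when the true frame rate is unknown."""
    num_images = len([f for f in sorted(os.listdir(image_folder)) if f.endswith('.png')])
    with open(timestamps_file, 'w') as f:
        for i in range(num_images):
            f.write(f"{i / fps:.9f}\n")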

DachunKai commented 5 months ago

Q3: Voxel normalization implementation. A3: I used the following code snippet to normalize the voxels:

import math
import torch

def voxel_normalization(voxel):
    """
    Normalize the voxel as in https://arxiv.org/abs/1912.01584, Section 3.1.

    Params:
        voxel: torch.Tensor of shape [num_bins, H, W]
    Returns:
        normalized voxel
    """
    # If the voxel is all zeros, there is nothing to normalize
    a, b, c = voxel.shape
    tmp = torch.zeros(a, b, c)
    if torch.equal(voxel, tmp):
        return voxel

    # Sort the absolute values and take the 98th percentile of the non-zero entries as the scale
    abs_voxel, _ = torch.sort(torch.abs(voxel).view(-1, 1).squeeze(1))
    first_non_zero_idx = torch.nonzero(abs_voxel)[0].item()
    non_zero_voxel = abs_voxel[first_non_zero_idx:]
    norm_idx = math.floor(non_zero_voxel.shape[0] * 0.98)

    ones = torch.ones_like(voxel)

    # Divide values below the scale by it; clip everything at or beyond the scale to +/-1
    normed_voxel = torch.where(torch.abs(voxel) < non_zero_voxel[norm_idx], voxel / non_zero_voxel[norm_idx], voxel)
    normed_voxel = torch.where(normed_voxel >= non_zero_voxel[norm_idx], ones, normed_voxel)
    normed_voxel = torch.where(normed_voxel <= -non_zero_voxel[norm_idx], -ones, normed_voxel)

    return normed_voxel
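
Continuing from the function above, usage is straightforward (the shape matches the 5x144x180 voxels discussed earlier; torch.randn is only a stand-in for a real voxel grid):

voxel = torch.randn(5, 144, 180)     # stand-in for a real [num_bins, H, W] voxel grid
normed = voxel_normalization(voxel)  # scales by the 98th percentile of non-zero magnitudes and clips larger values to +/-1
print(normed.min().item(), normed.max().item())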

Additional notes: Due to various influencing factors, such as the interpolation model you used for event simulation and especially the random contrast thresholds we sample during simulation, there may be differences between the events you generate and the ones we released. I hope these answers help you. Thank you!

zhaohm14 commented 5 months ago

Thanks for your detailed explanation. Your response has greatly helped me understand the process of handling event data. Looking forward to any possible updates and improvements in the future. Thank you again for your patience and support!