chensong1995 / E-CIR

Event-Enhanced Continuous Intensity Recovery (CVPR 2022)
MIT License

Some questions about the CurveIntegrator code. #5

Open chenkang455 opened 1 year ago

chenkang455 commented 1 year ago
    integral = 2 * coeffs[:, 1] / 3
    for i in range(3, n_deg, 2):
        integral = integral + 2 * coeffs[:, i] / (i + 2)
    baseline = (2 * blurry[:, 0] - integral) / 2
    baseline = baseline.unsqueeze(dim=1)
    coeffs = torch.cat([baseline, coeffs], dim=1) # [bs, n_deg+1, h, w]
    return coeffs, integrator_cache

The code above is from CurveIntegrator. I do not understand how it works. For example, for [integral = 2 * coeffs[:, 1] / 3], I cannot find the corresponding formula in your paper. I would appreciate it if you could answer my question. Thanks a lot!

chenkang455 commented 1 year ago

    import torch
    import torch.nn as nn

    class FrameConstructor(nn.Module):
        def __init__(self):
            super(FrameConstructor, self).__init__()

        def forward(self, coeffs, timestamps):
            # coeffs: [bs, n_deg+1, h, w]
            # timestamps: [bs, n_ts, h, w] or [bs, n_ts]
            n_deg = coeffs.shape[1] - 1
            n_ts = timestamps.shape[1]
            # torch.unsqueeze adds a dimension
            if len(timestamps.shape) == 2:
                timestamps = timestamps.unsqueeze(-1).unsqueeze(-1)
            # bases: [bs, n_deg+1, n_ts, h, w]
            bases = torch.stack([timestamps ** i for i in range(n_deg + 1)], dim=1)
            recon = coeffs.unsqueeze(2) * bases
            recon = torch.sum(recon, dim=1)
            return recon

Besides, I do not understand the purpose of FrameConstructor, shown above. Thanks a lot!

chensong1995 commented 1 year ago

Hello chenkang455,

Thanks for your interest in our work!

We approximate the intensity of each pixel using a parametric polynomial function. Given an hxw pixel grid, the video is represented as hxw=180*240=43200 different polynomials. We use the symbol L_{xy}(t) in the paper to refer to the polynomial function associated with the pixel whose coordinates are (x, y). The function takes one single input: the timestamp. The function returns one single value: the intensity. To render a video frame at a particular timestamp, say t_0=0.03, we substitute t=t_0=0.03 in all 43200 polynomials. This gives 43200 different intensities, and we can assemble them into a grayscale frame with a resolution of 180x240. We can then render a few more frames at other timestamps, t_1, t_2, ..., and all these frames make up the video describing the motion of interest.
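To make the per-pixel polynomial idea concrete, here is a minimal sketch in plain Python. It assumes a toy 2x2 grid instead of 180x240, and the names eval_poly, render_frame, and polys are illustrative only; they do not come from the E-CIR code. Coefficients are assumed to be stored in ascending order of power.

```python
def eval_poly(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + coeffs[2]*t**2 + ... with Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * t + c
    return value


def render_frame(polys, t):
    """Evaluate every pixel's polynomial at the same timestamp t.

    polys[y][x] holds the ascending-order coefficients for pixel (x, y).
    """
    return [[eval_poly(pixel_coeffs, t) for pixel_coeffs in row] for row in polys]


# Hypothetical toy "video": a 2x2 grid with one polynomial per pixel.
polys = [
    [[0.5, 0.0, 0.0], [0.5, 1.0, 0.0]],   # constant pixel, linearly brightening pixel
    [[0.2, 0.0, 1.0], [0.8, -1.0, 0.0]],  # quadratic pixel, linearly darkening pixel
]

# Render one frame per timestamp; together the frames form the video.
timestamps = [-1.0, 0.0, 1.0]
video = [render_frame(polys, t) for t in timestamps]
```

Each entry of video is one grayscale frame, so rendering at more timestamps simply yields more frames from the same set of polynomials.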

In CurveIntegrator, the forward method takes three positional arguments: derivative, blurry, and keypoints. blurry is the input blurry image, with dimensions (batch_size, 1, h, w). derivative and keypoints both have dimensions (batch_size, num_kpts, h, w). For the i^{th} example in the batch and every pixel (x, y), the derivative of the intensity, dL_{xy}(t)/dt, must pass through num_kpts points on the 2D plane. The coordinates of these points are (keypoints[i, j, x, y], derivative[i, j, x, y]) for 0 <= j < num_kpts. This corresponds to Equation (6) in the paper.

The code then uses a pre-calculated tensor called integrator_cache to transform the coefficients into the standard bases: dL_{xy}(t)/dt = c_0 + 2 * c_1 * t + 3 * c_2 * t^2 + 4 * c_3 * t^3 + ..., where c_j = coeffs[i, j, x, y] for the i^{th} example in the batch. Taking the indefinite integral, we have L_{xy}(t) = c_0 * t + c_1 * t^2 + c_2 * t^3 + ... + a, where a is the constant "baseline" created as a by-product of the indefinite integral.

Recall that in Equation (3), we state that the definite (not indefinite) integral of L_{xy}(t) over [-T/2, T/2], when divided by T, is equal to the input blurry pixel B_{xy}. This constraint is sufficient for us to solve for a analytically. In our experiments, we set T = 2, which means -T/2 = -1 and T/2 = 1. The definite integral is then \int_{-1}^{1} L_{xy}(t) dt = (1/2 * c_0 * t^2 + 1/3 * c_1 * t^3 + 1/4 * c_2 * t^4 + ... + a * t) |_{t=-1}^{t=1} = 2/3 * c_1 + 2/5 * c_3 + 2/7 * c_5 + ... + 2 * a. The even powers of t in the antiderivative cancel between the two limits, which is why only the odd-indexed coefficients c_1, c_3, c_5, ... remain. This is the quantity the code accumulates in integral, starting from 2 * coeffs[:, 1] / 3.

Dividing by T = 2 gives 1/3 * c_1 + 1/5 * c_3 + 1/7 * c_5 + ... + a. Since this is equal to B_{xy}, we have a = B_{xy} - (1/3 * c_1 + 1/5 * c_3 + 1/7 * c_5 + ...), which is what baseline = (2 * blurry[:, 0] - integral) / 2 computes.
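The baseline formula can be checked numerically for a single pixel. The sketch below is not part of the E-CIR code: solve_baseline and mean_intensity are hypothetical helper names, the coefficient values are made up, and exact fractions are used so that the comparison involves no rounding. It assumes T = 2 and L(t) = a + c_0 * t + c_1 * t^2 + c_2 * t^3 + ..., as in the derivation above.

```python
from fractions import Fraction


def solve_baseline(coeffs, blurry):
    """Return a such that the mean of L(t) over [-1, 1] equals blurry.

    coeffs[j] is c_j in L(t) = a + c_0*t + c_1*t**2 + c_2*t**3 + ...
    """
    # Only odd-indexed c_j (even powers of t) survive the symmetric integral.
    integral = sum(2 * coeffs[j] / (j + 2) for j in range(1, len(coeffs), 2))
    return (2 * blurry - integral) / 2


def mean_intensity(baseline, coeffs):
    """Exact mean of L(t) over [-1, 1], computed from the antiderivative."""
    full = [baseline] + list(coeffs)  # full[k] multiplies t**k
    total = Fraction(0)
    for k, c in enumerate(full):
        # The integral of t**k over [-1, 1] is (1 - (-1)**(k + 1)) / (k + 1).
        total += c * Fraction(1 - (-1) ** (k + 1), k + 1)
    return total / 2  # divide by T = 2


# Made-up single-pixel example: c_0..c_3 and a blurry intensity B.
coeffs = [Fraction(1, 2), Fraction(3), Fraction(-1), Fraction(5)]
blurry = Fraction(7, 10)

a = solve_baseline(coeffs, blurry)
recovered = mean_intensity(a, coeffs)  # should reproduce the blurry value
```

Here the integral term is 2/3 * 3 + 2/5 * 5 = 4, so a = (2 * 7/10 - 4) / 2 = -13/10, and averaging the resulting polynomial over [-1, 1] returns 7/10 again.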

As for FrameConstructor, this class allows us to render frames from the polynomial coefficients (coeffs) at specified timestamps.
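A small usage sketch may help with the tensor shapes. It re-states the forward pass quoted earlier in this thread in compact form and feeds it made-up inputs; the specific values (a 2x3 image in which every pixel follows L(t) = 1 + 2t + 3t^2) are assumptions chosen only so that the output is easy to verify by hand.

```python
import torch
import torch.nn as nn


class FrameConstructor(nn.Module):
    """Compact restatement of the forward pass quoted in this thread."""

    def forward(self, coeffs, timestamps):
        # coeffs: [bs, n_deg+1, h, w]; timestamps: [bs, n_ts] or [bs, n_ts, h, w]
        n_deg = coeffs.shape[1] - 1
        if timestamps.dim() == 2:
            # Share the same timestamps across all pixels via broadcasting.
            timestamps = timestamps.unsqueeze(-1).unsqueeze(-1)
        # Powers t**0, t**1, ..., stacked along a new "degree" axis (dim=1).
        bases = torch.stack([timestamps ** i for i in range(n_deg + 1)], dim=1)
        # Weight each power by its coefficient and sum over the degree axis.
        return torch.sum(coeffs.unsqueeze(2) * bases, dim=1)


# Made-up input: every pixel of a 2x3 image follows L(t) = 1 + 2t + 3t^2.
coeffs = torch.tensor([1.0, 2.0, 3.0]).view(1, 3, 1, 1).expand(1, 3, 2, 3)
timestamps = torch.tensor([[-1.0, 0.0, 0.5]])  # [bs=1, n_ts=3]

frames = FrameConstructor()(coeffs, timestamps)  # [bs, n_ts, h, w]
```

The result has one frame per requested timestamp. For this input, every pixel takes the values 2, 1, and 2.75 at t = -1, 0, and 0.5 respectively.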

I hope this helps! I encourage you to follow the derivation step by step with pen and paper and cross-check the results using a debugger in Python. Let me know if you have further questions.

Best, Chen

chenkang455 commented 1 year ago

Thank you for your detailed answer, which is very helpful to me. I've understood how the code works. Thanks a lot!

weimengting commented 1 year ago

The explanation is very elaborate. Thank you very much!