Open chenkang455 opened 1 year ago
import torch
import torch.nn as nn


class FrameConstructor(nn.Module):
    def __init__(self):
        super(FrameConstructor, self).__init__()

    def forward(self, coeffs, timestamps):
        # coeffs: [bs, n_deg+1, h, w]
        # timestamps: [bs, n_ts, h, w] or [bs, n_ts]
        n_deg = coeffs.shape[1] - 1
        n_ts = timestamps.shape[1]
        # torch.unsqueeze adds a dimension: [bs, n_ts] becomes [bs, n_ts, 1, 1]
        if len(timestamps.shape) == 2:
            timestamps = timestamps.unsqueeze(-1).unsqueeze(-1)
        # bases: [bs, n_deg+1, n_ts, h, w] (h and w may be 1 and broadcast)
        bases = torch.stack([timestamps ** i for i in range(n_deg + 1)], dim=1)
        # Weight each power of t by its coefficient, then sum over the degree axis
        recon = coeffs.unsqueeze(2) * bases
        recon = torch.sum(recon, dim=1)
        return recon
Besides, I do not understand the purpose of the FrameConstructor class shown above. Thanks a lot!
Hello chenkang455,
Thanks for your interest in our work!
We approximate the intensity of each pixel using a parametric polynomial function. Given an h x w pixel grid, the video is represented as h x w = 180 x 240 = 43200 different polynomials. We use the symbol $L_{xy}(t)$ in the paper to refer to the polynomial function associated with the pixel whose coordinates are $(x, y)$. The function takes a single input, the timestamp, and returns a single value, the intensity. To render a video frame at a particular timestamp, say $t_0 = 0.03$, we substitute $t = t_0 = 0.03$ into all 43200 polynomials. This gives 43200 different intensities, which we can assemble into a grayscale frame with a resolution of 180x240. We can then render a few more frames at other timestamps, $t_1, t_2, \dots$, and all these frames make up the video describing the motion of interest.
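As a minimal sketch of this idea (plain Python on a hypothetical 2x3 grid with made-up coefficients, not the repository's code), each pixel stores its own coefficient list, and rendering a frame means evaluating every pixel's polynomial at the same timestamp. The names `eval_poly`, `render_frame`, and `pixel_polys` are illustrative assumptions.

```python
def eval_poly(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + coeffs[2]*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def render_frame(pixel_polys, t):
    """Evaluate every pixel's polynomial at timestamp t.

    pixel_polys[y][x] is the coefficient list of the pixel at (x, y).
    """
    return [[eval_poly(p, t) for p in row] for row in pixel_polys]


# Hypothetical 2x3 grid; polynomials may have different degrees per pixel here
pixel_polys = [
    [[0.5, 1.0], [0.2, 0.0, 2.0], [0.1]],
    [[0.0, -1.0], [0.3, 0.5], [0.9, 0.0, 0.0, 1.0]],
]

# One frame at t_0 = 0.03
frame = render_frame(pixel_polys, 0.03)

# A short "video": one frame per timestamp
video = [render_frame(pixel_polys, t) for t in (-0.5, 0.0, 0.5)]
```

With the real 180x240 grid the same loop would run over 43200 polynomials, which is why the actual code evaluates them with batched tensor operations instead.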
In `CurveIntegrator`, the `forward` method takes three positional arguments: `derivative`, `blurry`, and `keypoints`. `blurry` is the input blurry image, with dimensions `(batch_size, 1, h, w)`. The dimensions of `derivative` and `keypoints` are both `(batch_size, num_kpts, h, w)`. For the $i$-th example in the batch, the derivative of the intensity, $dL_{xy}(t)/dt$, must pass through `num_kpts` points on the 2D plane for every pixel $(x, y)$. The coordinates of these points are `(keypoints[i, j, x, y], derivative[i, j, x, y])` for `0 <= j < num_kpts`. This corresponds to Equation (6) in the paper.

On this line, we use a pre-calculated tensor called `integrator_cache` to transform the coefficients into the standard bases: $dL_{xy}(t)/dt = c_0 + 2 c_1 t + 3 c_2 t^2 + 4 c_3 t^3 + \dots$, where $c_j$ is `coeffs[i, j, x, y]` for the $i$-th example in the batch. Taking the indefinite integral, we have $L_{xy}(t) = c_0 t + c_1 t^2 + c_2 t^3 + \dots + a$, where $a$ is the constant "baseline" created as a by-product of the indefinite integral.

Recall that in Equation (3), we state that the definite (not indefinite) integral over $[-T/2, T/2]$, when divided by $T$, is equal to the input blurry pixel $B_{xy}$. This is a sufficient constraint for us to analytically solve for $a$. In our experiments, we set $T = 2$, which means $-T/2 = -1$ and $T/2 = 1$. The definite integral is then

$$\int_{-1}^{1} L_{xy}(t)\,dt = \left[\tfrac{1}{2} c_0 t^2 + \tfrac{1}{3} c_1 t^3 + \tfrac{1}{4} c_2 t^4 + \dots + a t\right]_{t=-1}^{t=1} = \tfrac{2}{3} c_1 + \tfrac{2}{5} c_3 + \tfrac{2}{7} c_5 + \dots + 2a.$$

Only the odd powers of $t$ in the antiderivative survive, because the even powers take the same value at $t = 1$ and $t = -1$ and cancel. One-half of the integral is $\tfrac{1}{3} c_1 + \tfrac{1}{5} c_3 + \tfrac{1}{7} c_5 + \dots + a$. Since this is equal to $B_{xy}$, we have $a = B_{xy} - (\tfrac{1}{3} c_1 + \tfrac{1}{5} c_3 + \tfrac{1}{7} c_5 + \dots)$.
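The derivation above can be checked for a single pixel with exact rational arithmetic. This is a hedged sketch under the assumptions stated in this reply ($T = 2$, $L_{xy}(t) = c_0 t + c_1 t^2 + c_2 t^3 + \dots + a$); the function names `solve_baseline` and `mean_intensity` and the sample values are made up for illustration and are not taken from the repository.

```python
from fractions import Fraction


def solve_baseline(c, blurry):
    """Return a such that the mean of L(t) over [-1, 1] equals `blurry`.

    L(t) = c[0]*t + c[1]*t**2 + c[2]*t**3 + ... + a
    """
    # c[j] * t**(j+1) integrates to c[j] * t**(j+2) / (j+2).
    # Over [-1, 1] this contributes 2*c[j]/(j+2) for odd j and 0 for even j.
    # The j = 1 term is 2 * c[1] / 3, matching `2 * coeffs[:, 1] / 3` in the question.
    integral = sum(2 * cj / (j + 2) for j, cj in enumerate(c) if j % 2 == 1)
    # Dividing by T = 2 and solving (integral / 2 + a) == blurry for a
    return blurry - integral / 2


def mean_intensity(c, a):
    """Exact mean of L(t) over [-1, 1], computed from the antiderivative."""
    def antiderivative(t):
        return sum(cj * t ** (j + 2) / (j + 2) for j, cj in enumerate(c)) + a * t

    return (antiderivative(Fraction(1)) - antiderivative(Fraction(-1))) / 2


# Made-up coefficients c_0..c_3 and blurry pixel value for one pixel
c = [Fraction(3), Fraction(6), Fraction(-1), Fraction(5)]
blurry = Fraction(1, 2)

a = solve_baseline(c, blurry)  # a == Fraction(-5, 2)

# The blur constraint holds exactly: the mean intensity equals the blurry pixel
assert mean_intensity(c, a) == blurry
```

Using `Fraction` instead of floats keeps the check exact, so the final assertion tests the algebra rather than floating-point tolerance.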
As for `FrameConstructor`, this class allows us to render frames from the polynomial coefficients (`coeffs`) at the specified `timestamps`.
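To make the tensor shapes concrete, here is a NumPy mirror of the `forward` method quoted in the question. It is a sketch for illustration, not the repository's implementation; the function name `construct_frames` and the random inputs are assumptions. It shows that timestamps shared by all pixels (`[bs, n_ts]`) and per-pixel timestamps (`[bs, n_ts, h, w]`) yield frames of the same shape.

```python
import numpy as np


def construct_frames(coeffs, timestamps):
    """NumPy mirror of FrameConstructor.forward.

    coeffs:     [bs, n_deg+1, h, w], coefficient of t**i at index i
    timestamps: [bs, n_ts] or [bs, n_ts, h, w]
    returns:    [bs, n_ts, h, w]
    """
    n_deg = coeffs.shape[1] - 1
    if timestamps.ndim == 2:
        # [bs, n_ts] -> [bs, n_ts, 1, 1] so it broadcasts over h and w
        timestamps = timestamps[:, :, None, None]
    # bases: [bs, n_deg+1, n_ts, h or 1, w or 1]
    bases = np.stack([timestamps ** i for i in range(n_deg + 1)], axis=1)
    # [bs, n_deg+1, 1, h, w] * bases, then sum over the degree axis
    return (coeffs[:, :, None] * bases).sum(axis=1)


rng = np.random.default_rng(0)
bs, n_deg, h, w = 2, 3, 4, 5
coeffs = rng.standard_normal((bs, n_deg + 1, h, w))

# Three timestamps shared by every pixel: shape [bs, n_ts]
timestamps = np.tile(np.array([-1.0, 0.0, 0.5]), (bs, 1))
frames = construct_frames(coeffs, timestamps)

# The same timestamps given per pixel: shape [bs, n_ts, h, w]
per_pixel = np.broadcast_to(timestamps[:, :, None, None], (bs, 3, h, w))
frames_per_pixel = construct_frames(coeffs, per_pixel)
```

At `t = 0` every power except `t**0` vanishes, so the frame at that timestamp is simply `coeffs[:, 0]`, which is a quick sanity check when stepping through the code in a debugger.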
I hope this helps! I encourage you to follow the derivation step by step with pen and paper and cross-check the results using a debugger in Python. Let me know if you have further questions.
Best, Chen
Thank you for your detailed answer, which is very helpful to me. I've understood how the code works. Thanks a lot!
The explanation is very elaborate. Thank you very much!
The above code is found in CurveIntegrator. I have no idea how this code works. For example, for `integral = 2 * coeffs[:, 1] / 3`, I cannot find the corresponding formula in your paper. I would appreciate it if you could answer my question. Thanks a lot!