POWERBEV is a novel and elegant vision-based end-to-end framework that consists only of 2D convolutional layers and performs perception and forecasting of multiple objects in bird's-eye view (BEV).
import torch


def cumulative_warp_features(x, flow, mode='nearest', spatial_extent=None):
    """ Warps a sequence of feature maps by accumulating incremental 2d flow.

    x[:, -1] remains unchanged
    x[:, -2] is warped using flow[:, -2]
    x[:, -3] is warped using flow[:, -3] @ flow[:, -2]
    ...
    x[:, 0] is warped using flow[:, 0] @ ... @ flow[:, -3] @ flow[:, -2]

    Args:
        x: (b, t, c, h, w) sequence of feature maps
        flow: (b, t, 6) sequence of 6 DoF pose
            from t to t+1 (only uses the xy portion)
    """
    sequence_length = x.shape[1]
    if sequence_length == 1:
        return x

    # pose_vec2mat, mat2pose_vec and warp_features are helpers from the repository.
    flow = pose_vec2mat(flow)

    # The last frame is the reference frame and is kept as-is.
    out = [x[:, -1]]
    cum_flow = flow[:, -2]
    for t in reversed(range(sequence_length - 1)):
        out.append(warp_features(x[:, t], mat2pose_vec(cum_flow), mode=mode, spatial_extent=spatial_extent))
        # @ is the equivalent of torch.bmm
        # (at t == 0 this update reads flow[:, -1], but the result is never used)
        cum_flow = flow[:, t - 1] @ cum_flow

    # out was built from the last frame backwards, so reverse it before stacking.
    return torch.stack(out[::-1], 1)
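To make the loop easier to follow, here is a small symbolic trace of which product ends up warping each timestep. It is only a sketch: `trace_cumulative_flow` is a hypothetical helper written for this thread, not part of the repository, and it tracks flow indices as strings instead of multiplying real matrices.

```python
def trace_cumulative_flow(sequence_length):
    """Return {timestep: symbolic product used to warp x[:, timestep]}.

    Hypothetical helper: mirrors the loop in cumulative_warp_features,
    but records flow indices instead of multiplying matrices.
    """
    used = {sequence_length - 1: "identity"}  # x[:, -1] remains unchanged
    if sequence_length == 1:
        return used

    cum_flow = [sequence_length - 2]  # corresponds to cum_flow = flow[:, -2]
    for t in reversed(range(sequence_length - 1)):
        used[t] = " @ ".join(f"flow[:, {i}]" for i in cum_flow)
        # Mirrors cum_flow = flow[:, t - 1] @ cum_flow (left-multiplication).
        # At t == 0 the index t - 1 wraps around to the last flow entry.
        cum_flow = [(t - 1) % sequence_length] + cum_flow
    return used


for t, product in sorted(trace_cumulative_flow(3).items()):
    print(t, product)
# 0 flow[:, 0] @ flow[:, 1]
# 1 flow[:, 1]
# 2 identity
```

For three timesteps the trace agrees with the docstring: x[:, 0] is warped with flow[:, 0] @ flow[:, 1], so the code as written places the earlier step on the left.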
Thank you for your kind answer.
I have another question.
In the function above, when t is 1, flow[:, 0] is the transformation matrix from timestep 0 to timestep 1, and cum_flow is the transformation matrix from timestep 1 to timestep 2. Assuming the input x has timesteps 0, 1, and 2, should the update be cum_flow @ flow[:, t - 1] instead of flow[:, t - 1] @ cum_flow?
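A small numeric check may help frame the question. The sketch below uses plain Python lists as stand-ins for the batched torch matrices, with `matmul(a, b)` playing the role of `a @ b`. The helpers `se2`, `matmul` and `apply`, and the two example poses, are made up for illustration and are not taken from the repository.

```python
import math


def se2(theta, tx, ty):
    """3x3 homogeneous 2D rigid transform (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain-Python equivalent of a @ b for 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, p):
    """Apply a homogeneous transform to a 2D point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Made-up example poses, not values from the dataset.
flow_0 = se2(math.pi / 2, 1.0, 0.0)  # step between timesteps 0 and 1
flow_1 = se2(0.0, 2.0, 0.0)          # step between timesteps 1 and 2

code_order = matmul(flow_0, flow_1)      # flow[:, t - 1] @ cum_flow, as in the function
proposed_order = matmul(flow_1, flow_0)  # cum_flow @ flow[:, t - 1], as proposed

p = (1.0, 0.0)

# Reading 1: each matrix maps a point from frame t forward to frame t+1.
forward = apply(flow_1, apply(flow_0, p))
print(forward, apply(proposed_order, p))  # both approximately (3.0, 1.0)

# Reading 2: each matrix maps coordinates given in frame t+1 back into frame t.
backward = apply(flow_0, apply(flow_1, p))
print(backward, apply(code_order, p))     # both approximately (1.0, 3.0)
```

The two orders give different results, and each one is the correct chain under one of the two readings. So whether the update should be changed seems to depend on which convention pose_vec2mat and warp_features follow, which the excerpt above does not state.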