arstek131 opened 4 weeks ago
Hi, sorry for the confusion.

1. We use sin() because this kind of periodic function can model both the dynamic and the static case well: when $\beta$ is small, the point moves linearly and fades away, while when $\beta$ is large, it tends to stay static around $\mu$. The unit of $v$ is $m/s$ and the unit of $l$ is $s$. We parameterize $v$ in `gaussians._velocity`.
2. There is some naming confusion around `gaussians.get_inst_velocity`: it actually returns $\bar{v}$, not the instantaneous velocity at a given time.
3. `v_map` is right, and it is normalized by the accumulated opacity `alpha`, which is dimensionless.

Hi, thank you for the clarifications!
So if I get it right, `gaussians.get_inst_velocity` is the $\bar{v}$ (average velocity) that the paper defines as $\bar{v} = v \cdot \exp(-\frac{\rho}{2})$, while $v$ is `gaussians._velocity`, which the paper defines as the instantaneous velocity $v = \left. \frac{d\tilde{\mu}(t)}{dt} \right|_{t=\tau}$.
So `v_map` represents the rendered average velocity, not the instantaneous one?
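If it helps, the relation between the two quantities can be sketched in a few lines of numpy (variable names and shapes here are illustrative, not the repo's):

```python
import numpy as np

# Illustrative per-Gaussian tensors (names/shapes are assumptions, not the repo's):
v = np.array([[1.0, 2.0, 0.5],     # instantaneous velocity v, shape (N, 3), in m/s
              [0.0, 0.0, 3.0]])
rho = np.array([[0.0],             # damping coefficient rho, shape (N, 1)
                [2.0]])

# Average velocity as in the formula above: v_bar = v * exp(-rho / 2).
# rho = 0 leaves v unchanged; a large rho (a near-static Gaussian)
# shrinks the average velocity toward zero.
v_bar = v * np.exp(-rho / 2.0)
```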
I see by debugging the code that `gaussians._velocity` is a tensor of shape `torch.Size([2146010, 3])` (which I think represents, for each Gaussian point, the velocity in x, y, z).
Now, for each frame in the scene I have the ground-truth (instantaneous) velocity of the objects available as a torch tensor of shape (H, W), where each pixel holds a velocity value (basically a velocity map).
Do you have any suggestions about which velocity from the model I should use, and how? My goal is to supervise the predicted velocity with the ground truth I have. If you feel more comfortable, you can PM me. Thanks!
Okay, I think using the map of velocity that is used in temporal smoothing is more reasonable, i.e. the map of $\bar{v}$, because we actually use $\bar{v}$ as an estimated 3D scene flow for self-supervision (temporal smoothing).
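Given that recommendation, a minimal sketch of such supervision (assuming a scalar ground-truth speed per pixel, as described above; comparing against the magnitude of the rendered velocity is my assumption here, since with full 3D GT vectors you would compare component-wise instead):

```python
import numpy as np

def velocity_supervision_loss(v_map, gt_speed, mask):
    """Hypothetical L1 supervision between the rendered average-velocity map
    v_map (3, H, W) and a scalar per-pixel ground-truth speed (H, W).
    Comparing against the magnitude of v_map is an assumption; with full
    3D GT vectors you would compare component-wise instead."""
    pred_speed = np.linalg.norm(v_map, axis=0)        # (H, W) speed per pixel
    diff = np.abs(pred_speed - gt_speed) * mask       # supervise valid pixels only
    return diff.sum() / np.maximum(mask.sum(), 1.0)   # mean over valid pixels
```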
Great, thanks for your reply.
When the velocity and other features are passed to the rasterizer, what is the pixel-wise meaning of the values in the rasterized velocity image (`v_map`)? As far as I understand, they don't represent velocity values in $m/s$, so how should I interpret them?
Thanks
> Why doesn't the `v_map` indicate velocity in $m/s$ (per channel)?

Roughly speaking, each pixel represents the expectation of velocity along the corresponding ray (using the alpha-blending weights as the probability distribution).
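That expectation can be illustrated with standard alpha compositing for a single ray (a minimal numpy sketch, not the repo's rasterizer):

```python
import numpy as np

# One ray hitting three Gaussians: standard alpha-compositing weights
# w_i = alpha_i * prod_{j<i} (1 - alpha_j). Dividing the accumulated
# feature by sum(w_i) (the accumulated alpha) turns the weights into a
# probability distribution, so the pixel is an expectation of velocity.
alphas = np.array([0.5, 0.5, 0.5])           # per-Gaussian opacities along the ray
vels = np.array([[1.0, 0.0, 0.0],            # per-Gaussian velocities (world units)
                 [3.0, 0.0, 0.0],
                 [5.0, 0.0, 0.0]])

transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
weights = alphas * transmittance             # w_i; weights.sum() is the accumulated alpha
accumulated = (weights[:, None] * vels).sum(axis=0)  # what the rasterizer accumulates
v_pixel = accumulated / weights.sum()        # normalize by accumulated alpha
```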
Ok, but how should I interpret this pixel representation? For example, is it possible to recover the velocity in $m/s$ from the rendered `v_map`? If yes, how?
For example: projecting the objects' velocities, together with their masks, onto the camera images to get a GT `v_map` label; or using the depth map to back-project the `v_map` into 3D space as a point cloud; or directly supervising the PVG points.
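A hypothetical sketch of the first suggestion (building a GT `v_map` label by projecting object velocities into the image; the helper below is illustrative, assumes points already in camera coordinates, and omits z-buffering, distortion, and the object masks mentioned above):

```python
import numpy as np

def project_velocity_to_image(points, vels, K, H, W):
    """Hypothetical helper: splat each 3D point's velocity onto the pixel it
    projects to under the 3x3 intrinsics K (points already in camera
    coordinates). A real implementation would add z-buffering so nearer
    points win, and apply the object masks mentioned above."""
    v_map_gt = np.zeros((3, H, W))
    uvw = (K @ points.T).T                        # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)       # pixel column
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)       # pixel row
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (points[:, 2] > 0)
    v_map_gt[:, v[ok], u[ok]] = vels[ok].T        # write velocity channels
    return v_map_gt
```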
Hi, thank you for your nice work. I mainly have two questions, regarding the concept of velocity in your paper and implementation.
1) Could you elaborate on the mean when it is time-dependent? $\tilde{\mu}(t) = \mu + \frac{l}{2\pi} \cdot \sin\left( 2\pi \frac{t - \tau}{l} \right) \cdot v$ Why did you model it using sin()? What is the reason behind this choice? Also, could you explain $v = \left. \frac{d\tilde{\mu}(t)}{dt} \right|_{t=\tau}$ in more detail? I understand it is the instantaneous velocity, but how is it interpreted in the code? What is its unit of measure?
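(For reference, differentiating the mean shows why $v$ is called the instantaneous velocity:

$$\frac{d\tilde{\mu}(t)}{dt} = \frac{l}{2\pi} \cdot \frac{2\pi}{l} \cos\left( 2\pi \frac{t - \tau}{l} \right) \cdot v = \cos\left( 2\pi \frac{t - \tau}{l} \right) \cdot v,$$

so at $t = \tau$ the cosine equals 1 and the derivative is exactly $v$. Dimensionally, the prefactor $l/2\pi$ has units of $s$ and $v$ has units of $m/s$, so the sinusoidal offset added to $\mu$ is in meters, as expected.)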
2) Regarding the code implementation: in train.py, at each iteration you calculate the velocity like this:

```python
v = gaussians.get_inst_velocity
```

Then you pass it to the render function:

```python
render_pkg = render(viewpoint_cam, gaussians, args, background, env_map=env_map, other=other, time_shift=time_shift, is_training=True)
```

Once rendering is completed, you get the rendered velocity as:

```python
feature = render_pkg['feature'] / alpha.clamp_min(EPS)
v_map = feature[1:]
```

And `v_map` is a torch tensor with 3 channels; I suppose each channel describes the instantaneous velocity of that point in the x, y, and z directions respectively. To which values is this `v_map` normalized? What is its unit of measure? Thanks
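To make the normalization in that snippet concrete, here is a numpy sketch of the same two lines (the `EPS` value and the channel layout are assumptions based on the snippet above):

```python
import numpy as np

EPS = 1e-6  # clamp floor, analogous to alpha.clamp_min(EPS) (exact value assumed)

# Toy rasterizer outputs for a 2x2 image: 'feature' holds alpha-weighted
# accumulations with 4 channels, where channels 1:4 are assumed to carry
# the velocity (matching feature[1:] in the snippet above).
feature = np.arange(16, dtype=float).reshape(4, 2, 2)
alpha = np.full((1, 2, 2), 0.5)   # accumulated opacity per pixel

normalized = feature / np.maximum(alpha, EPS)  # divide out the accumulated alpha
v_map = normalized[1:]                         # (3, H, W): vx, vy, vz per pixel
```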