3d-motion-magnification / 3d-motion-mag

BSD 3-Clause "New" or "Revised" License

Implementations for "Linear – Tri-Plane", "Position Shift" and "Encoding Shift" methods #5

Closed WillYao-THU closed 4 months ago

WillYao-THU commented 4 months ago

Hi,

I wonder if the released code includes implementations of the "Linear – Tri-Plane", "Position Shift" and "Encoding Shift" methods. It seems that only the "Phase – Tri-Plane" method is implemented.

Looking forward to your reply, thanks!

brandonyfeng commented 4 months ago

We did not plan to include them because we wanted the repo to contain only the functionality used by the method.

"Linear - Tri-Plane" is straightforward to implement. Here, we don't need the steerable pyramid, and we just compute: T1_mag = T1 + alpha * (T1 - T0) directly on the triplane values.
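As a minimal sketch of the formula above, assuming the two tri-planes are available as same-shaped NumPy arrays: the function name `linear_triplane_magnify` and the `(plane, channel, H, W)` layout are hypothetical and not taken from the repo.

```python
import numpy as np

def linear_triplane_magnify(t0, t1, alpha):
    """Compute T1_mag = T1 + alpha * (T1 - T0) elementwise on tri-plane values."""
    t0 = np.asarray(t0, dtype=np.float64)
    t1 = np.asarray(t1, dtype=np.float64)
    if t0.shape != t1.shape:
        raise ValueError("T0 and T1 must have the same shape")
    return t1 + alpha * (t1 - t0)

# Toy data standing in for learned tri-planes (layout is an assumption).
rng = np.random.default_rng(0)
planes_t0 = rng.normal(size=(3, 4, 8, 8))
planes_t1 = planes_t0 + 0.01 * rng.normal(size=planes_t0.shape)

planes_mag = linear_triplane_magnify(planes_t0, planes_t1, alpha=10.0)
```

With this convention, `alpha = 0` returns T1 unchanged, and the change relative to T0 is scaled by `1 + alpha`.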

"Position Shift" and "Encoding Shift" depend on a non-triplane NeRF backbone, and including them would make this repo too complicated. Basically, they would require training the NeRF using the original MLP setup: (x, y, z) -> PosEnc(x, y, z) -> MLP(PosEnc(x, y, z)) -> (RGB, opacity). With "Position Shift", we would only tweak (x, y, z) and freeze everything else for later timesteps. With "Encoding Shift", we would only finetune the positional encoding output. The finetuning essentially learns a residual term to add back to the original value. After training, magnification is obtained by linearly amplifying the differences characterized by the residuals.
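A rough sketch of the amplification step for the two variants, under several assumptions: the residuals `delta_xyz` and `delta_enc` are taken as already learned (random stand-ins here), `pos_enc` is a common sinusoidal encoding rather than the exact one used in the experiments, and the `1 + alpha` scaling is chosen to mirror the tri-plane formula above. All function names are hypothetical, and the frozen MLP that consumes the encoding is omitted.

```python
import numpy as np

def pos_enc(xyz, num_freqs=4):
    """Sinusoidal encoding: sin and cos of 2^k * coordinate for k < num_freqs."""
    xyz = np.asarray(xyz, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)      # (F,)
    angles = xyz[..., None] * freqs          # (..., 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (..., 3, 2F)
    return enc.reshape(*xyz.shape[:-1], -1)  # (..., 6F)

def magnify_position_shift(xyz, delta_xyz, alpha, num_freqs=4):
    """Amplify a residual on the input coordinates, then encode."""
    return pos_enc(xyz + (1.0 + alpha) * delta_xyz, num_freqs)

def magnify_encoding_shift(xyz, delta_enc, alpha, num_freqs=4):
    """Amplify a residual added to the positional encoding output."""
    return pos_enc(xyz, num_freqs) + (1.0 + alpha) * delta_enc

# Toy sample points and stand-in residuals (not learned values).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
delta_xyz = 0.01 * rng.normal(size=(5, 3))
delta_enc = 0.01 * rng.normal(size=(5, 24))

enc_pos = magnify_position_shift(points, delta_xyz, alpha=10.0)
enc_shift = magnify_encoding_shift(points, delta_enc, alpha=10.0)
```

In both cases the result would be fed to the frozen MLP to obtain (RGB, opacity) for the magnified timestep.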

Let me know if you need further clarifications. Thanks.

WillYao-THU commented 4 months ago

Thank you so much for your detailed explanation. Your clarification on the implementation was incredibly helpful. Best regards!