facebookresearch / AutoAvatar

AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling

Dynamic Feature Encoding #3

Closed antonagafonov closed 1 year ago

antonagafonov commented 2 years ago

Hi guys,

In the paper, Section 3.2 "Dynamic Feature Encoding", you explain the reasoning behind the inputs to the UNet, among them:

  1. Signed height UV maps over T = 3 frames, with shape [1, 3, 256, 256]
  2. Derivatives of the signed height UV maps
  3. Pose (why is pose shape[1] equal to 32?)
  4. Pose derivative (why is this 3 times larger than the pose?)
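
For concreteness, here is a minimal shape sketch of these inputs in PyTorch. The tensor layout and the channel-wise concatenation are my own assumption based on the shapes above, not necessarily how the repo actually assembles the UNet input:

import torch

# Signed height UV maps over a window of T = 3 frames (assumed layout [B, T, 256, 256]).
heights_uv = torch.randn(1, 3, 256, 256)

# Temporal finite differences of the height maps: T - 1 = 2 channels.
heights_uv_delta = heights_uv[:, 1:] - heights_uv[:, :-1]           # [1, 2, 256, 256]

# Hypothetical spatial input to the UNet: heights plus their derivatives.
unet_spatial_in = torch.cat([heights_uv, heights_uv_delta], dim=1)  # [1, 5, 256, 256]

# Pose code (32-dim, as observed) and pose derivative (3x larger, as observed).
pose = torch.randn(1, 32)
pose_delta = torch.randn(1, 96)

print(unet_spatial_in.shape, pose.shape, pose_delta.shape)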

I see that at /AutoAvatar/tree/main/models/PosedDecKNN_dPoses_dHs/nets.py line 513, feat_uv is missing the H derivative.

Why is C = shapes_uv.shape[1] equal to 64 and not 1? The goal is to predict one frame, not 64. What do those 64 UV maps represent?

Can you please explain what I am missing?

Thanks

AA

antonagafonov commented 2 years ago

Regarding the H derivatives, I found this in nets.py:

obsdf_feat = torch.cat([obsdf_delta.view(B, N, T - 1), obsdf[:, -1] * 20], dim=-1)

where obsdf_delta is the frame-to-frame difference, i.e. the de-facto derivative.

So obsdf_feat is constructed from the two differences and the latest height map multiplied by 20. Why 20? Was it chosen empirically?
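
To check my reading of that line, here is a minimal sketch of how I understand obsdf_feat is built; the tensor layout [B, N, T, 1] and the point count are my assumptions:

import torch

B, N, T = 1, 40000, 3  # batch, number of surface points, observed frames (assumed)

# Assumed layout: one signed height value per point per observed frame.
obsdf = torch.randn(B, N, T, 1)

# Finite differences between consecutive frames: the de-facto derivative, T - 1 values.
obsdf_delta = obsdf[:, :, 1:] - obsdf[:, :, :-1]        # [B, N, T - 1, 1]

# Concatenate the two differences with the latest height, scaled by a constant (20).
obsdf_feat = torch.cat([obsdf_delta.view(B, N, T - 1),
                        obsdf[:, :, -1] * 20], dim=-1)  # [B, N, T]

print(obsdf_feat.shape)  # torch.Size([1, 40000, 3])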

Thanks,

AA

zqbai-jeremy commented 1 year ago

Sorry for my late reply.

Q: I see that at /AutoAvatar/tree/main/models/PosedDecKNN_dPoses_dHs/nets.py line 513, feat_uv is missing the H derivative.
A: The H derivative is at line 486.

Q: Why is C = shapes_uv.shape[1] equal to 64 and not 1? The goal is to predict one frame, not 64. What do those 64 UV maps represent?
A: The 64-channel tensor is a neural feature that is decoded into the SDF of the next frame. The channel count of the neural feature is set manually; different values could be used.
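
Conceptually the decoding step looks something like the following sketch; the grid_sample lookup and the small MLP here are illustrative assumptions, not the repo's actual decoder:

import torch
import torch.nn as nn
import torch.nn.functional as F

# 64-channel neural feature map in UV space produced by the UNet (illustrative shapes).
feat_uv = torch.randn(1, 64, 256, 256)        # [B, C=64, H, W]

# UV coordinates of query points in [-1, 1], as expected by grid_sample.
query_uv = torch.rand(1, 1, 1000, 2) * 2 - 1  # [B, 1, n_points, 2]

# Sample a 64-dim feature per query point from the UV feature map.
point_feat = F.grid_sample(feat_uv, query_uv, align_corners=False)  # [B, 64, 1, n_points]
point_feat = point_feat.squeeze(2).permute(0, 2, 1)                 # [B, n_points, 64]

# A small MLP decodes each 64-dim feature into one SDF value for the next frame.
sdf_decoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
sdf = sdf_decoder(point_feat)                                       # [B, n_points, 1]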

Q: So obsdf_feat is constructed from the two differences and the latest height map multiplied by 20. Why 20? Was it chosen empirically?
A: Yes. It is just to keep the network input from being too small.

antonagafonov commented 1 year ago

Thanks for the answers.