DeepMotionEditing / deep-motion-editing

An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]

style transfer architecture #134

Closed Hellodan-77 closed 3 years ago

Hellodan-77 commented 3 years ago

In the style transfer process, do you extract features separately from the content motion and the style motion? Or is the latent code of the content decoded directly after instance normalization (IN) strips its style, with the style motion injected into the decoded output through AdaIN during decoding? Which .py file contains the code for this part? There is something here I don't quite understand, so I would appreciate an explanation. Thank you very much!

HalfSummer11 commented 3 years ago

The content feature is extracted by the content encoder while the style feature is extracted by the style encoder (I guess this is what you mean by "separately"). IN serves as part of the content encoder to strip out style. Basically, there shouldn't be a "style feature" of the content code since it's style-neutral. I think the paper or the video would illustrate our method more clearly. The most relevant part in the code is here.
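
To make the data flow concrete, here is a minimal PyTorch sketch of the layout described above: a content encoder whose IN layers strip style statistics, a style encoder that pools a clip into a single style code, and a decoder whose AdaIN layers inject that code. This is an illustration only, not the repository's actual code; all layer sizes, kernel sizes, and tensor shapes are assumptions.

```python
# Minimal sketch (not the repo's code) of the content/style/AdaIN layout.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    def __init__(self, in_ch=128, hid_ch=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hid_ch, kernel_size=5, padding=2),
            nn.InstanceNorm1d(hid_ch),  # IN removes per-channel statistics, i.e. "style"
            nn.LeakyReLU(0.2),
            nn.Conv1d(hid_ch, hid_ch, kernel_size=5, padding=2),
            nn.InstanceNorm1d(hid_ch),
            nn.LeakyReLU(0.2),
        )

    def forward(self, motion):          # motion: (batch, channels, frames)
        return self.net(motion)         # style-neutral content code

class StyleEncoder(nn.Module):
    def __init__(self, in_ch=128, style_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 96, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(96, 96, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),    # pool over time -> one style vector per clip
        )
        self.fc = nn.Linear(96, style_dim)

    def forward(self, motion):
        return self.fc(self.conv(motion).squeeze(-1))  # (batch, style_dim)

class AdaIN(nn.Module):
    """Re-normalizes features, then applies scale/shift predicted from the style code."""
    def __init__(self, style_dim, num_ch):
        super().__init__()
        self.norm = nn.InstanceNorm1d(num_ch)
        self.affine = nn.Linear(style_dim, 2 * num_ch)

    def forward(self, x, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)
        return (1 + gamma.unsqueeze(-1)) * self.norm(x) + beta.unsqueeze(-1)

class Decoder(nn.Module):
    def __init__(self, hid_ch=96, out_ch=128, style_dim=64):
        super().__init__()
        self.conv1 = nn.Conv1d(hid_ch, hid_ch, kernel_size=5, padding=2)
        self.adain1 = AdaIN(style_dim, hid_ch)
        self.conv2 = nn.Conv1d(hid_ch, out_ch, kernel_size=5, padding=2)

    def forward(self, content_code, style_code):
        h = torch.relu(self.adain1(self.conv1(content_code), style_code))
        return self.conv2(h)            # stylized output motion

# Usage: content from one clip, style from another.
content_motion = torch.randn(4, 128, 60)  # (batch, channels, frames)
style_motion = torch.randn(4, 128, 60)
out = Decoder()(ContentEncoder()(content_motion), StyleEncoder()(style_motion))
```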

Hellodan-77 commented 3 years ago

Thank you very much for your reply! My understanding is that you extract features from the content motion and the style motion separately. For the content motion, the style component is stripped away and the content component is retained; for the style motion, the style component is retained; the processed content and style are then combined to produce the transferred motion (that is, the output motion)? If so, how do you extract content and style separately when the style motion comes from a video? (By which I mean, what method is used? I did not understand some parts of the paper.) I hope to receive your reply!

HalfSummer11 commented 3 years ago

My understanding is that you extract features from the content motion and the style motion separately. For the content motion, the style component is stripped away and the content component is retained; for the style motion, the style component is retained; the processed content and style are then combined to produce the transferred motion (that is, the output motion)?

Yes, exactly. But I'm afraid I still don't quite understand your question. Regarding "separately": since the content and style come from two different input motions, the two motions (whether in 3D or from video) are naturally processed separately during encoding. Regarding the "method": as I said, the extraction is done by the content encoder and the style encoder, and the detailed architectures can be found in the paper/code.
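
Since the question mentions style motions that come from video: the paper handles this with a second style encoder that consumes 2D joint positions (obtained from the video by an off-the-shelf pose estimator) and maps them into the same style space as the 3D style encoder, so the decoder above is unchanged. A hedged sketch, assuming the joint count and shapes below:

```python
# Sketch (assumptions, not the repo's code) of a 2D style encoder for video input.
import torch
import torch.nn as nn

class StyleEncoder2D(nn.Module):
    def __init__(self, num_joints=21, style_dim=64):
        super().__init__()
        in_ch = num_joints * 2           # (x, y) per joint, stacked as channels
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 96, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),     # pool over time into one style vector
        )
        self.fc = nn.Linear(96, style_dim)

    def forward(self, joints_2d):        # joints_2d: (batch, num_joints * 2, frames)
        return self.fc(self.conv(joints_2d).squeeze(-1))

# The video-derived style code plugs into the same decoder as the 3D one:
# out = decoder(content_encoder(content_motion_3d), StyleEncoder2D()(joints_2d))
```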