Hi authors, I'm reading your paper and the results look very good, but I'm still not clear about some information in the paper:
In the above image, can you tell me what the axes are? As far as I know, a deep feature is a 1D vector, so what formula do you use to convert the 1D vector into a number in [-0.15, 0.1]?
Why is the content of a motion the shape of the signal, while the style is in the scale and bias of the signal?
Can you explain more about why the unit quaternions returned by the temporal 1D convolution tend to be smooth? As far as I know, temporal 1D convolutional networks are popular for time series, but I can't find the reason why they produce continuous outputs. It is fine for me if you just put a reference that explains this ^^
I would be grateful if you could address these concerns so I can better understand your great work.
Hi, thanks for your interest in our work! Regarding your questions:
As the caption suggests, we are visualizing a specific channel of the deep feature, so each plotted value is a single number. Here the y axis is the temporal axis. (Basically, we have a feature matrix of shape C × T, where C is the number of channels and T is the temporal length; we visualize one row, i.e. a 1 × T slice.)
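For concreteness, here is a minimal sketch of that kind of plot (the feature matrix below is random noise standing in for a real activation, and the shape is made up):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for a deep feature of shape C x T
# (C channels, T frames); a real one would come from the network.
C, T = 144, 60
feat = 0.05 * np.random.randn(C, T)

channel = 3                            # pick one row of the C x T matrix
plt.plot(feat[channel], np.arange(T))  # x: scalar activation, y: frame index
plt.xlabel(f"activation of channel {channel}")
plt.ylabel("frame (temporal axis, drawn vertically as in the figure)")
plt.show()
```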
This is an intuitive explanation of what the network does, just as image style transfer works claim/assume that the Gram matrix (the second-order statistics) corresponds to the image style. For input/output signals, each channel corresponds to the x/y/z coordinate of a specific joint. The shape of the signal tells us how this joint moves in space (e.g. the wrist joint may move back and forth during walking), and the bias/scale of the signal gives the range of the joint motion, which is more related to style: a big motion range suggests an energetic style, for example (see our video for a visual explanation). Furthermore, in our framework we explicitly inject style information by manipulating the scale/bias of the signal, while the shape of the signal comes from the content code. This is the way we model the problem.
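In pseudocode, the injection step looks roughly like adaptive instance normalization (a simplified sketch; `inject_style` and the tensor shapes here are illustrative, not our exact layer):

```python
import torch

def inject_style(content_code, style_scale, style_bias, eps=1e-5):
    """Illustrative AdaIN-style injection.
    content_code: (C, T) tensor of content features;
    style_scale, style_bias: (C, 1) tensors predicted from the style input."""
    mean = content_code.mean(dim=-1, keepdim=True)    # per-channel temporal mean
    std = content_code.std(dim=-1, keepdim=True)      # per-channel temporal std
    normalized = (content_code - mean) / (std + eps)  # shape of the signal is kept
    return style_scale * normalized + style_bias      # style sets scale and bias

# Example with random tensors:
# out = inject_style(torch.randn(144, 60), torch.rand(144, 1), torch.randn(144, 1))
```

The normalization strips the original per-channel scale/bias, so whatever range the output has is dictated by the style code, while the temporal shape still comes from the content.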
We think this is more a matter of our empirical experience with 1D convolutions on motion data -- a "smoothing kernel" is often learned, so the outputs tend to be continuous (sometimes even over-smoothed, which shows up as e.g. footskate artifacts).
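As a toy illustration of the smoothing effect (a hand-set averaging kernel here, not a learned one from our model):

```python
import numpy as np

# Jittery per-frame signal standing in for one channel of a motion feature.
t = np.linspace(0, 2 * np.pi, 120)
noisy = np.sin(t) + 0.3 * np.random.randn(t.size)

kernel = np.ones(5) / 5.0                         # averaging kernel a conv layer may learn
smooth = np.convolve(noisy, kernel, mode="same")  # temporal 1D convolution

# The convolved signal varies much less from frame to frame:
print(np.abs(np.diff(noisy)).mean(), np.abs(np.diff(smooth)).mean())
```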