mostafa610 opened this issue 2 years ago
And how was the file mean_pts3d.npy created? Is it the mean of these points across the dataset?
It is the average of the landmarks over all video frames of the target person.
Hope the above helps.
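A minimal sketch of how such a mean-landmark file could be produced. The (25, 3) landmark shape comes from later in this thread; the frame count and the random stand-in data are assumptions for illustration only:

```python
import numpy as np

# Stand-in for the per-frame 3D landmarks tracked over the whole
# training video of the target person: shape (num_frames, 25, 3).
all_pts3d = np.random.rand(1000, 25, 3).astype(np.float32)

# Average each landmark coordinate over every frame of the video.
mean_pts3d = all_pts3d.mean(axis=0)  # shape (25, 3)

np.save('mean_pts3d.npy', mean_pts3d)
print(mean_pts3d.shape)  # (25, 3)
```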
Thank you so much for your reply, it really helped.
Or do you mean that you will send batches where each batch contains samples with a seq_length of 240?
Thanks in advance.
Of course the latter. An LSTM is a kind of RNN, so it takes sequential data as input; 240 frames equal 4 seconds under the 60 FPS setting.
batch_size means how many sequences (of 240 frames each) are sent in each forward pass.
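In PyTorch terms, the input tensor would have shape (batch_size, seq_len, feature_dim), and the LSTM emits one output per input frame. A minimal sketch; the batch size of 32, hidden size of 256, layer count, and feature dimension of 512 are illustrative assumptions, not values from the repo:

```python
import torch
import torch.nn as nn

batch_size, seq_len, feat_dim = 32, 240, 512  # 240 frames = 4 s at 60 FPS

# batch_first=True means the input is (batch, time, features).
lstm = nn.LSTM(input_size=feat_dim, hidden_size=256,
               num_layers=3, batch_first=True)

x = torch.randn(batch_size, seq_len, feat_dim)  # one batch of sequences
out, _ = lstm(x)

# T frames in, T frames out: one hidden state per input frame.
print(out.shape)  # torch.Size([32, 240, 256])
```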
Thank you so much, I don't know how to thank you, you really helped me!
I have another question
Regarding the training, I understand that every sequence of 240 frames (4 s) outputs a vector of shape (25, 3), which represents the displacement between the landmarks of the last frame and the mean landmark positions. Is that right?
If so, do you then walk through the data with a sliding window, i.e. from frame 0 to frame 240, from frame 1 to frame 241, ..., from frame 39 to frame 279, and this forms the first batch, for example. Is that right?
And here: `A2Lsamples = self.audio_features[file_index][current_frame * 2 : (current_frame + self.seq_len) * 2]`. I don't get why it multiplies by 2. Thanks in advance.
First, an LSTM takes sequential data as input and its output is also sequential, so T frames of input produce T frames of output. Please carefully check the definition of LSTM networks. During training we use 4 seconds as the length, while during testing there is no length limitation.
Secondly, the audio2mouth network learns the displacements.
Thirdly, frame*2 is because the APC audio feature frame interval is half of the 1/60 s video frame interval, i.e. there are two audio feature frames per video frame. Please check the paper for details.
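Putting the three points above together, a sliding-window sample pairs 240 video frames with twice as many APC audio frames, and the target is a per-frame displacement from the mean landmarks. A minimal sketch; the frame count, feature width, and helper name `get_sample` are illustrative assumptions, not the repo's actual dataset code:

```python
import numpy as np

SEQ_LEN = 240   # 4 s of video at 60 FPS
RATIO = 2       # two APC audio feature frames per video frame

# Stand-in data for one training video of the target person.
num_frames = 600
audio_features = np.random.rand(num_frames * RATIO, 512)  # APC features
landmarks = np.random.rand(num_frames, 25, 3)             # tracked 3D points
mean_pts3d = landmarks.mean(axis=0)                       # (25, 3)

def get_sample(current_frame):
    """Sliding-window training sample starting at `current_frame` (stride 1)."""
    # 240 video frames correspond to 480 audio feature frames, hence the *2.
    a2l = audio_features[current_frame * RATIO:
                         (current_frame + SEQ_LEN) * RATIO]
    # The audio2mouth network learns displacements from the mean position,
    # one (25, 3) displacement per frame in the window.
    target = landmarks[current_frame:current_frame + SEQ_LEN] - mean_pts3d
    return a2l, target

a2l, target = get_sample(0)
print(a2l.shape, target.shape)  # (480, 512) (240, 25, 3)
```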
Thank you for your replies. I am wondering, do you know any open-source algorithm for face tracking? I can't find one that produces the same output as your paper. Thanks in advance.
Any parametric monocular face reconstruction method would be an alternative, like FaceScape, DECA, 3DDFA_v2, etc.
What method did you use? Could you please upload the code?
First of all, thank you so much for your marvelous work. Second, regarding face tracking: why do you use it? Why not just extract the landmarks from every frame with a landmark detector? Thanks in advance.