Amazing work! A question about this sentence:

'To leverage both data types, we treat single images as one-frame video clips and train the model on both images and videos.'

When a single image is treated as a one-frame clip in training stage 1, how does the model differentiate between the source and driver images? Or do the 60K images simply not contribute to learning that difference in stage 1?