Do you plan to release the code for video generation using hand-crafted motion guidance?
If not, can you share the input data format of the hand-crafted motion and initial object position?
For instance, in the following image, the initial object position is depicted as a red star, and the motion is described by a red arrow.
How are they transformed into a network input? Is this red-and-black image used as-is, or is the arrow split into multiple pieces, with a separate image for the initial object position?
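To make the question concrete, here is a rough sketch of what I imagine the second option could look like: resample the hand-drawn arrow as a polyline and rasterize one Gaussian heatmap per frame, with the first frame encoding the initial object position. Everything here (the heatmap encoding itself, function names, shapes, sigma) is my own guess, not your actual input format.

```python
import numpy as np

def polyline_to_heatmaps(points, num_frames, height, width, sigma=5.0):
    """Hypothetical encoding: resample a hand-drawn polyline to num_frames
    positions and rasterize each as a 2D Gaussian heatmap."""
    points = np.asarray(points, dtype=np.float32)  # (K, 2) as (x, y)
    # Arc-length parameterize the stroke so positions are evenly spaced in time.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]
    ts = np.linspace(0.0, 1.0, num_frames)
    xs = np.interp(ts, t, points[:, 0])
    ys = np.interp(ts, t, points[:, 1])

    # One Gaussian bump per frame, centered on the interpolated position.
    yy, xx = np.mgrid[0:height, 0:width]
    heatmaps = np.exp(-((xx[None] - xs[:, None, None]) ** 2 +
                        (yy[None] - ys[:, None, None]) ** 2) / (2 * sigma ** 2))
    return heatmaps.astype(np.float32)  # (num_frames, H, W)

# Example: star (initial position) at (20, 60), arrow ending at (100, 30).
cond = polyline_to_heatmaps([(20, 60), (60, 70), (100, 30)], 16, 128, 128)
print(cond.shape)  # (16, 128, 128); frame 0 encodes the initial position
```

Is the actual format anything like this, or do you feed the raw annotated image directly?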
Thank you in advance.