Closed xiangweifeng closed 9 months ago
Sorry, the details here were not clearly described in the paper. We used $p_1=0.35$, $p_2=0.1$ during training.
The reason that classifier-free guidance can be used in the inference stage is that we employed joint training during the training process (unconditional and conditional). Section 3.2.3 describes our inference strategy.
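For readers following the thread, here is a minimal sketch of how classifier-free guidance typically combines the two predictions that joint training makes available. This is the generic single-condition form, not necessarily the exact formula of Section 3.2.3; `cfg_combine` and `guidance_scale` are hypothetical names, and the noise predictions are plain lists of floats for illustration.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Generic classifier-free guidance (assumed form, not taken from the paper).

    eps_uncond: prediction with the condition replaced by the null value
    eps_cond:   prediction with the condition given
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

# A scale of 0.0 returns the unconditional prediction and 1.0 the conditional
# one; larger scales push further in the direction of the condition.
for scale in (0.0, 1.0, 2.0):
    print(scale, cfg_combine(eps_uncond, eps_cond, scale))
```

Both predictions come from the same network, which is why the model must see unconditional samples during training.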
Hi Fan, in your mask strategy you designed a "mask all" option on some images for CFG. During training, as mentioned before, do you also "mask all" of the video for CFG?
Our inference stage uses unconditional generation, so during training a certain proportion of batches must receive no context information; this is what the 'mask all' strategy implements.
Hi Fan, is p1=0.35 for the context information of the video (π1), and p2=0.9 for the global video?
We used $p_1=0.35$ (context information not given), $p_2=0.1$ (global video not given) during training.
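As a rough illustration of what these probabilities mean in a training loop, the sketch below drops each condition independently. Only the values 0.35 and 0.1 come from the reply above; `drop_conditions`, the use of `None` as the null value, and the independence of the two draws are assumptions made for the example.

```python
import random

P1 = 0.35  # probability that context information is not given (from the reply)
P2 = 0.1   # probability that the global video is not given (from the reply)
NULL = None  # hypothetical stand-in for the fixed null value


def drop_conditions(context, global_clip, rng, p1=P1, p2=P2):
    """Replace each condition by NULL with its own probability.

    Assumption: the two draws are independent; the thread does not say so.
    """
    ctx = NULL if rng.random() < p1 else context
    glb = NULL if rng.random() < p2 else global_clip
    return ctx, glb


rng = random.Random(0)  # seeded so the run is repeatable
samples = [drop_conditions("context", "global_clip", rng) for _ in range(10000)]
frac_ctx_dropped = sum(ctx is NULL for ctx, _ in samples) / len(samples)
frac_glb_dropped = sum(glb is NULL for _, glb in samples) / len(samples)
print(frac_ctx_dropped, frac_glb_dropped)
```

Over many samples the dropped fractions should land close to 0.35 and 0.1, and the samples where a condition is dropped are the ones that teach the model the unconditional branch used by CFG.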
Hi Fan, the paper (3.2.3) mentions: "One is the context information of the video π1, and the other is the global video clip π2. We jointly train the unconditional and conditional models by randomly setting π1 and π2 to a fixed null value β with probabilities p1 and p2." I cannot find the values of p1 and p2; can you provide reference values?
In 3.2.1, the paper mentions: "The 'mask all' strategy enables the model to perform unconditional generation, which allows us to adopt the classifier-free guidance [20] technique during the inference phase." Is the reason CFG can be used in the inference stage the training strategy described in 3.2.3, or the "mask all" strategy?