hw-liang closed this issue 7 months ago
Hi, I find your implementation of train_svd.py very helpful.
I don't quite understand one part of it. Could you give a reference or an explanation for this part of the implementation? Thank you! https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L477 to L483
Hi, I noticed that in your latest implementation, at
https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L810
the text_encoder is not cast to half precision, which may cause a datatype mismatch during training.
I encountered this issue too. The text_encoder should be cast to half precision and moved to CUDA.
I compared your implementations of 'train_svd.py' and 'train.py' and found several interesting points:
https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L441 Could you explain the difference between latent_dist.mode() and latent_dist.sample()?
https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L433 Why not pass the vae and image_encoder into this function as arguments? Is there any difference between passing them in as arguments and taking them from the pipeline?
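
On the mode() versus sample() question, here is a rough sketch of the general idea, not the repository's code. In diffusers, the VAE encoder output exposes a diagonal Gaussian as latent_dist: mode() returns the mean of that Gaussian, so it is deterministic, while sample() returns the mean plus the standard deviation times fresh noise. The DiagonalGaussian class below is a hypothetical standard-library stand-in that works on plain lists rather than tensors, and the seed argument is only there to make the toy reproducible.

```python
import math
import random


class DiagonalGaussian:
    """Hypothetical toy stand-in for a VAE posterior (not the diffusers class)."""

    def __init__(self, mean, logvar, seed=None):
        self.mean = list(mean)
        # Per-element standard deviation: std = exp(0.5 * logvar).
        self.std = [math.exp(0.5 * lv) for lv in logvar]
        self._rng = random.Random(seed)

    def mode(self):
        # Deterministic: the most likely value of a Gaussian is its mean.
        return list(self.mean)

    def sample(self):
        # Stochastic: mean + std * eps, with eps drawn from N(0, 1).
        return [
            m + s * self._rng.gauss(0.0, 1.0)
            for m, s in zip(self.mean, self.std)
        ]


dist = DiagonalGaussian(mean=[0.5, -1.0], logvar=[0.0, -2.0], seed=0)
print(dist.mode())    # identical on every call
print(dist.sample())  # changes from call to call
```

So using mode() gives the same latent for the same input image every time, whereas sample() adds encoder noise to the latent on each call.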