question about some implementation details

alibaba / animate-anything

Fine-Grained Open Domain Image Animation with Motion Guidance

https://animationai.github.io/AnimateAnything/

MIT License


Closed hw-liang closed 7 months ago

hw-liang commented 10 months ago

I compared your implementations of 'train_svd.py' and 'train.py' and found several interesting points:

https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L441 Could you explain the difference between latent_dist.mode() and latent_dist.sample()?

https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L433 Why not pass the vae and image_encoder into this function as arguments? Is there any difference between passing them in as arguments and taking them from the pipeline?
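On the mode() versus sample() question, here is a minimal sketch in plain Python, not the repository's code. It assumes latent_dist behaves like a diagonal Gaussian posterior, as in diffusers' DiagonalGaussianDistribution, where mode() returns the mean and sample() returns the mean plus standard deviation times random noise. The class DiagonalGaussian and its values are hypothetical stand-ins for illustration only.

```python
import math
import random


class DiagonalGaussian:
    """Toy stand-in for a VAE posterior with per-element mean and log-variance."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        # Standard deviation is exp(0.5 * log-variance).
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def mode(self):
        # The mode of a Gaussian is its mean: deterministic, no noise added.
        return list(self.mean)

    def sample(self, rng):
        # Reparameterised draw: mean + std * eps, with eps ~ N(0, 1).
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]


posterior = DiagonalGaussian(mean=[0.5, -1.0], logvar=[0.0, -2.0])
rng = random.Random(0)

# Two calls to mode() agree; two calls to sample() differ.
mode_a, mode_b = posterior.mode(), posterior.mode()
draw_a, draw_b = posterior.sample(rng), posterior.sample(rng)
```

Under that assumption, mode() gives the same latent for the same image every time, while sample() injects noise scaled by the posterior's standard deviation, so repeated encodings of one image differ slightly.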

hw-liang commented 10 months ago

Hi, I find your implementation of train_svd.py very helpful.

I don't quite understand this part. Could you give a reference or an explanation for it? Thank you! https://github.com/alibaba/animate-anything/blob/main/train_svd.py#L477 to L483

hw-liang commented 9 months ago

Hi, I noticed that in your latest implementation, not casting text_encoder to half precision may cause a datatype mismatch problem during training.

alexhe101 commented 9 months ago

"cause datatype mismatch problem during training"

I encountered this issue too. The text encoder should be cast to half precision and moved to CUDA.
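For anyone hitting the same mismatch, the usual pattern looks roughly like the sketch below. It is an illustration under assumptions, not the repository's actual fix: the nn.Linear module is a hypothetical stand-in for the real text encoder, and cast_frozen and weight_dtype are names chosen here. The idea is that frozen modules and their inputs share one device and one dtype before the forward pass.

```python
import torch
from torch import nn


def cast_frozen(module: nn.Module, device: str, dtype: torch.dtype) -> nn.Module:
    """Freeze a module and move it to the training device and dtype."""
    module.requires_grad_(False)
    return module.to(device=device, dtype=dtype)


# Hypothetical stand-in for the real text encoder.
text_encoder = nn.Linear(8, 8)

device = "cuda" if torch.cuda.is_available() else "cpu"
weight_dtype = torch.float16  # half precision, as discussed in this thread

text_encoder = cast_frozen(text_encoder, device, weight_dtype)

# Inputs must match the module's device and dtype, otherwise the
# forward pass can fail with a dtype mismatch error.
inputs = torch.randn(2, 8).to(device=device, dtype=weight_dtype)
param_dtypes = {p.dtype for p in text_encoder.parameters()}
```

Only frozen modules are cast this way here; parameters that are being trained are commonly kept in full precision when mixed precision is used.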