Closed: xinli2008 closed this issue 10 months ago.

Hello, thank you for your nice work! I am wondering: will you release the training code?
Our training code is primarily borrowed from the training scripts in the diffusers library. Currently, we do not have plans to open-source our training code. Thank you for your interest and understanding.
The results from M3DDM are super cool :). Thanks for the work!
Could you point me to the training scripts that you modified for your implementation? I found a PyTorch implementation here, but it doesn't use the diffusers library. It would be nice to use diffusers.
Thank you for your attention!
Sorry to say that we don't have plans to release a training script, but in fact the training script is very simple: just swap in our model, starting from the training examples in diffusers (e.g. train_text_to_image). You also need to preprocess the training video data, e.g. to obtain the mask_image and mask inputs.
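Not part of the original thread, but for concreteness, here is a minimal sketch of the preprocessing this reply describes: deriving a mask_image and mask from a frame and a region to outpaint. The helper name, the tensor layout, and the box-based masking are illustrative assumptions, not the authors' code.

```python
import torch

def make_outpaint_inputs(frame: torch.Tensor, keep_box: tuple):
    """Hypothetical helper: build (mask_image, mask) for one frame.

    frame:    (C, H, W) frame, e.g. scaled to [-1, 1]
    keep_box: (top, left, height, width) region that stays visible;
              everything outside it is the area to be outpainted.
    """
    _, h, w = frame.shape
    top, left, kh, kw = keep_box

    # mask: 1 where content must be generated, 0 where pixels are known
    mask = torch.ones(1, h, w, dtype=frame.dtype)
    mask[:, top:top + kh, left:left + kw] = 0.0

    # mask_image: known pixels kept, masked regions set to 0
    mask_image = frame * (1.0 - mask)
    return mask_image, mask


# Example: keep the top-left 256x256 crop of a 512x512 frame visible.
frame = torch.rand(3, 512, 512) * 2 - 1
mask_image, mask = make_outpaint_inputs(frame, (0, 0, 256, 256))
```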
Thanks! We rewrote the training code, but for some reason our outpainting produces a visible line when it outpaints downwards. Any ideas why this happens here?
We have not encountered this situation before. We recommend providing more details about your training and inference parameters.
If you are using our inference pipeline, you can try setting the use_add_noise parameter to False in the outpainting_with_random_masked_latent_inference_bidirection function.
Following up, here's our exact training setup:
- We have a full image (4 channels), a masked image where the masked regions are all 0s (4 channels), and a mask where the masked region is 1s and the unmasked region is 0s (1 channel).
- We add noise from the scheduler to the first 4 channels, using a randomly sampled timestep (see the sketch after this list).
- Our loss target is the added noise (epsilon prediction).
- We fully mask the input with a 25% probability and fully mask the global frames with a 10% probability, for classifier-free guidance.
- We follow the masking distribution in the paper.

Any advice on steps that are wrong or could be improved would be extremely helpful, thanks!
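For reference, here is a minimal sketch of the training step the list above describes, written in the style of the diffusers train_text_to_image example. The 9-channel concatenation, the bare unet(model_input, timesteps) call, and the omission of text/global-frame conditioning are simplifying assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def training_step(unet, scheduler, latents, masked_latents, mask):
    """One epsilon-prediction step on already VAE-encoded inputs.

    latents:        (B, 4, h, w) latents of the full frames
    masked_latents: (B, 4, h, w) latents of the masked frames (masked regions ~ 0)
    mask:           (B, 1, h, w) 1 = region to outpaint, 0 = known region
    """
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],),
        device=latents.device,
    )
    # Noise only the 4 latent channels of the full frames.
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Condition by concatenating noisy latents, masked latents, and mask (9 channels).
    model_input = torch.cat([noisy_latents, masked_latents, mask], dim=1)

    # Other conditioning inputs (global frames, CFG dropout, ...) are omitted here.
    noise_pred = unet(model_input, timesteps).sample

    # The loss target is the added noise.
    return F.mse_loss(noise_pred.float(), noise.float())
```

The 4+4+1 channel layout mirrors Stable Diffusion's inpainting UNet, which also takes 9 input channels.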
It looks fine.
Another question -- how did y'all achieve a batch size of 10 per GPU? (I'm assuming a batch size of 240 / 24 A100s = a batch size of 10 per GPU.) Right now, we're returning 3 tensors (labels, masked frames, and masks).
Yes, the batch size on each GPU is 10, so the batch dimension of each tensor is also 10. We have an engineering team that has developed a very good distributed training system, so we haven't had to focus on issues related to distributed training ourselves.
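As an illustration of the arithmetic above: one dataset item per sample, three tensors per item, and a per-process batch size of 10, so that 24 GPU processes give an effective global batch of 240. The class name, shapes, and random placeholder data below are assumptions, not the authors' code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class OutpaintClipDataset(Dataset):
    """Illustrative dataset returning the three tensors mentioned above."""

    def __init__(self, num_items=1000, frames=16, size=32):
        self.num_items, self.frames, self.size = num_items, frames, size

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        # Placeholder tensors; in practice these come from the VAE-encoded clip.
        latents = torch.randn(4, self.frames, self.size, self.size)         # full frames ("labels")
        masked_latents = torch.randn(4, self.frames, self.size, self.size)  # masked frames
        mask = (torch.rand(1, self.frames, self.size, self.size) > 0.5).float()
        return latents, masked_latents, mask

# batch_size here is per process (per GPU). With one process per GPU on 24 A100s,
# the effective global batch size is 24 * 10 = 240.
loader = DataLoader(OutpaintClipDataset(), batch_size=10, shuffle=True, num_workers=4)
latents, masked_latents, mask = next(iter(loader))  # each has batch dimension 10
```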