Closed: xinli2008 closed this issue 10 months ago.

Hello, thank you for your nice work! I am wondering: will you release the training code?
Our training code is primarily borrowed from the training scripts in the diffusers library. Currently, we do not have plans to open-source our training code. Thank you for your interest and understanding.
The results from M3DDM are super cool :). Thanks for the work!
Could you point me to the training scripts that you modified for your implementation? I found a PyTorch implementation here, but it doesn't use the diffusers library. It would be nice to use diffusers.
Thank you for your attention!
Sorry to say that we don't have plans to release a training script, but in fact the training script is very simple: just swap in our model, starting from the training examples in diffusers (e.g. train_text_to_image). You also need to preprocess the training video data, e.g. to obtain the mask_image and mask inputs.
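Not part of the original thread, but for concreteness, here is a minimal sketch of the preprocessing this reply describes: deriving a mask_image and mask from a frame and a region to outpaint. The helper name, the tensor layout, and the box-based masking are illustrative assumptions, not the authors' code.

```python
import torch

def make_outpaint_inputs(frame: torch.Tensor, keep_box: tuple):
    """Hypothetical helper: build (mask_image, mask) for one frame.

    frame:    (C, H, W) frame, e.g. scaled to [-1, 1]
    keep_box: (top, left, height, width) region that stays visible;
              everything outside it is the area to be outpainted.
    """
    _, h, w = frame.shape
    top, left, kh, kw = keep_box

    # mask: 1 where content must be generated, 0 where pixels are known
    mask = torch.ones(1, h, w, dtype=frame.dtype)
    mask[:, top:top + kh, left:left + kw] = 0.0

    # mask_image: known pixels kept, masked regions set to 0
    mask_image = frame * (1.0 - mask)
    return mask_image, mask


# Example: keep the top-left 256x256 crop of a 512x512 frame visible.
frame = torch.rand(3, 512, 512) * 2 - 1
mask_image, mask = make_outpaint_inputs(frame, (0, 0, 256, 256))
```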
Thanks! We rewrote the training code, but for some reason our outpainting produces a visible line when it outpaints downwards. Any ideas why this happens here?
We have not encountered this situation before. We recommend providing more details about your training and inference parameters.
If you are using our inference pipeline, you can try setting the use_add_noise parameter to False in the outpainting_with_random_masked_latent_inference_bidirection function.
Following up, here's our exact training setup:
- We have a full image (4 channels), a masked image where the masked regions are all 0s (4 channels), and a mask where the masked region is 1s and the unmasked region is 0s (1 channel).
- We add noise from the scheduler to the first 4 channels, using a randomly sampled timestep (see the sketch after this list).
- Our loss target is the added noise (epsilon prediction).
- We fully mask the input with a 25% probability and fully mask the global frames with a 10% probability, for classifier-free guidance.
- We follow the masking distribution in the paper.

Any advice on steps that are wrong or could be improved would be extremely helpful, thanks!
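For reference, here is a minimal sketch of the training step the list above describes, written in the style of the diffusers train_text_to_image example. The 9-channel concatenation, the bare unet(model_input, timesteps) call, and the omission of text/global-frame conditioning are simplifying assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def training_step(unet, scheduler, latents, masked_latents, mask):
    """One epsilon-prediction step on already VAE-encoded inputs.

    latents:        (B, 4, h, w) latents of the full frames
    masked_latents: (B, 4, h, w) latents of the masked frames (masked regions ~ 0)
    mask:           (B, 1, h, w) 1 = region to outpaint, 0 = known region
    """
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],),
        device=latents.device,
    )
    # Noise only the 4 latent channels of the full frames.
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Condition by concatenating noisy latents, masked latents, and mask (9 channels).
    model_input = torch.cat([noisy_latents, masked_latents, mask], dim=1)

    # Other conditioning inputs (global frames, CFG dropout, ...) are omitted here.
    noise_pred = unet(model_input, timesteps).sample

    # The loss target is the added noise.
    return F.mse_loss(noise_pred.float(), noise.float())
```

The 4+4+1 channel layout mirrors Stable Diffusion's inpainting UNet, which also takes 9 input channels.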
It looks fine.
Another question -- how did y'all achieve a batch size of 10 per GPU? (I'm assuming a batch size of 240 / 24 A100s = a batch size of 10 per GPU.) Right now, we're returning 3 tensors (labels, masked frames, and masks).
Yes, the batch size on each GPU is 10, so the batch dimension of each tensor is also 10. We have an engineering team that has developed a very good distributed training system, so we haven't had to focus on issues related to distributed training ourselves.
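As an illustration of the arithmetic above: one dataset item per sample, three tensors per item, and a per-process batch size of 10, so that 24 GPU processes give an effective global batch of 240. The class name, shapes, and random placeholder data below are assumptions, not the authors' code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class OutpaintClipDataset(Dataset):
    """Illustrative dataset returning the three tensors mentioned above."""

    def __init__(self, num_items=1000, frames=16, size=32):
        self.num_items, self.frames, self.size = num_items, frames, size

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        # Placeholder tensors; in practice these come from the VAE-encoded clip.
        latents = torch.randn(4, self.frames, self.size, self.size)         # full frames ("labels")
        masked_latents = torch.randn(4, self.frames, self.size, self.size)  # masked frames
        mask = (torch.rand(1, self.frames, self.size, self.size) > 0.5).float()
        return latents, masked_latents, mask

# batch_size here is per process (per GPU). With one process per GPU on 24 A100s,
# the effective global batch size is 24 * 10 = 240.
loader = DataLoader(OutpaintClipDataset(), batch_size=10, shuffle=True, num_workers=4)
latents, masked_latents, mask = next(iter(loader))  # each has batch dimension 10
```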