UTokyo-FieldPhenomics-Lab / DODA

Diffusion for Object-detection Domain Adaptation in Agriculture
MIT License

diffusion model training (stage 1) with target features #1

Closed NicolasWinckler closed 3 months ago

NicolasWinckler commented 3 months ago

Hi,

thanks for the paper and code. I have a few questions. Do you have the code to train the diffusion model with MAE features? How long did it take to train this part, and on what hardware?

Concerning the paper, you explain that conditioning on layout works better with ControlNet than with cross-attention, but in Figure 2 (stage 2) you show an input going into cross-attention. What does that mean?

Thanks for your help

illrayy commented 3 months ago

Thank you for your interest in our work.

The training of the diffusion model is divided into two parts: first training the VAE, then training the diffusion model. Each part takes about a week on a single V100, or about half that on an A100. If you wish to train these two parts from scratch, you can modify the config in train_wheat.py and set resume_path to None. To train the VAE, you should also run pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers to install taming-transformers. Please note that we have not tested the compatibility of training these two models in this repository.
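Concretely, a from-scratch setup might look like the following sketch. The exact config key names inside train_wheat.py are assumptions here; check the script itself for the actual variable names:

```shell
# Install taming-transformers, required for VAE training
pip install -e "git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers"

# In train_wheat.py, disable checkpoint resumption to train from scratch,
# e.g. (hypothetical config line):
#   resume_path = None
python train_wheat.py
```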

The layout condition is inserted by "+" (element-wise addition), while cross-attention is used for the domain-feature condition.
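To illustrate the distinction (this is a minimal NumPy sketch, not the authors' code; shapes and function names are made up for illustration): the layout features are simply added to the UNet activations, whereas the domain features serve as the key/value context of a cross-attention layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x, context, d=64):
    # x: (n_tokens, d) UNet features; context: (m_tokens, d) domain features.
    # Queries come from x, keys/values from the conditioning context.
    scores = x @ context.T / np.sqrt(d)        # (n_tokens, m_tokens)
    return softmax(scores) @ context           # attend over domain features

def conditioned_block(x, layout_feat, domain_feat):
    x = x + layout_feat                        # layout condition inserted by "+"
    x = x + cross_attention(x, domain_feat)    # domain condition via cross-attention
    return x

x = np.random.randn(16, 64)        # hypothetical UNet feature tokens
layout = np.random.randn(16, 64)   # layout features, same shape as x
domain = np.random.randn(4, 64)    # a few domain-feature tokens
out = conditioned_block(x, layout, domain)
print(out.shape)  # (16, 64)
```

The key point matching the figure: the "+" path requires the layout features to share the spatial shape of the activations, while the cross-attention path lets a variable number of domain tokens condition every spatial location.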

NicolasWinckler commented 3 months ago

Thanks a lot for the clarification