
LayoutFlow: Flow Matching for Layout Generation [Andrade Guerreiro et al., ECCV 2024]
https://julianguerreiro.github.io/layoutflow
MIT License

LayoutFlow: Flow Matching for Layout Generation (ECCV 2024)

This repository is the official implementation of the ECCV 2024 paper "LayoutFlow: Flow Matching for Layout Generation" (project page | paper).

Requirements

We used the following environment for the experiments:

Other dependencies can be installed using pip as follows:

pip install -r requirements.txt

Overview

The code uses the PyTorch Lightning framework and manages configurations with Hydra. For logging during training, we used Weights and Biases; TensorBoard can be used instead by changing the logger in conf/train.yaml.

Configs

The configuration files are the .yaml files in the conf folder; they contain hyperparameters and other settings. Values can be changed in the .yaml files directly (which we recommend only for data paths or permanent changes) or overridden on the command line. For example, the batch size used during training can be changed like this:

python src/train.py dataset=PubLayNet model=LayoutFlow dataset.batch_size=1024
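
To make the override syntax concrete, the following is a minimal sketch in plain Python of how a dotted key=value argument maps onto a nested configuration. It only illustrates the idea and is not Hydra's actual implementation; the function apply_override and the default values in the example config are hypothetical.

```python
import ast


def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b.c=value' style override to a nested dict (in place)."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate levels on demand.
        node = node.setdefault(name, {})
    try:
        # Interpret numbers, booleans, etc.; fall back to the raw string.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    node[leaf] = value
    return config


# Hypothetical defaults, standing in for the values in the .yaml files.
config = {"dataset": {"name": "PubLayNet", "batch_size": 512}}
apply_override(config, "dataset.batch_size=1024")
print(config["dataset"]["batch_size"])
```

Note that in Hydra an argument such as dataset=PubLayNet selects a whole config group rather than setting a single value; the sketch covers only dotted value overrides such as dataset.batch_size=1024.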

Models

We provide two generative models in src/models: our flow-based approach, LayoutFlow, and a diffusion-based approach, LayoutDMx. The main difference between the two models is the training procedure (diffusion vs. flow). The same backbone architecture in src/models/backbone can be chosen for either of them.
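
As a rough illustration of what flow-based training means here, the sketch below builds one training pair for flow matching with a straight-line path between a noise sample and a data sample. It is a generic textbook formulation written in plain Python, not the code in src/models; the function names, the linear path, and the toy 4-dimensional vectors are assumptions, and the repository's actual parameterization may differ.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


# Toy example: one layout element as (x, y, w, h); values are illustrative.
rng = random.Random(0)
x1 = [0.5, 0.5, 0.2, 0.1]               # "data" sample
x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
t = rng.random()                        # time in [0, 1)


def zero_model(x_t, t):
    """Stand-in for a network: always predicts zero velocity."""
    return [0.0] * len(x_t)


print(flow_matching_loss(zero_model, x0, x1, t))
```

In this formulation the network regresses a velocity along the path, whereas a diffusion model is typically trained to predict the noise (or a related quantity) added under a fixed schedule; the backbone can be the same in both cases.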

Dataset

We trained our model on the RICO and PubLayNet datasets using the dataset splits reported in LayoutFormer++ and LayoutDiffusion. Please download the files from this Hugging Face repository with the command below (make sure git lfs is installed first, otherwise the large files will not be downloaded):

git clone https://huggingface.co/JulianGuerreiro/LayoutFlow

Important:

You can store the data in a directory of your choosing, but you will need to set the data path in the dataset config files. Specifically, change the data_path attribute in conf/dataset/PubLayNet.yaml and conf/dataset/RICO.yaml to the path where the respective folders are located.

We also provide the PubLayNet split in LayoutDM (Inoue et al.), which we used for comparison with other models as described in the Appendix (Section: Results Using Different Data Split).

Pretrained Models

The pre-trained models can be downloaded from the Hugging Face repository as described above. They can be used to evaluate the model or to continue training. Note that the .tar files in pretrained are used for the FID model; they are identical to the ones used in LayoutDiffusion and do not need to be downloaded separately. The .pt files are also used for the FID calculation.

Testing

A model can be evaluated on various tasks by calculating FID, Alignment, Overlap and mIoU. A minimal example is shown below:

python3 src/test.py model=[MODEL] dataset=[DATASET] task=[TASK] cond_mask=[MASK] checkpoint=[DIR_TO_CHECKPOINT]

Other useful settings (see also the test.yaml config file)

The results will be saved in the results directory as a .pt file. To re-evaluate the files, you can set the variable load_bbox to the path of the .pt file.
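
Of the metrics listed at the start of this section, mIoU and Overlap both build on the intersection of bounding boxes. The following is a minimal sketch of the IoU between two axis-aligned boxes, assuming a center format (xc, yc, w, h). It is meant only to illustrate the quantity and is not the evaluation code used by src/test.py; the full metrics involve additional steps such as aggregating over elements and layouts. The function names are hypothetical.

```python
def to_corners(box):
    """Convert (xc, yc, w, h) to (left, top, right, bottom)."""
    xc, yc, w, h = box
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes in (xc, yc, w, h)."""
    l1, t1, r1, b1 = to_corners(box_a)
    l2, t2, r2, b2 = to_corners(box_b)
    # Overlap along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(r1, r2) - max(l1, l2))
    inter_h = max(0.0, min(b1, b2) - max(t1, t2))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Identical boxes, then two boxes that touch only at a corner.
print(box_iou((0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 0.5, 0.5)))
print(box_iou((0.25, 0.25, 0.5, 0.5), (0.75, 0.75, 0.5, 0.5)))
```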

Note

Since the generation task is non-deterministic, the results will vary somewhat between runs and will not exactly match the values in the paper. The provided weights are also not the original weights we used in the paper, as we re-trained the model after refactoring. Nonetheless, we evaluated the newly trained models and they were very close to the reported values after using multirun.

Training

For training, we provide the train.sh file, where you can comment out the model that you would like to train. If you want to train the model with different hyperparameters, you can change the values in the .sh file; for example, add model.optimizer.learning_rate=0.0001 to change the learning rate.

We recommend using a single GPU for training as that has shown the best results under the current hyperparameters.

Useful settings

Citation

If this work is helpful for your research, please cite our paper:

@article{guerreiro2024layoutflow,
  title={LayoutFlow: Flow Matching for Layout Generation},
  author={Guerreiro, Julian Jorge Andrade and Inoue, Naoto and Masui, Kento and Otani, Mayu and Nakayama, Hideki},
  journal={arXiv preprint arXiv:2403.18187},
  year={2024}
}

Acknowledgments

We want to acknowledge that some parts of our code (mainly some utility functions for the evaluation) are based on code from the following projects: LayoutDiffusion and LayoutDM.