cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License

DiGress: Discrete Denoising diffusion models for graph generation

Update (Nov 20th, 2023): Working with large graphs (more than 100-200 nodes)? Consider using SparseDiff, a sparse version of DiGress: https://github.com/qym7/SparseDiff

Update (July 11th, 2023): the code now supports multiple GPUs. Please update all libraries according to the instructions. All datasets should now download automatically.

Environment installation

This code was tested with PyTorch 2.0.1, CUDA 11.8 and torch_geometric 2.3.1.
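To compare a local environment against the tested versions, a small standard-library check along the following lines may help. It is only a sketch: the distribution names `torch` and `torch_geometric` are assumptions about how the packages were installed, the helper names are hypothetical, and the CUDA version is not checked.

```python
from importlib import metadata

# Versions quoted in this README; the distribution names are assumptions.
TESTED_VERSIONS = {"torch": "2.0.1", "torch_geometric": "2.3.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(tested=TESTED_VERSIONS):
    """Map each package name to a (tested, installed, matches) tuple."""
    report = {}
    for package, wanted in tested.items():
        found = installed_version(package)
        # A local build tag such as "2.0.1+cu118" still counts as a match.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in version_report().items():
        print(f"{package}: tested {wanted}, installed {found}, match={ok}")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the combination differs from the one reported as tested.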

Note: graph_tool and torch_geometric currently seem to conflict on macOS; I have not solved this issue yet.

Run the code

Checkpoints

The following checkpoints should work with the latest commit:

The following checkpoints require reverting to commit 682e59019dd33073b1f0f4d3aaba7de6a308602e and running pip install -e .:

Generated samples

We provide the generated samples for some of the models. If you have retrained a model from scratch for which the samples are not available yet, we would be very happy if you could send them to us!

Troubleshooting

PermissionError: [Errno 13] Permission denied: '/home/vignac/DiGress/src/analysis/orca/orca': You probably did not compile orca.
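Before launching a run, it can be worth checking that the orca binary exists and is executable. The snippet below is a minimal sketch: the function name `orca_is_ready` is hypothetical, and the default path assumes the script is started from the repository root.

```python
import os


def orca_is_ready(path="src/analysis/orca/orca"):
    """Return True if `path` is an existing file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if not orca_is_ready():
        print("orca binary is missing or not executable: compile it first.")
```

If the check fails, the binary has most likely not been compiled, which matches the error above.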

Use DiGress on a new dataset

To implement a new dataset, you will need to create a new file in the src/datasets folder. Depending on whether you are working with molecules or abstract graphs, you can base this file on moses_dataset.py or spectre_datasets.py, for example. This file should implement a Dataset class to process the data (see the PyG documentation), as well as a DatasetInfos class that defines the noise model and some metrics.

For molecular datasets, you'll need to specify several things in the DatasetInfos:

The node counts and the distribution of node types and edge types can be computed automatically using functions from AbstractDataModule.
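To make these three statistics concrete, here is a plain-Python sketch of what they could look like on a toy dataset. It is not the code of AbstractDataModule: the graph format, the function names and the returned keys are hypothetical, and treating edge type 0 as "no edge" is an assumption made for this illustration.

```python
from collections import Counter


def normalize(counter, size):
    """Turn a Counter over integer ids 0..size-1 into a list of probabilities."""
    total = sum(counter.values())
    return [counter[i] / total for i in range(size)]


def dataset_statistics(graphs, num_node_types, num_edge_types):
    """Compute node-count, node-type and edge-type distributions.

    Each graph is a dict (hypothetical format) with
      "nodes": list of node-type ids, one per node
      "edges": dict mapping an undirected pair (i, j), i < j, to an
               edge-type id in 1..num_edge_types-1
    Edge type 0 is reserved here for "no edge" (an assumption).
    """
    sizes, node_types, edge_types = Counter(), Counter(), Counter()
    for g in graphs:
        n = len(g["nodes"])
        sizes[n] += 1
        node_types.update(g["nodes"])
        edge_types.update(g["edges"].values())
        # Every unordered node pair without an edge counts as type 0.
        edge_types[0] += n * (n - 1) // 2 - len(g["edges"])
    max_n = max(sizes)
    return {
        "n_nodes": normalize(sizes, max_n + 1),  # index = number of nodes
        "node_types": normalize(node_types, num_node_types),
        "edge_types": normalize(edge_types, num_edge_types),
    }


if __name__ == "__main__":
    toy = [
        {"nodes": [0, 0, 1], "edges": {(0, 1): 1, (1, 2): 2}},
        {"nodes": [0, 1], "edges": {(0, 1): 1}},
    ]
    print(dataset_statistics(toy, num_node_types=2, num_edge_types=3))
```

Each returned list sums to one, so it can be read directly as an empirical distribution over graph sizes, node types or edge types.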

Once the dataset file is written, the code in main.py can be adapted to handle the new dataset, and a new file can be added in configs/dataset.

Cite the paper

@inproceedings{
vignac2023digress,
title={DiGress: Discrete Denoising diffusion for graph generation},
author={Clement Vignac and Igor Krawczuk and Antoine Siraudin and Bohan Wang and Volkan Cevher and Pascal Frossard},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=UaAD-Nu86WX}
}