cvignac / DiGress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
MIT License
366 stars 76 forks

Conditional Generation based on Subgraph #15

Closed (twidatalla closed 1 year ago)

twidatalla commented 1 year ago

Hello,

I work in drug discovery and am very interested in applying this model to generate drug molecules that contain a predefined motif, as you demonstrate in Appendix E. Would you be able to share a code example of the node and edge feature masking for motif-preserving generation?

Wouldn't this require retraining the model on molecules with the given motif, so that the noise model preserves the motif during diffusion? I can see how masking (disallowing) transitions for the motif's nodes and edges during denoising, while letting everything else denoise normally, would produce structures that extend the motif. Or maybe I'm thinking about this wrong and no retraining is needed?

Best, Talal

cvignac commented 1 year ago

Hello Talal,

The code is based on an old version of the files, but in sample_zs_given_zt we add something like this:

        if self.cfg.scaffold_extension.use:
            # Scaffold extension: build the scaffold graph from its SMILES string.
            graph_scaffold = self.graph_from_scaffold(scaffold_smile='C1C=CNC2=CC=CC=C21')
            # Convert the scaffold graph to dense node (X) and edge (E) tensors.
            dense_data_scaffold, node_mask_scaffold = utils.to_dense(graph_scaffold.x, graph_scaffold.edge_index,
                                                                     graph_scaffold.edge_attr, graph_scaffold.batch)
            X_scaffold, E_scaffold = dense_data_scaffold.X, dense_data_scaffold.E
            n_nodes_scaffold = X_scaffold.shape[1]

            # Overwrite the first n_nodes_scaffold nodes, and the edges among them,
            # of every sampled graph with the scaffold's node and edge classes.
            sampled_s.X[:, :n_nodes_scaffold] = X_scaffold.argmax(-1)
            sampled_s.E[:, :n_nodes_scaffold, :n_nodes_scaffold] = E_scaffold.argmax(-1)
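
For readers without the old files at hand, the overwrite in the snippet above can be illustrated on plain Python lists. This is only a minimal sketch under stated assumptions: `clamp_scaffold` is a hypothetical helper, not a DiGress function, and nodes and edges are stored as class indices in nested lists rather than tensors.

```python
from typing import List


def clamp_scaffold(sampled_X: List[List[int]],
                   sampled_E: List[List[List[int]]],
                   scaffold_X: List[int],
                   scaffold_E: List[List[int]]) -> None:
    """Overwrite the leading scaffold block of every sampled graph, in place.

    sampled_X[b][i]    : node class index of node i in graph b
    sampled_E[b][i][j] : edge class index between nodes i and j in graph b
    (hypothetical layout, chosen for illustration only)
    """
    k = len(scaffold_X)
    for X, E in zip(sampled_X, sampled_E):
        if len(X) < k:
            raise ValueError("graph has fewer nodes than the scaffold")
        # The first k nodes take the scaffold's node classes.
        X[:k] = scaffold_X
        # The top-left k x k block of edges takes the scaffold's edge classes.
        for i in range(k):
            E[i][:k] = scaffold_E[i]


# Toy batch: one graph with 4 nodes; scaffold of 2 nodes joined by edge class 1.
scaffold_X = [0, 2]
scaffold_E = [[0, 1],
              [1, 0]]
sampled_X = [[3, 3, 1, 1]]
sampled_E = [[[0, 2, 0, 1],
              [2, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]]]

clamp_scaffold(sampled_X, sampled_E, scaffold_X, scaffold_E)
print(sampled_X[0])  # first two entries now match the scaffold
```

Edges between scaffold nodes and the remaining nodes are left untouched, so the model is still free to attach new atoms to the scaffold.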

It would probably work better if we preserved the motif during diffusion in training, as was done in https://arxiv.org/abs/2210.05274 for 3D point clouds. As you can see in the figures, the results of our method are not great. We wanted to showcase that substructure conditioning is possible, but we didn't spend much time on it.
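
To make "preserving the motif during diffusion" concrete, here is a rough sketch of a forward noising step that leaves motif nodes untouched. It rests on assumptions that are not taken from DiGress: the helper name `noise_nodes_keep_motif` is hypothetical, and the noise uses uniform transitions (keep the label with probability `alpha_bar`, otherwise resample uniformly), which is a simplification. Edges could be handled the same way with a mask over node pairs.

```python
import random
from typing import List, Sequence


def noise_nodes_keep_motif(x0: Sequence[int],
                           motif_mask: Sequence[bool],
                           alpha_bar: float,
                           num_classes: int,
                           rng: random.Random) -> List[int]:
    """Sample noisy node labels z_t from clean labels x0.

    Assumes uniform-transition noise (an illustrative simplification):
    each non-motif label is kept with probability alpha_bar, otherwise it
    is replaced by a uniformly random class. Motif labels are never changed.
    """
    z_t = []
    for label, in_motif in zip(x0, motif_mask):
        if in_motif or rng.random() < alpha_bar:
            z_t.append(label)
        else:
            z_t.append(rng.randrange(num_classes))
    return z_t


rng = random.Random(0)
x0 = [0, 1, 2, 3, 1]
motif_mask = [True, True, False, False, False]  # first two nodes form the motif

# alpha_bar = 0.0 means the non-motif nodes are fully noised.
noisy = noise_nodes_keep_motif(x0, motif_mask, alpha_bar=0.0, num_classes=4, rng=rng)
print(noisy[:2])  # motif labels are unchanged
```

A model trained on inputs noised this way always sees the clean motif as context, which is the idea behind conditioning at training time rather than only at sampling time.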

Another option that does not involve retraining is to adapt the approach proposed in RePaint to graphs: https://arxiv.org/abs/2201.09865 and http://arxiv.org/abs/2302.01217
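
A RePaint-style reverse step could look roughly like the sketch below: the known (scaffold) nodes are re-noised from the clean scaffold to the noise level of step t-1, the unknown nodes come from the pretrained reverse model, and the two are merged with a mask. Everything here is an assumption for illustration: `repaint_step` and `denoise_step` are hypothetical names, the noise uses uniform transitions, only node labels are shown, and RePaint's resampling (jumping back and forth in time) is omitted.

```python
import random
from typing import Callable, List, Sequence


def repaint_step(z_t: Sequence[int],
                 x_known: Sequence[int],
                 known_mask: Sequence[bool],
                 alpha_bar_prev: float,
                 num_classes: int,
                 denoise_step: Callable[[Sequence[int]], List[int]],
                 rng: random.Random) -> List[int]:
    """One reverse step t -> t-1, conditioned on the known nodes.

    denoise_step stands in for sampling from the pretrained reverse model;
    it is a placeholder, not an actual DiGress call.
    """
    # Unknown region: whatever the reverse model proposes for step t-1.
    proposal = denoise_step(z_t)
    merged = []
    for i, is_known in enumerate(known_mask):
        if not is_known:
            merged.append(proposal[i])
        elif rng.random() < alpha_bar_prev:
            # Known region: forward-noised clean label (kept with prob. alpha_bar_prev).
            merged.append(x_known[i])
        else:
            merged.append(rng.randrange(num_classes))
    return merged


# Toy usage with an identity "model". At the final step alpha_bar_prev is 1.0,
# so the known nodes are restored exactly.
z_t = [3, 3, 3, 3]
x_known = [0, 2, 0, 0]                  # values at unknown positions are ignored
known_mask = [True, True, False, False]
result = repaint_step(z_t, x_known, known_mask, alpha_bar_prev=1.0,
                      num_classes=4, denoise_step=lambda z: list(z),
                      rng=random.Random(0))
print(result)
```

Unlike a hard overwrite of the sampled values at every step, the known region here carries the same amount of noise as the rest of the graph, so the model's input stays closer to what it saw during training.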

Best, Clement

twidatalla commented 1 year ago

Hi Clement,

Thank you for the response. I didn't realize that substructure conditioning wasn't one of your focuses, so thank you for referencing the other projects.

Looking at the script, I can understand how your approach works, so thank you for that as well. I can see why other approaches may be better. If it suits your interests, I would recommend doing more work on this task in the future, because it is quite relevant for drug design and a needed tool. Often we have an idea of the interactions or motifs desired for a target and want to generate compounds from that, or we already have a compound and want to generate different components.

Excellent Paper! Talal

xinyangATK commented 1 year ago

Hi Clement,

I have recently been working in drug discovery, especially small-molecule generation, and I found the 'substructure conditioned generation' in Appendix E. Thank you for sharing this script, but I am still having some difficulty reproducing this functionality with DiGress, especially self.graph_from_scaffold. Could you share detailed instructions or code? That would really help me.

Thanks! Xinyang