Networks-Learning / nevae

Code and data for "NeVAE: A Deep Generative Model for Molecular Graphs", AAAI 2019

How to get training data #1

Closed odek53r closed 6 years ago

odek53r commented 6 years ago

How can I get the training data? There doesn't seem to be any documentation on the data format, or any example data for running the program. I'd appreciate your help.

bidishasamantakgp commented 6 years ago

Hi,

To train the model on synthetic graphs, you can use the generate_erdos_renyi.py script (https://github.com/Networks-Learning/nevae/blob/master/generate_erdos_renyi.py) to generate a set of graphs. Run python generate_erdos_renyi.py --help to see the available parameters.
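For readers who want to see what an Erdős–Rényi graph set looks like before running the repository script, here is a minimal standard-library sketch of the G(n, p) model. It is not the code in generate_erdos_renyi.py; the function names `erdos_renyi_edges` and `format_edge_list` are hypothetical, and the output layout is assumed to match the edge-list example given later in this thread.

```python
import itertools
import json
import random


def erdos_renyi_edges(n, p, seed=None):
    """Return the edges of a G(n, p) random graph as (u, v) pairs with u < v."""
    rng = random.Random(seed)
    # Each of the n*(n-1)/2 possible edges is kept independently with probability p.
    return [(u, v)
            for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def format_edge_list(edges, weight=1):
    """Render edges one per line as 'u v {"weight":w}' (assumed layout)."""
    attrs = json.dumps({"weight": weight}, separators=(",", ":"))
    return "\n".join(f"{u} {v} {attrs}" for u, v in edges)


if __name__ == "__main__":
    # Print one small random graph; a fixed seed makes the run repeatable.
    print(format_edge_list(erdos_renyi_edges(n=6, p=0.5, seed=0)))
```

Setting p to 1.0 yields the complete graph and p to 0.0 yields no edges, which is a quick way to sanity-check the generator.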

To generate molecular graphs for the ZINC dataset, download the clean drug-like molecules from http://zinc.docking.org/subsets/clean-drug-like. Then take the .mol2 and .sml files, check out the node_label branch, and run the following command: python molecular_graph_conversion.py .sml .mol2

Similarly, for QM9, download the data from https://figshare.com/collections/Quantum_chemistry_structures_and_properties_of_134_kilo_molecules/978904, extract the SMILES strings, convert them to mol2 files using rdkit, and use the same code to convert them to molecular graphs that can be used with the networkx module.
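To illustrate what a mol2-to-graph conversion involves, the sketch below reads the @<TRIPOS>BOND section of a mol2 file and emits edges with 0-based node ids. It is not the repository's molecular_graph_conversion.py; the function name `mol2_bonds_to_edges` is hypothetical, and using the bond order as the edge weight (with weight 1 for any non-numeric bond type) is an assumption.

```python
SAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
formaldehyde
4 3 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
1 C1 0.0000 0.0000 0.0000 C.2
2 O1 1.2000 0.0000 0.0000 O.2
3 H1 -0.5500 0.9400 0.0000 H
4 H2 -0.5500 -0.9400 0.0000 H
@<TRIPOS>BOND
1 1 2 2
2 1 3 1
3 1 4 1
"""


def mol2_bonds_to_edges(mol2_text):
    """Extract (u, v, weight) triples from the @<TRIPOS>BOND section."""
    # Assumption: bond types 1/2/3 map to weights 1/2/3; any other type
    # (for example "ar" or "am") falls back to weight 1.
    order = {"1": 1, "2": 2, "3": 3}
    edges = []
    in_bonds = False
    for line in mol2_text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            # Track whether the current section is the bond table.
            in_bonds = line == "@<TRIPOS>BOND"
            continue
        fields = line.split()
        if not in_bonds or len(fields) < 4:
            continue
        # Record layout: bond_id origin_atom_id target_atom_id bond_type
        u, v = int(fields[1]) - 1, int(fields[2]) - 1  # mol2 ids are 1-based
        edges.append((min(u, v), max(u, v), order.get(fields[3], 1)))
    return edges


if __name__ == "__main__":
    for u, v, w in mol2_bonds_to_edges(SAMPLE_MOL2):
        print(f'{u} {v} {{"weight":{w}}}')
```

The sample molecule is a hand-written formaldehyde record used only to exercise the parser; real ZINC or QM9 files carry more columns in the atom table, which this sketch ignores.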

For example, the molecular graph corresponding to the SMILES string CC(C)(O)CC1CCNCC1 is:

0 1 {"weight":1}
0 11 {"weight":1}
0 12 {"weight":1}
0 13 {"weight":1}
1 2 {"weight":1}
1 3 {"weight":1}
1 10 {"weight":1}
2 14 {"weight":1}
2 15 {"weight":1}
2 16 {"weight":1}
3 4 {"weight":1}
3 17 {"weight":1}
3 18 {"weight":1}
4 9 {"weight":1}
4 5 {"weight":1}
4 29 {"weight":1}
5 6 {"weight":1}
5 19 {"weight":1}
5 20 {"weight":1}
6 7 {"weight":1}
6 21 {"weight":1}
6 22 {"weight":1}
7 8 {"weight":1}
7 28 {"weight":1}
8 9 {"weight":1}
8 23 {"weight":1}
8 24 {"weight":1}
9 25 {"weight":1}
9 26 {"weight":1}
10 27 {"weight":1}
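An edge list in this form (two node ids followed by a JSON-style attribute dictionary) can be checked with a few lines of standard-library Python. The sketch below is illustrative rather than part of the repository; `read_edge_list` is a hypothetical helper, and it assumes the graph is undirected.

```python
import json
import re
from collections import defaultdict

# One record: source id, target id, then a flat {...} attribute dictionary.
EDGE_RE = re.compile(r"(\d+)\s+(\d+)\s+(\{[^{}]*\})")


def read_edge_list(text):
    """Parse 'u v {"weight":w}' records into an undirected adjacency dict."""
    adjacency = defaultdict(dict)
    # findall scans the whole text, so records may be separated by
    # newlines or by plain spaces.
    for u, v, attrs in EDGE_RE.findall(text):
        u, v = int(u), int(v)
        weight = json.loads(attrs).get("weight", 1)
        adjacency[u][v] = weight
        adjacency[v][u] = weight
    return dict(adjacency)


if __name__ == "__main__":
    # First few records of the example graph above.
    sample = '0 1 {"weight":1} 0 11 {"weight":1} 0 12 {"weight":1} 1 2 {"weight":1}'
    graph = read_edge_list(sample)
    for node in sorted(graph):
        print(node, sorted(graph[node]))
```

With one record per line, networkx's `parse_edgelist` (with `nodetype=int`) should accept the same layout and return a graph object directly.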

Hope this helps.

odek53r commented 6 years ago

Thank you for the prompt reply. I converted the mol2 files to graphs successfully.