THUNLP-MT / PS-VAE

This repo contains the code for our paper: Molecule Generation by Principal Subgraph Mining and Assembling.
https://arxiv.org/abs/2106.15098
MIT License

Can PS deal with non-connected graph? #13

Open Lyu6PosHao opened 2 months ago

Lyu6PosHao commented 2 months ago

Great work! But I have some questions:

In ChEBI, some molecules are non-connected, such as O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+] where Na+ is an isolated ion.

So I wonder whether PS can handle non-connected graphs. I tried it and got an error when tokenizing. Thanks.

kxz18 commented 2 months ago

Hi, thanks for your interest in our work! I have updated the repo so that non-connected molecules can be processed at inference time (commits 507c8f9 and 8c4b7b4). The basic logic is to split the non-connected SMILES on '.' and treat each connected subgraph separately. However, I noticed some issues that need attention:

  1. The vocabulary-construction code is unchanged, so it is recommended to manually split non-connected molecules and treat each subgraph as an independent molecule when building the vocabulary.
  2. Any additional isolated ions need to be manually added to the atomic vocabulary and to the element-SMILES conversion logic (see commit 507c8f9).
  3. If the non-connected molecule contains ions, the charge on the organic part might be lost after reconstruction. For example, O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+] might become O=C1O[C@H]([C@H](O)CO)C(O)=C1O.[Na+] after reconstruction from the molecule object to SMILES.
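
For readers who want to try the manual split described above, here is a minimal sketch in plain Python. It is not the code from the commits mentioned; `split_fragments`, `process_per_fragment`, and `net_charge` are hypothetical helper names, and `net_charge` is only a rough regex check on bracket atoms (a cheminformatics toolkit would be more robust). It can also help spot the charge loss from point 3 by comparing the net charge before and after reconstruction.

```python
import re
from typing import Callable, List, TypeVar

T = TypeVar("T")


def split_fragments(smiles: str) -> List[str]:
    """Split a SMILES string on '.' into its written components.

    Assumption: no ring-closure digits are shared across a '.'
    (e.g. 'C1.C1' is actually one connected molecule in SMILES),
    so each piece is treated as an independent connected subgraph.
    """
    return [part for part in smiles.split(".") if part]


def process_per_fragment(smiles: str, fn: Callable[[str], T]) -> List[T]:
    """Apply fn (e.g. a tokenizer, hypothetical here) to each fragment."""
    return [fn(part) for part in split_fragments(smiles)]


# Bracket atoms such as [O-], [Na+], [Fe+3] or [Cu++] carry the charge.
_BRACKET = re.compile(r"\[([^\]]*)\]")
_CHARGE = re.compile(r"(\++|-+)(\d*)")


def net_charge(smiles: str) -> int:
    """Rough net formal charge, read from bracket atoms only."""
    total = 0
    for atom in _BRACKET.findall(smiles):
        match = _CHARGE.search(atom)
        if match is None:
            continue  # neutral bracket atom, e.g. [C@H]
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


if __name__ == "__main__":
    original = "O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]"
    reconstructed = "O=C1O[C@H]([C@H](O)CO)C(O)=C1O.[Na+]"
    print(split_fragments(original))
    # A mismatch here signals that a charge was dropped on reconstruction.
    print(net_charge(original), net_charge(reconstructed))
```

For vocabulary construction, the same `split_fragments` call could be applied to every line of the training file so that each fragment is written out as its own molecule.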

Lyu6PosHao commented 2 months ago

Thanks a lot!