commons-research / MINE-Database

Metabolic In silico Network Expansion (MINE) Database Construction and DB Logic
MIT License

Check why some entries are flattened #1

Open oolonek opened 1 month ago

oolonek commented 1 month ago

For example:

HNBKFXBALXUFRY-YOGCQVOKSA-N,Cc1ccc2n1[C@@H]1[C@@H]3O[C@]([C@H](C)O)([C@H](O)C2)[C@H]1c1ccc(C)n1[C@@H]3C

and

HNBKFXBALXUFRY-NCVFPUPRSA-N,Cc1ccc2n1[C@@H]1[C@@H]3O[C@]([C@H](C)O)([C@@H](O)C2)[C@H]1c1ccc(C)n1[C@@H]3C

Result after the Pickaxe process:

{
  "_id": "Cc6d77db1bdda75b9feca389ea4939835a09c1c0f",
  "ID": "HNBKFXBALXUFRY-NCVFPUPRSA-N",
  "SMILES": "Cc1ccc2n1C(C)C1OC3(C(C)O)C(O)Cc4ccc(C)n4C1C23",
  "InChI_key": "HNBKFXBALXUFRY-UHFFFAOYSA-N",
  "Type": "Starting Compound",
  "Generation": 0,
  "Expand": true,
  "Reactant_in": [
    "Cc6d77db1bdda75b9feca389ea4939835a09c1c0f_0"
  ],
  "Product_of": []
}

Since we add metadata by InChI_key rather than by ID, entries of this type do not get populated.
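A minimal sketch of the lookup direction this suggests, assuming metadata is held in a plain dict keyed by the original (stereo-aware) InChIKey. The helper name `find_metadata` and the dict layout are hypothetical, not part of MINE-Database. It tries the document's `ID` first and only falls back to `InChI_key`, so a flattened `InChI_key` no longer hides the entry.

```python
from typing import Optional


def find_metadata(compound: dict, metadata_by_key: dict) -> Optional[dict]:
    """Hypothetical helper: look up metadata by ID first, then by InChI_key."""
    # ID keeps the input InChIKey (with its stereo block); InChI_key may be flattened.
    for field in ("ID", "InChI_key"):
        key = compound.get(field)
        if key in metadata_by_key:
            return metadata_by_key[key]
    return None


# Fields copied from the Pickaxe document above.
compound = {
    "_id": "Cc6d77db1bdda75b9feca389ea4939835a09c1c0f",
    "ID": "HNBKFXBALXUFRY-NCVFPUPRSA-N",
    "InChI_key": "HNBKFXBALXUFRY-UHFFFAOYSA-N",
}
# Hypothetical metadata table keyed by the input InChIKey.
metadata = {"HNBKFXBALXUFRY-NCVFPUPRSA-N": {"source": "example"}}

print(metadata.get(compound["InChI_key"]))  # None: the current, InChI_key-based lookup misses
print(find_metadata(compound, metadata))    # {'source': 'example'}
```

Keying on `ID` only helps if `ID` really holds the input InChIKey for every starting compound; that is an assumption based on the single document shown here.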

pamrein commented 1 month ago

I see the problem.

@ = counter-clockwise, @@ = clockwise (looking from the first neighbor toward the chiral atom)

In pickaxe_commons.py, the function "load_compound_set" removes the stereochemistry (the @ signs) at the step smi = utils.postsanitize_smiles([smi])[0][0].
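One way to confirm that the chirality tags disappear, sketched with the standard library only. The helper `chirality_tags` is hypothetical and is a textual approximation (it reads `@`/`@@` inside bracket atoms), not a real SMILES parser; RDKit would be the proper tool. Applied to the two input SMILES, it shows they differ at a single stereocentre. Applied to the stored SMILES, it finds no tags at all, which is why both stereoisomers collapse onto one flattened structure.

```python
import re

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")


def chirality_tags(smiles: str) -> list:
    """Return the @/@@ tag of every chiral bracket atom, in SMILES order."""
    tags = []
    for atom in BRACKET_ATOM.findall(smiles):
        if "@@" in atom:
            tags.append("@@")
        elif "@" in atom:
            tags.append("@")
    return tags


def differing_centres(smiles_a: str, smiles_b: str) -> list:
    """Tag positions that differ between two SMILES written in the same atom order."""
    a, b = chirality_tags(smiles_a), chirality_tags(smiles_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


isomer_1 = "Cc1ccc2n1[C@@H]1[C@@H]3O[C@]([C@H](C)O)([C@H](O)C2)[C@H]1c1ccc(C)n1[C@@H]3C"
isomer_2 = "Cc1ccc2n1[C@@H]1[C@@H]3O[C@]([C@H](C)O)([C@@H](O)C2)[C@H]1c1ccc(C)n1[C@@H]3C"
stored = "Cc1ccc2n1C(C)C1OC3(C(C)O)C(O)Cc4ccc(C)n4C1C23"

print(len(chirality_tags(isomer_1)))          # 7
print(differing_centres(isomer_1, isomer_2))  # [4]
print(chirality_tags(stored))                 # []
```

Because the two inputs are written in the same atom order, a flipped tag at index 4 (the `C(O)` carbon) means opposite configurations there; once all tags are stripped, nothing distinguishes them.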

I have to dig deeper to understand the real problem. RDKit is certainly involved; it also provides a dedicated function for this (from rdkit.Chem import RemoveStereochemistry).

oolonek commented 1 month ago

That should be it. However, if I remember correctly, we commented out this line; see https://github.com/commons-research/MINE-Database/blob/005678689d531931823166d10fb8ca67e4fcfa7b/mine_database/pickaxe_commons.py#L307

Could you add the compounds above to the lotus_10 test set and try to understand what the issue is?
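For the lotus_10 test set, a check along these lines could flag the problem automatically. It is a hypothetical sketch, assuming Pickaxe compound documents shaped like the one in this issue: `ID` holds the input InChIKey and `InChI_key` is computed from the stored SMILES. It relies on the standard InChIKey layout: the first 14-character block hashes the connectivity, and the second block starts with an 8-character hash of the remaining layers, including stereochemistry. Matching first blocks with a changed second block that reads `UHFFFAOY` (the hash InChI gives when those layers are empty) therefore points to lost stereo.

```python
def stereo_lost(compound: dict) -> bool:
    """Hypothetical check: same skeleton as ID, but the stereo block was dropped."""
    id_parts = compound["ID"].split("-")
    key_parts = compound["InChI_key"].split("-")
    if len(id_parts) != 3 or len(key_parts) != 3:
        return False  # ID is not an InChIKey; nothing to compare
    id_conn, id_rest, _ = id_parts
    key_conn, key_rest, _ = key_parts
    # "UHFFFAOY" is the second-block hash when there are no stereo/isotope layers.
    return id_conn == key_conn and id_rest != key_rest and key_rest.startswith("UHFFFAOY")


compounds = [
    # Taken from the Pickaxe output in this issue.
    {"ID": "HNBKFXBALXUFRY-NCVFPUPRSA-N", "InChI_key": "HNBKFXBALXUFRY-UHFFFAOYSA-N"},
    # Hypothetical, for contrast: what a correctly preserved entry would look like.
    {"ID": "HNBKFXBALXUFRY-YOGCQVOKSA-N", "InChI_key": "HNBKFXBALXUFRY-YOGCQVOKSA-N"},
]
flattened = [c["ID"] for c in compounds if stereo_lost(c)]
print(flattened)  # ['HNBKFXBALXUFRY-NCVFPUPRSA-N']
```

In a real test this list would be built from the compounds collection after running Pickaxe on lotus_10, and the test would assert that it is empty.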