UnixJunkie / molenc

MolEnc: a molecular encoder using rdkit and OCaml.
BSD 3-Clause "New" or "Revised" License

I need a dumb encoder: mol -> (MolW,cLogP,TPSA,RotB,HBA,HBD,FC) #80

UnixJunkie closed this issue 4 years ago

UnixJunkie commented 4 years ago

I already have one somewhere else. It is called Mini dragon, or some similarly funny name.

UnixJunkie commented 4 years ago

Copy the script from here: https://github.com/UnixJunkie/chemo_lizard

UnixJunkie commented 4 years ago

The Tran-Nguyen & Rognan filter:

# Step 1: Organic compound filter. Molecules bearing at least one atom other than H, C, N, O, P, S, F, Cl, Br, and I were removed.
# Step 3: Molecular property range filter. Remaining actives and inactives were kept if
• 150 < Molecular weight < 800 Da
• −3.0 < AlogP < 5.0
• Number of rotatable bonds < 15
• H-bond acceptor count < 10
• H-bond donor count < 10
• −2.0 < total formal charge < +2.0

From:

Tran-Nguyen, V. K., Jacquemard, C., & Rognan, D. (2020). LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening. Journal of Chemical Information and Modeling.

UnixJunkie commented 4 years ago

So, with an option for the organic compound filter: --remove-aliens :)
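
For illustration only, here is a minimal sketch of how the filter quoted above could look in Python. It is not the code in this repository. It assumes the descriptors have already been computed elsewhere (e.g. by rdkit), that cLogP stands in for the paper's AlogP, and that `--remove-aliens` means "enable Step 1". The names `Descriptors`, `is_organic`, `in_property_ranges` and `keep` are hypothetical.

```python
from typing import Iterable, NamedTuple

# Elements allowed by the organic compound filter (Step 1 of the quoted text).
ORGANIC_ELEMENTS = frozenset({"H", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"})


class Descriptors(NamedTuple):
    """Hypothetical container mirroring (MolW, cLogP, TPSA, RotB, HBA, HBD, FC)."""
    mol_w: float
    clogp: float  # assumption: used in place of the paper's AlogP
    tpsa: float
    rot_b: int
    hba: int
    hbd: int
    fc: int


def is_organic(elements: Iterable[str]) -> bool:
    """Step 1: True only if every atom symbol is in the allowed set."""
    return all(e in ORGANIC_ELEMENTS for e in elements)


def in_property_ranges(d: Descriptors) -> bool:
    """Step 3: all bounds are strict, as quoted; TPSA is not constrained."""
    return (
        150.0 < d.mol_w < 800.0
        and -3.0 < d.clogp < 5.0
        and d.rot_b < 15
        and d.hba < 10
        and d.hbd < 10
        and -2.0 < d.fc < 2.0
    )


def keep(elements: Iterable[str], d: Descriptors, remove_aliens: bool = False) -> bool:
    """Apply Step 3 always, and Step 1 only when remove_aliens is set (assumed semantics)."""
    if remove_aliens and not is_organic(elements):
        return False
    return in_property_ranges(d)
```

Since every bound is a strict inequality, a molecule sitting exactly on a limit (say, a molecular weight of exactly 150 Da) is rejected.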

UnixJunkie commented 4 years ago

Done: bin/molenc_lizard.py

UnixJunkie commented 4 years ago

Should be OK, except for the original AlogP range, about which I asked a question: the paper uses AlogP computed with Pipeline Pilot, while I have cLogP computed by rdkit.