UnixJunkie / FASMIFRA

Molecular Generation by Fast Assembly of SMILES Fragments
GNU General Public License v3.0

fasmifra executable still appears to produce fragments #16

Closed matthewcarbone closed 1 year ago

matthewcarbone commented 1 year ago

Referencing #14 for completeness.

@UnixJunkie I have tried yet another approach to get this working. I'm happy to chat over Zoom but I really believe something is broken with the installation process at this point.

Steps I have taken

I have spun up a fresh Docker environment on my computer.

docker run -ti --rm -v ~/Data/Docker_Share:/data myubuntu /bin/bash

with a very simple Dockerfile:

FROM ubuntu:22.04

# Disable Prompt During Packages Installation
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    vim \
    && rm -rf /var/lib/apt/lists/*

On this fresh environment, I have performed the following steps:

apt update
apt -y install dune
apt -y install git
git clone https://github.com/UnixJunkie/FASMIFRA.git
cd FASMIFRA
apt -y install opam
opam init
eval `opam config env`
apt install pip
pip install rdkit
opam install --fake conf-rdkit
opam install fasmifra

and then executed the steps as laid out in the test.sh script:

xzcat data/CHEMBL_100k.smi.xz | head -1000 > chembl_1k.smi
./bin/fasmifra_fragment.py -i chembl_1k.smi -o chembl_1k_frags.smi
fasmifra -f -n 1000 -i chembl_1k_frags.smi -o gen_1k.smi

Note that nothing is installed to _build during the make process, so I am using what I believe to be the correct executable in the working directory. This produces fragments in the output similar to those we discussed in #14.
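For anyone reproducing the first step without `xzcat` and `head`, here is a minimal Python sketch of the same subsetting using only the standard library. The function name `xz_head` is made up for illustration and is not part of FASMIFRA.

```python
import lzma
from itertools import islice


def xz_head(src_path, dst_path, n=1000):
    """Copy the first n lines of an xz-compressed text file to dst_path.

    Returns the number of lines actually written (fewer than n if the
    input is shorter). Hypothetical helper, not part of FASMIFRA.
    """
    count = 0
    with lzma.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        # islice stops after n lines, so the archive is not fully decompressed
        for line in islice(src, n):
            dst.write(line)
            count += 1
    return count
```

Calling `xz_head("data/CHEMBL_100k.smi.xz", "chembl_1k.smi")` should be roughly equivalent to the `xzcat ... | head -1000` line above.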

Is it possible that there's some difference between your most up-to-date code here and the executable you have on your computer?

UnixJunkie commented 1 year ago

Remove this line

apt -y install dune

then retry.

UnixJunkie commented 1 year ago

dune is supposed to be installed by opam automatically as a dependency of fasmifra.

UnixJunkie commented 1 year ago

you should use pip3 instead of pip to install rdkit, to be sure python3 things are being used. Then, fire up a python3 interpreter and check that rdkit is installed properly (import rdkit).
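A quick way to run that check non-interactively is sketched below. It only reports whether the interpreter that runs it can find a top-level module. The helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Printing the interpreter path helps spot a python2/python3 or venv mix-up.
    print("interpreter:", sys.executable)
    print("rdkit importable:", module_available("rdkit"))
```

Run it with `python3` explicitly, so the answer applies to the same interpreter that `pip3` installed into.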

matthewcarbone commented 1 year ago

Ok, sounds good, let me give this a try. You're right, I did not use Dune explicitly during the installation.

matthewcarbone commented 1 year ago

@UnixJunkie I have retried using your instructions, but unfortunately I am running into the same issue. There are still lots of fragments such as these:

# gen_1k.smi
...
C(Nc0ccc(-n1nc(C2CC2)cc1[*:1][*:11]C1CC1)cc0)(=O)c0ccncc0       genmol_4
...

in the output file.
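To count how many generated records are affected, a small scan like the one below may help. It assumes, based only on the example above, that a leftover `[*:N]` atom in the SMILES column marks an unassembled fragment. The regex and the function name `find_unassembled` are illustrative and not taken from FASMIFRA.

```python
import re

# Assumption: an atom written as [*:N] is a fragment attachment point
# that should not survive in a fully assembled molecule.
ATTACHMENT = re.compile(r"\[\*:\d+\]")


def find_unassembled(lines):
    """Yield (name, smiles) for records whose SMILES still contains [*:N]."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment-style headers such as "# gen_1k.smi"
        if not fields or fields[0].startswith("#"):
            continue
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        if ATTACHMENT.search(smiles):
            yield name, smiles
```

For example, `list(find_unassembled(open("gen_1k.smi")))` would return the offending records, and an empty list would suggest the assembly step worked.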

It might be prudent at this stage for you to check what I've done here. I'm betting others have run into this/similar issues. Could you try to reproduce the steps here on your own machine and see what you get?

Precise steps I've taken

First, spin up container:

docker run -ti --rm -v ~/Data/Docker_Share:/data myubuntu /bin/bash

(dockerfile)

FROM ubuntu:22.04

# Disable Prompt During Packages Installation
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    vim \
    && rm -rf /var/lib/apt/lists/*

Then proceed with the installation:

apt update
# apt -y install dune # not doing this!
apt -y install git
git clone https://github.com/UnixJunkie/FASMIFRA.git
cd FASMIFRA
apt -y install opam  # Used all defaults
opam init  # disabled sandboxing, ok since in container
eval `opam config env`
apt -y install python3-pip  # <- using pip3
pip3 install rdkit  # <- using pip3
opam install --fake conf-rdkit
opam install fasmifra

Note that rdkit works fine:

>>> from rdkit import Chem
>>> Chem.MolFromSmiles("CCC")
<rdkit.Chem.rdchem.Mol object at 0xffffb1efb3e0>

And then executed the same script as before.

xzcat data/CHEMBL_100k.smi.xz | head -1000 > chembl_1k.smi
./bin/fasmifra_fragment.py -i chembl_1k.smi -o chembl_1k_frags.smi
fasmifra -f -n 1000 -i chembl_1k_frags.smi -o gen_1k.smi

UnixJunkie commented 1 year ago

There is now an install.sh script; I also updated the README. Regards, F.

matthewcarbone commented 1 year ago

@UnixJunkie following the instructions in the new install.sh script has worked!

I'm not really sure what is different between these instructions and what I did before; maybe the order of the steps? Anyway, just for the record, so you know exactly what I did (and so I know exactly what to do) 😁:

Dockerfile:

# myubuntu

FROM ubuntu:22.04

# Disable Prompt During Packages Installation
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    vim \
    && rm -rf /var/lib/apt/lists/*

Initial commands:

docker run -ti myubuntu /bin/bash
apt update
apt -y install git
git clone https://github.com/UnixJunkie/FASMIFRA.git
cd FASMIFRA

Next set of commands to install everything. Note I had to install python3-pip and run without sudo since I'm in a container:

apt install -y opam
opam init -y
apt install python3-pip  # Needed to do this first
pip3 install rdkit
eval `opam config env`
opam install --fake conf-rdkit
opam install -y fasmifra

which fasmifra_fragment.py
# /root/.opam/default/bin/fasmifra_fragment.py

which fasmifra
# /root/.opam/default/bin/fasmifra
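The same lookup can be scripted, which is handy for confirming that the opam-installed binaries are the ones on `PATH` rather than a stale copy. This is a minimal sketch; `resolve_tools` is a hypothetical helper name.

```python
import shutil


def resolve_tools(names=("fasmifra", "fasmifra_fragment.py")):
    """Map each tool name to the absolute path found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in resolve_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If a tool resolves to a path outside the opam switch, or to `None`, the `eval` of the opam environment probably did not take effect in the current shell.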

Running with explicit paths just to be totally sure...

xzcat data/CHEMBL_100k.smi.xz | head -1000 > chembl_1k.smi
/root/.opam/default/bin/fasmifra_fragment.py -i chembl_1k.smi -o chembl_1k_frags.smi
/root/.opam/default/bin/fasmifra -f -n 1000 -i chembl_1k_frags.smi -o gen_1k.smi

And success!

# gen_1k.smi
c0(-c1ccc(Cl)cc1)sc1n(c(COC(=O)C)nn1)n0 genmol_1
COc0c(F)cc(N/N=C(/Cn1c2c(nn1)cccc2)c1ccccc1)cc0F        genmol_2
c01c(cc(C(=O)C2CCCN(Cc3cc(F)c(F)cc3)C2)cc0)OCO1 genmol_3
c0(C1(O)OC(=O)C(c2ccc(NC(=O)NCCCC)cc2)=C1Cc1ccccc1)ccc(OC)cc0   genmol_4
C(OC0C(OC(=O)C)C(OC(=O)N2[C@@H]3c4c(c(OC)c(C)c(OCCCC(=O)O)c4OC)C[C@H]2C(=O)N2[C@@H](CN4C(=O)c5c(cccc5)C4=O)c4c(OC)c(OC)c(C)c(OC)c4C=C32)COC0n0c(=S)c(C#N)c(-c1ccccc1)cc0-c0ccc(Cl)cc0)(=O)C     genmol_5
C(COc1ccc(Nc2ncnc3cnc(-c4n(C)cnc4)cc32)cc1)N0CCC(NS(=O)(=O)c5c(I)cccc5)CC0      genmol_6
c01c(COC)c(C(O)=O)oc0cccc1      genmol_7
C(C(=O)Nc0ccccc0N0CCN(C(C(C)C)=O)CC0)Oc1ccc2c(c1)C(=O)C(=O)N2   genmol_8
Clc0c(C(=O)NC(Nc1ccc(Cl)cc1)=S)cccc0    genmol_9
C0C1CC2CC0C(OC(C)C)C(C2)C1      genmol_10
N(c0ccc(CC)cc0)c0ccc(F)cc0      genmol_11
Fc0ccc(C(CCCNCCc2ccccc2)c1ccc(F)cc1)cc0 genmol_12
Clc0c(C(=O)NC1CCC(F)(F)C1)cccc0 genmol_13
N0(C)CCc1n(CC)nc(C(=O)NCC(C(O)=O)N)c1C0 genmol_14
CCC(CC)(CN)NC(C)c0ccccc0        genmol_15
c0c(C(c1ccccc1)CN2CCN(c3ccccc3)CC2)cccc0        genmol_16
Clc0ccc(OP(=O)(Oc1ccc(Cl)cc1)[C@@H](C(C)C)NC(=O)[C@H](c1ccc(F)cc1)OC(=O)[C@H](CCSC)Nc2cc(Cl)cc(Cl)c2)cc0        genmol_17
c0c(N/N=C(\C(=O)C)C(c1ccccc1)=O)ccc(OC)c0       genmol_18
CC(=C0C[C@H]1[C@H](C(=C)C)CC[C@]1(C)OC0=O)C     genmol_19
Clc0ccc(O)c(/C=N/NS(=O)(=O)c1ccc(Br)cc1)c0      genmol_20

I'm quite happy with this and will almost certainly be using it in some future work. Thank you!

UnixJunkie commented 1 year ago

In the container, everything is installed as root ?!

Good that it worked for you.

matthewcarbone commented 1 year ago

@UnixJunkie I'm not super familiar with Docker just yet, but I believe so!