ci-lab-cz / easydock

BSD 3-Clause "New" or "Revised" License

dependence on chemaxon is annoying #17

Open UnixJunkie opened 8 months ago

UnixJunkie commented 8 months ago

maybe switch to Dimorphite-DL: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0336-9

I am in academia and I don't even have a chemaxon license anymore... Software vendors always change their license terms one day or another...

Feriolet commented 5 months ago

I have not tested the CPU, but the result from the GPU seems to almost match the smi that you tested. I had to add strict=False in the QueryModel for it to work (not sure why this does not occur on your side).

I also had to modify some of the script to make the subprocessing work:

def protonate_pkasolver(input_fname: str, output_fname: str, ncpu: int = 1):
    from functools import partial
    from pkasolver.query import QueryModel
    from torch import multiprocessing as mp
    mp.set_start_method('spawn', force=True)
    model = QueryModel()
    pool = mp.Pool(ncpu)
    with open(output_fname, 'wt') as f:
        # pool.close() and pool.join() are used to silence the warning
        # "Producer process has been terminated before all shared CUDA tensors released."
        pkasolver_output = pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname))
        pool.close()
        pool.join()
        for smi, name in pkasolver_output:
            f.write(f'{smi}\t{name}\n')

Note that I pass ncpu to mp.Pool() while all workers access the same GPU memory, which is not a good idea since a high ncpu can lead to out-of-memory errors. If we still want to use the GPU for this, we probably need to add an ngpu argument to this function (and its parent function) — see the sketch below.
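For illustration, here is a hypothetical sketch of what an ngpu-aware pool could look like (none of these names exist in easydock; pinning workers via the pool initializer is an assumption):

    # Hypothetical sketch, not easydock code: pin each pool worker to one GPU,
    # round-robin, so model copies are spread across devices instead of one.
    import torch
    from torch import multiprocessing as mp

    def _pin_worker_to_gpu(ngpu: int):
        # Spawned pool workers are named like 'SpawnPoolWorker-3' (1-based index).
        worker_idx = int(mp.current_process().name.split('-')[-1]) - 1
        torch.cuda.set_device(worker_idx % ngpu)

    def make_gpu_pool(nworkers: int, ngpu: int):
        mp.set_start_method('spawn', force=True)
        return mp.Pool(nworkers, initializer=_pin_worker_to_gpu, initargs=(ngpu,))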

Here is the result of the smi test: [screenshot of protonated SMILES attached]

I also tried to protonate 100 smiles used for my previous work (using --protonation dimorphite), and I noticed that --protonation pkasolver tends to deprotonate the smiles. I am not sure which one is correct.

input: COC(=O)c1nc2[nH]ccc2cc1-c1cccc(C(C)C)c1
dimo : COC(=O)c1nc2[nH]ccc2cc1-c1cccc(C(C)C)c1
pkas : COC(=O)c1nc2[n-]ccc2cc1-c1cccc(C(C)C)c1

input: CC(C)c1cccc(-c2ncc(F)c3[nH]ccc23)c1
dimo : CC(C)c1cccc(-c2ncc(F)c3[nH]ccc23)c1
pkas : CC(C)c1cccc(-c2ncc(F)c3[n-]ccc23)c1
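As a quick check (illustrative snippet, not part of easydock), the two outputs differ in net formal charge, which RDKit can confirm:

    # The dimorphite and pkasolver outputs for the first molecule differ only
    # in [nH] vs [n-], i.e. in formal charge.
    from rdkit import Chem

    for smi in ('COC(=O)c1nc2[nH]ccc2cc1-c1cccc(C(C)C)c1',   # dimorphite
                'COC(=O)c1nc2[n-]ccc2cc1-c1cccc(C(C)C)c1'):  # pkasolver
        print(smi, Chem.GetFormalCharge(Chem.MolFromSmiles(smi)))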
Feriolet commented 5 months ago

For CPU, I can reproduce the same result as the GPU. However, the CPU gives this error:

  File "/home/user/miniforge3/envs/easydock/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
  File "/home/user/miniforge3/envs/easydock/lib/python3.9/multiprocessing/queues.py", line 371, in put
AttributeError: 'NoneType' object has no attribute 'dumps'

I resolved this issue by using pool.close() and pool.join(), similar to the GPU version (but with the default multiprocessing Pool instead of the torch one).

def protonate_pkasolver(input_fname: str, output_fname: str, ncpu: int = 1):
    from functools import partial
    from multiprocessing import Pool
    from pkasolver.query import QueryModel
    model = QueryModel()
    pool = Pool(ncpu)
    with open(output_fname, 'wt') as f:
        pkasolver_output = pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname))
        pool.close()
        pool.join()
        for smi, name in pkasolver_output:
            f.write(f'{smi}\t{name}\n')

I also needed to use strict=False for this.

btw, some of the molecules print the message below while being protonated by pkasolver. Is this something that we need to silence?

#########################
Could not identify any ionizable group. Aborting.
#########################
DrrDom commented 5 months ago

It seems that the calculation speed on CPU is sufficient. If so, we may force pkasolver to use CPUs to avoid the implementation issues with the GPU, for example by adding a force_cpu argument and calling model = QueryModel(force_cpu=True):

class QueryModel:
    def __init__(self, force_cpu=False):

        self.models = []

        for i in range(25):
            model_name, model_class = "GINPair", GINPairV1
            model = model_class(
                num_node_features, num_edge_features, hidden_channels=96
            )
            base_path = path.dirname(__file__)
            if force_cpu or not torch.cuda.is_available():  # CPU forced or no GPU available
                checkpoint = torch.load(
                    f"{base_path}/trained_model_without_epik/best_model_{i}.pt",
                    map_location=torch.device("cpu"),
                )
            else:
                checkpoint = torch.load(
                    f"{base_path}/trained_model_without_epik/best_model_{i}.pt"
                )

            model.load_state_dict(checkpoint["model_state_dict"])
            model.eval()
            model.to(device=DEVICE)
            self.models.append(model)

I do not know why you need strict=False in CPU mode. For me, both options work, so we may add it if necessary. However, I do not fully understand what issues it may cause in the future.
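For reference, the flag in question would go into the checkpoint-loading line (strict=False is the standard torch argument; shown here only to make the discussion concrete):

    # strict=False tolerates missing or unexpected keys in the checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"], strict=False)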

Introducing an ngpu argument would complicate the interface, because we would need to pass it through all function calls, and it would require a separate implementation. If it does not bring a substantial speed advantage, I would suggest avoiding it.

The errors caused by multiprocessing, and fixing them with pool.close() and pool.join(), are unexpected; I have never met such issues. Below I attached my env configuration. Maybe you use a more recent Python or module version where some changes were made.

easydock_env.txt

The messages from pkasolver should be suppressed. I did this with a nostd context, but now it raises an error and I did not have time to fix it. You may uncomment that line and test it, or you may suggest another solution. As far as I remember, my solution intercepted only particular stderr/stdout streams, not globally.
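If the nostd context keeps failing, a stdlib-only sketch that redirects both streams locally could be an alternative (assumption: pkasolver prints from Python code, not from C extensions, so redirect_stdout/redirect_stderr are enough):

    import contextlib
    import os

    @contextlib.contextmanager
    def suppress_output():
        # Route Python-level stdout/stderr into os.devnull for the block only.
        with open(os.devnull, 'w') as devnull, \
                contextlib.redirect_stdout(devnull), \
                contextlib.redirect_stderr(devnull):
            yield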

I was more surprised by your output of protonated smiles. Below is my output for the same structures, and I generally agree with it. There were a lot of differences. Did you use the latest easydock version from the noprints branch?

[screenshot of protonated SMILES output attached]

Feriolet commented 5 months ago

I can finally run it without any of those issues. I reinstalled everything from scratch and the problem went away. I guess conflicting torch versions I had installed messed up the other packages.

For future reference:

conda create -n easydock -c conda-forge python=3.9 numpy=1.20 rdkit scipy dask distributed
conda activate easydock
pip install paramiko meeko vina
pip install git+https://github.com/Feriolet/dimorphite_dl.git
pip install git+https://github.com/DrrDom/pkasolver.git@noprints
pip install git+https://github.com/ci-lab-cz/easydock.git@pkasolver2
pip install torch==1.13.1+cpu  --extra-index-url https://download.pytorch.org/whl/cpu
pip install torch-geometric==2.0.1
pip install torch_scatter==2.1.1+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install torch_sparse==0.6.17+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install torch_spline_conv==1.2.2+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install molvs chembl_webresource_client matplotlib pytest-cov codecov svgutils cairosvg ipython

Yes, I agree that we should use the CPU for convenience and consistency with the other protonation software (dimorphite_dl and chemaxon). To silence the protonation output, I have added contextlib to the add_protonation() function inside database.py:

                if program == 'chemaxon':
                    protonate_func = partial(protonate_chemaxon, tautomerize=tautomerize)
                    read_func = read_protonate_chemaxon
                elif program == 'dimorphite':
                    protonate_func = partial(protonate_dimorphite, ncpu=ncpu)
                    read_func = read_smiles
                elif program == 'pkasolver':
                    protonate_func = partial(protonate_pkasolver, ncpu=ncpu)
                    read_func = read_smiles
                else:
                    protonate_func = empty_func
                    read_func = empty_generator

                with contextlib.redirect_stdout(None):
                    protonate_func(input_fname=tmp.name, output_fname=output)
DrrDom commented 5 months ago

Many thanks for the installation notes. We will include them in the README.

Is the output now identical to mine, with no differences in protonation states? If so, I will merge everything to the master branches and add this contextlib solution.

Finally, I will keep the dimorphite implementation inside the code but remove it from the command-line interface, because currently it is not useful and would only confuse users.

Feriolet commented 5 months ago

Yes, the protonated smiles are identical to the most recent ones you showed.

DrrDom commented 5 months ago

One more question. Does it work on computers with a GPU? Should we add the force_cpu option or not?

Feriolet commented 5 months ago

I have not tested the GPU from scratch. It should also work, given that it gave the same result as the CPU for the previous protonation (as in yesterday's result). I'll update you once I can test it on the GPU.

Assuming that users follow the torch installation above, there may be no need to use force_cpu=True. I guess it can be a good option if you want to make sure that people who accidentally installed torch with CUDA get a warning to only use the CPU — see the sketch below.
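If we do add it, a minimal sketch of such a warning might look like this (illustrative only, not committed code):

    import warnings
    import torch

    # Warn users whose torch build sees a GPU that protonation stays on CPU.
    if torch.cuda.is_available():
        warnings.warn('CUDA-enabled torch detected; pkasolver protonation runs '
                      'on CPU for consistency (QueryModel(force_cpu=True)).')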

DrrDom commented 5 months ago

I updated easydock/pkasolver2 and pkasolver/main.

The minor issue that remains is the enumeration of stereoisomers after protonation: e.g., a new unspecified chiral center will appear in C[C@@H]1CCCN(C)C1 after protonation. I'm thinking how to do this with minimal code perturbation and maximum flexibility for future changes. It may be worth redesigning init_db, pulling the get_isomers function out of it, and applying it only after protonated molecules have been generated.

if not os.path.isfile(args.output):
    create_db(args.output, args)
    init_db(args.output, args.input, args.prefix)
else:
    args_dict, tmpfiles = restore_setup_from_db(args.output)
    # this will ignore stored values of those args which were supplied via command line
    # command line args have precedence over stored ones
    for arg in supplied_args:
        del args_dict[arg]
    args.__dict__.update(args_dict)

dask_client = create_dask_client(args.hostfile)

if args.protonation:
    add_protonation(args.output, program=args.protonation, tautomerize=not args.no_tautomerization, ncpu=args.ncpu)

populate_stereoisomers(args.output, args.max_stereoisomers)

However, this will create an issue: we will have records with identical smi, different stereo_id, and different protonated_smi, which is very misleading and may result in many issues in the future. A solution may be to introduce an additional DB field, protonated_id, and allow a single molecule (smiles) to have several protonation states (not alternative protonation states in the sense of dimorphite, but different stereoisomers appearing after protonation). I'm not confident in this solution, because it will complicate the logic of functions and data manipulation; however, I do not see a better alternative.
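To make the idea concrete, here is a rough sketch of such a table (column names are assumptions for illustration, not easydock's actual schema):

    import sqlite3

    con = sqlite3.connect(':memory:')
    con.execute("""
        CREATE TABLE mols (
            id              TEXT,     -- input molecule name
            stereo_id       INTEGER,  -- enumerated stereoisomer index
            protonated_id   INTEGER,  -- proposed: index of the protonation state
            smi             TEXT,     -- input SMILES
            protonated_smi  TEXT,
            PRIMARY KEY (id, stereo_id, protonated_id)
        )
    """)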

Currently I tend to ignore this issue and postpone its solution for the future.

Samuel-gwb commented 5 months ago

Some errors occur using the newest environment and run_dock ... --protonation pkasolver ...:

    File ".../miniconda3/envs/easydock_pka/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 366, in reduce_storage
        fd, size = storage._share_fd_cpu_()
    RuntimeError: unable to open shared memory object in read-write mode: Too many open files (24)

But it is successful when using: run_dock ... --protonation dimorphite ...

Feriolet commented 5 months ago

Have you tried reinstalling the conda environment from scratch? The error seems to be caused by torch.multiprocessing, but I am not sure if the default multiprocessing can call the torch multiprocessing.

Also, how many CPUs did you use?

Samuel-gwb commented 5 months ago

Yes, I freshly installed a new conda environment, named easydock_new. My computer has 64 CPUs. Do I need to specify the CPU, something like cpu:0?

Feriolet commented 5 months ago

I was referring to the -c argument that you used to run the code.

I tried to reinstall it from scratch again and I still can't replicate your error. Maybe you can give the full error log from your side and your environment.txt? I'm not sure how to approach this error.

From what I found on the internet, the error is either caused by the Linux limit on how many files you can have open (unlikely, because your --protonation dimorphite works and I assume both access a similar number of files) or by the pkasolver torch setup or QueryModel(). Maybe you can give us the snippet of the QueryModel() class as well?

Feriolet commented 5 months ago

@DrrDom btw, regarding your previous question on the GPU (if you are still interested):

Both GPU and CPU give identical protonated smiles. For 100 smiles:

- -c 30 (CPU): 57.70 s
- 1 GPU: 46.74 s
- pool of 2 workers (shared GPU): 27.77 s
- pool of 4 workers (shared GPU): 18.94 s

Samuel-gwb commented 5 months ago

The command is: run_dock -i "$smi_file" -o "$output_file" --program vina --config config_vina.yml --protonation pkasolver -c 1 --sdf

"--protonation dimorphite " is using the same input file.

For QueryModel(), I think it was installed by "pip install ..." into miniconda3/envs/easydock_pka/lib/python3.9/site-packages/pkasolver/query.py. I have not modified it; it is as follows:

class QueryModel:
    def __init__(self):

        self.models = []

        for i in range(25):
            model_name, model_class = "GINPair", GINPairV1
            model = model_class(
                num_node_features, num_edge_features, hidden_channels=96
            )
            base_path = path.dirname(__file__)
            if torch.cuda.is_available() == False:  # If only CPU is available
                checkpoint = torch.load(
                    f"{base_path}/trained_model_without_epik/best_model_{i}.pt",
                    map_location=torch.device("cpu"),
                )
            else:
                checkpoint = torch.load(
                    f"{base_path}/trained_model_without_epik/best_model_{i}.pt"
                )

            model.load_state_dict(checkpoint["model_state_dict"])
            model.eval()
            model.to(device=DEVICE)
            self.models.append(model)

    def predict_pka_value(self, loader: DataLoader) -> np.ndarray:
        """
        Parameters
        ----------
        loader
            data to be predicted
        Returns
        -------
        np.array
            list of predicted pKa values
        """

        results = []
        assert len(loader) == 1
        for data in loader:  # Iterate in batches over the training dataset.
            data.to(device=DEVICE)
            consensus_r = []
            for model in self.models:
                y_pred = (
                    model(
                        x_p=data.x_p,
                        x_d=data.x_d,
                        edge_attr_p=data.edge_attr_p,
                        edge_attr_d=data.edge_attr_d,
                        data=data,
                    )
                    .reshape(-1)
                    .detach()
                )

                consensus_r.append(y_pred.tolist())
            results.extend(
                (
                    float(np.average(consensus_r, axis=0)),
                    float(np.std(consensus_r, axis=0)),
                )
            )
        return results

The environment is attached: easydock_pka.txt

Feriolet commented 5 months ago

I am assuming easydock_pka is the same as easydock_new environment?

Feriolet commented 5 months ago

I have tried installing easydock_pka (the torch dependencies, easydock, dimorphite, and pkasolver are installed separately with pip because conda probably won't recognise them) and it still works on my side.

I am now a bit lost. What about sending me the easydock protonation.py file then? It should be the most updated one, right?

Also, it would be helpful if you could show the error that comes before this one too:

    File ".../miniconda3/envs/easydock_pka/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 366, in reduce_storage
        fd, size = storage._share_fd_cpu_()
    RuntimeError: unable to open shared memory object </torch_4154596_2592645159_498> in read-write mode: Too many open files (24)
Samuel-gwb commented 5 months ago

The input files are also attached: test.zip

I re-ran the command; the error is a little different, though both say "Too many open files". Error message:

    (easydock_pka) gwb@node01: Small_Molecule/Y73C_GTP$ ./Ensemble_RunDock.sh
    Traceback (most recent call last):
      File "/home/gwb/miniconda3/envs/easydock_pka/bin/run_dock", line 8, in <module>
        sys.exit(main())
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/site-packages/easydock/run_dock.py", line 207, in main
        add_protonation(args.output, program=args.protonation, tautomerize=not args.no_tautomerization, ncpu=args.ncpu)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/site-packages/easydock/database.py", line 348, in add_protonation
        protonate_func(input_fname=tmp.name, output_fname=output)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/site-packages/easydock/protonation.py", line 92, in protonate_pkasolver
        for smi, name in pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname)):
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
        put(task)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 367, in reduce_storage
        df = multiprocessing.reduction.DupFd(fd)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/reduction.py", line 198, in DupFd
        return resource_sharer.DupFd(fd)
      File "/home/gwb/miniconda3/envs/easydock_pka/lib/python3.9/multiprocessing/resource_sharer.py", line 48, in __init__
        new_fd = os.dup(fd)
    OSError: [Errno 24] Too many open files

Samuel-gwb commented 5 months ago

I am assuming easydock_pka is the same as easydock_new environment?

Yes, they are the same; that was a typo on my part.

Feriolet commented 5 months ago

Yes, it still runs without issue on my side.

OK, what about changing the protonation function? Maybe it works for your case:

def protonate_pkasolver(input_fname: str, output_fname: str, ncpu: int = 1):
    import contextlib
    from functools import partial
    from multiprocessing import Pool
    from pkasolver.query import QueryModel
    model = QueryModel()
    with contextlib.redirect_stdout(None):
        pool = Pool(ncpu)
        with open(output_fname, 'wt') as f:
            pkasolver_output = pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname))
            pool.close()
            pool.join()
            for smi, name in pkasolver_output:
                f.write(f'{smi}\t{name}\n')
DrrDom commented 5 months ago

@Samuel-gwb, I'm a little bit lost. You posted two error messages with "too many open files": one related to torch, the other to the standard multiprocessing. Do you use the latest version of the easydock pkasolver2 branch? Do you have a GPU?

Since you use -c 1, I cannot imagine how you could exceed the number of open files.

You may increase the number of file descriptors that can be open simultaneously (ulimit -n 4096), but this does not look like a proper solution.
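For completeness, the limit can also be inspected and raised (up to the hard limit) from within Python (standard resource module, Linux/macOS only):

    import resource

    # Query the soft/hard limits on open file descriptors for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(soft, hard)
    # Raise the soft limit, capped by the hard limit set by the system.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))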

Feriolet commented 5 months ago

Yes, that is what I thought as well.

From the error, it looks like the default multiprocessing calls into the torch version, which calls the default version again. It is very interesting.
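A plausible mechanism (my reading of torch internals; details vary by version): importing torch.multiprocessing registers torch's tensor/storage reducers globally with the standard library's ForkingPickler, so even a plain multiprocessing.Pool pickles the model through torch's shared-memory code path:

    # Sketch relying on private internals, shown only to illustrate the idea.
    import torch.multiprocessing  # importing this registers torch's reducers
    from multiprocessing.reduction import ForkingPickler

    # The registered reducers now include torch storage/tensor types, so any
    # Pool task containing the model goes through torch's reduce_storage.
    print(list(ForkingPickler._extra_reducers)[:5])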

I am trying not to use the ulimit solution, as it is surprising that using one CPU would cause this issue; it may be kept as a last resort if everything else fails.

DrrDom commented 5 months ago

If multiprocessing.pool called multiprocessing.pool directly, it should result in an error about nested processes or the like, because this is forbidden by design. If this call happens through torch, maybe that avoids this error but causes another one.

In that case I see two possible solutions:

  1. Add force_cpu argument to QueryModel and set it to True.
  2. Detect GPUs within protonate_pkasolver function and call protonation without multiprocessing.pool

@Samuel-gwb, could you test the function below?


def protonate_pkasolver(input_fname: str, output_fname: str, ncpu: int = 1):
    import contextlib
    import torch
    from functools import partial
    from multiprocessing import Pool
    from pkasolver.query import QueryModel
    model = QueryModel()
    with contextlib.redirect_stdout(None):
        if torch.cuda.is_available() or ncpu == 1:
            with open(output_fname, 'wt') as f:
                for mol, mol_name in read_input(input_fname):
                    smi, name = _protonate_pkasolver(mol, mol_name, model=model)
                    f.write(f'{smi}\t{name}\n')
        else:
            pool = Pool(ncpu)
            with open(output_fname, 'wt') as f:
                for smi, name in pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname)):
                    f.write(f'{smi}\t{name}\n')
Samuel-gwb commented 5 months ago

Yes, I use the same easydock_pka environment for the different tests, and the last error message has been reproducible over the last several runs. I will try your solutions with the modified protonate_pkasolver function!

Samuel-gwb commented 5 months ago

Very confused! Again, I freshly installed an environment, just changing easydock --> easydock_test1:

conda create -n easydock_test1 -c conda-forge python=3.9 numpy=1.20 rdkit scipy dask distributed
conda activate easydock_test1
pip install paramiko meeko vina
pip install git+https://github.com/Feriolet/dimorphite_dl.git
pip install git+https://github.com/DrrDom/pkasolver.git@noprints
pip install git+https://github.com/ci-lab-cz/easydock.git@pkasolver2
pip install torch==1.13.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install torch-geometric==2.0.1
pip install torch_scatter==2.1.1+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install torch_sparse==0.6.17+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install torch_spline_conv==1.2.2+pt113cpu -f https://data.pyg.org/whl/torch-1.13.1%2Bcpu.html
pip install molvs chembl_webresource_client matplotlib pytest-cov codecov svgutils cairosvg ipython

Then, using the default protonate_pkasolver function, the error is:

    Traceback (most recent call last):
      File "/home/gwb/miniconda3/envs/easydock_test1/bin/run_dock", line 8, in <module>
        sys.exit(main())
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/run_dock.py", line 207, in main
        add_protonation(args.output, program=args.protonation, tautomerize=not args.no_tautomerization, ncpu=args.ncpu)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/database.py", line 348, in add_protonation
        protonate_func(input_fname=tmp.name, output_fname=output)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/protonation.py", line 92, in protonate_pkasolver
        for smi, name in pool.imap_unordered(partial(__protonate_pkasolver, model=model), read_input(input_fname)):
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
        put(task)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 367, in reduce_storage
        df = multiprocessing.reduction.DupFd(fd)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/reduction.py", line 198, in DupFd
        return resource_sharer.DupFd(fd)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/resource_sharer.py", line 48, in __init__
        new_fd = os.dup(fd)
    OSError: [Errno 24] Too many open files

Using Feriolet's version to modify "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/protonation.py", replacing the contents of def protonate_pkasolver, then:

    Traceback (most recent call last):
      File "/home/gwb/miniconda3/envs/easydock_test1/bin/run_dock", line 8, in <module>
        sys.exit(main())
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/run_dock.py", line 207, in main
        add_protonation(args.output, program=args.protonation, tautomerize=not args.no_tautomerization, ncpu=args.ncpu)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/database.py", line 348, in add_protonation
        protonate_func(input_fname=tmp.name, output_fname=output)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/protonation.py", line 106, in protonate_pkasolver
        for smi, name in pkasolver_output:
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/pool.py", line 870, in next
        raise value
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
        put(task)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
        cls(buf, protocol).dump(obj)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 366, in reduce_storage
        fd, size = storage._share_fd_cpu_()
    RuntimeError: unable to open shared memory object in read-write mode: Too many open files (24)

Using Pavel's version, I needed to modify smi, name = _protonate_pkasolver(...) --> smi, name = __protonate_pkasolver(...), and then:

    Traceback (most recent call last):
      File "/home/gwb/miniconda3/envs/easydock_test1/bin/run_dock", line 8, in <module>
        sys.exit(main())
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/run_dock.py", line 207, in main
        add_protonation(args.output, program=args.protonation, tautomerize=not args.no_tautomerization, ncpu=args.ncpu)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/database.py", line 348, in add_protonation
        protonate_func(input_fname=tmp.name, output_fname=output)
      File "/home/gwb/miniconda3/envs/easydock_test1/lib/python3.9/site-packages/easydock/protonation.py", line 119, in protonate_pkasolver
        smi, name = __protonate_pkasolver(mol, mol_name, model=model)
    TypeError: __protonate_pkasolver() got multiple values for argument 'model'

Feriolet commented 5 months ago

Please put brackets around mol and mol_name. This makes __protonate_pkasolver treat mol and mol_name as one tuple argument instead of two; without them, the function treats mol_name as the model, which is why the call fails:

smi, name = __protonate_pkasolver((mol, mol_name), model=model)
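A stripped-down illustration of the mismatch (toy function, not the real helper):

    def f(args, model=None):
        mol, name = args
        return mol, name

    # f('mol', 'name', model='m')   # TypeError: got multiple values for 'model'
    f(('mol', 'name'), model='m')   # OK: both values arrive as one tuple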
Samuel-gwb commented 5 months ago

Great, it works! Many thanks!

1) One more thing: -c has to be 1.

    run_dock -i GTP.smi -o GTP_vina.db --program vina --config config_vina.yml --protonation pkasolver -c 1 --sdf

Any -c > 1 causes the "Too many open files" error.

2) Another thing: the additional nitrogen adjacent to the imidazole ring of GTP was protonated as NH-. I know that it will be NH2 when someone uses Schrodinger.

GTP smi: OC1C(COP(=O)(OP(=O)(OP(=O)(O)O)O)O)OC(C1O)n1cnc2c1[nH]c(N)nc2=O protonated by pkasolver --> [NH-]c1nc(=O)c2ncn([C@H]3OC@@HC@H[C@H]3[O-])c2[n-]1

GTP_vina.sdf.txt

Feriolet commented 5 months ago
  1. Yes, we have made it so that multiprocessing is used when -c > 1, and that is the package giving you the error. We hope that using 1 CPU is sufficient for your use case. I honestly still can't reproduce your error, so I can't help you much with it. I also tried running it on an Apple M1, and there is no such issue there either. We can still try to tackle the multiprocessing issue if you wish to use more than 1 CPU, but it may be challenging, as some of the obvious solutions do not work.

  2. Regarding this, easydock only gives one protonated smiles per input. I am sure GTP has multiple protonation centers, and it just happens that pkasolver gives the [NH-] out of the many possible protonated smiles. We are unsure whether we should give multiple protonated smiles per input smiles, as we are still considering the stereoisomer enumeration issue after protonation.

@DrrDom, correct me if there is any mistake in what I said.

DrrDom commented 5 months ago

The error for ncpu > 1 is strange. It means that you do not have a detectable GPU and use exclusively multiprocessing. I have never met such an error with multiprocessing.

Wrong protonations may occur; every protonation tool is incorrect to some extent. The publicly available pkasolver model was trained on single-center molecules, so predictions for complex molecules with multiple protonation centers may be incorrect. That is why the applicability of different protonation tools should be studied more thoroughly. Meanwhile, we may use pkasolver as an alternative to chemaxon.

DrrDom commented 5 months ago

I updated master with the most recent changes. I'll keep the issue open, because I believe we will return to it in the future. Thanks a lot to everybody who helped with this!

Samuel-gwb commented 5 months ago

Great! Some tiny things:

1) In the README, the line for the pip installation of torch_spline_conv contains an additional ''' at the end.

2) It seems that "pip install cairosvg svgutils" is needed. And, lastly, installation may need to include "pip install ." at $easydock_home.

3) So, when using CPU-based pkasolver for protonation, does one need to set "-c 1"? If so, should it be included in the README?

DrrDom commented 5 months ago

Thank you!

  1. Fixed.
  2. Installation of easydock is already described. Your suggestion is not relevant for ordinary users; it is mainly for developers who have a clone of the repository. I'll update the PyPI package soon; I expect to close another PR before officially updating the version.
  3. Currently it is not necessary. Your case seems very specific; we will collect other users' responses on whether they have issues with this. However, it may be worth mentioning this issue in the README to attract users' attention.
Feriolet commented 5 months ago

I agree with @Samuel-gwb on his 2nd point. Don't we need pip install cairosvg svgutils to run pkasolver? At least on my side it gave an import error for cairosvg.

Edit: nvm, I think I got your point, my bad.

DrrDom commented 5 months ago

You were right) Thanks for pointing that out. I indeed forgot to add the packages cairosvg and svgutils to the list of required ones. I'll do that.