coleygroup / pyscreener

pythonic interface to virtual screening software
MIT License
86 stars, 32 forks

Problem with ray (This function was not imported properly.) #9

Closed: rafalbachorz closed 3 years ago

rafalbachorz commented 3 years ago

Hello, after updating to the most recent version I got the following error:

[screenshot]

Can you give me a hand here? Does the user need a specific version of ray? Thank you in advance.

Kind regards, Rafał Bachorz

davidegraff commented 3 years ago

Hi Rafal,

I've had that issue pop up a few times, and it's usually fixed by restarting your ray cluster. On a somewhat related note, pyscreener will be updated to version 1 in the coming week or two. The API is a bit different, but the core functionality is the same. If you'd like to start migrating, the new version currently lives on the refactor branch, and I haven't encountered this issue with the new code. Let me know if that helps!

rafalbachorz commented 3 years ago

Hello Dawid, yes, after migrating to the refactor branch it started to work; thanks for the hint. Thank you also for the effort that went into the refactor; I can see substantial changes. Is this the right way to run smina programmatically, i.e., not via the CLI?

[screenshot]

To make it work, I had to comment out the first line of the pyscreener/__init__.py file. Do you have any idea why?

[screenshot]

Thanks for your support.

Kind regards, Rafał

davidegraff commented 3 years ago

What's the error if you don't comment out that line? And yes, the way you've written it is the way to run Smina docking programmatically. Just a small note: in the template dictionary you can pass the string "smina" as the value of the "software" key instead of using the Software enum. The Enum type is only there for internal consistency.
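A minimal sketch of the "string or enum" convenience described above. Software and normalize_software are hypothetical stand-ins, not pyscreener's actual code: the entry point accepts either the plain string from a template dictionary or an enum member, and converts both to the enum for internal use.

```python
from enum import Enum
from typing import Union


class Software(Enum):
    """Hypothetical stand-in for a docking-software enum (not pyscreener's own)."""

    VINA = "vina"
    SMINA = "smina"


def normalize_software(value: Union[str, Software]) -> Software:
    """Accept an enum member or its string value (case-insensitive) and return the member."""
    if isinstance(value, Software):
        return value
    try:
        return Software(value.lower())
    except ValueError:
        valid = ", ".join(s.value for s in Software)
        raise ValueError(f"unknown software {value!r}; expected one of: {valid}") from None


# hypothetical template dictionary using a plain string for the "software" key
template = {"software": "smina"}
software = normalize_software(template["software"])
```

Keeping the enum internal lets the rest of the code compare members instead of raw strings, while users only ever need to type "smina".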

rafalbachorz commented 3 years ago

Thanks for the prompt feedback. It says this:

[screenshot]

This is because the _version.py file mentioned there contains only this:

[screenshot]

davidegraff commented 3 years ago

Hmm, that's odd. I'm not seeing any such code in _version.py. Can you try pulling again and reinstalling (if you didn't use the -e flag)?

rafalbachorz commented 3 years ago

The file is not in the repo; it is magically created by running pip install ., and I have it in the installation directory inside the conda environment:

[screenshot]

davidegraff commented 3 years ago

Could you try adding the -e flag to your install? I was able to reproduce your error when running pip install ., but adding -e seemed to fix it for me. It must be some issue with the packaging info that I'll need to fix. If you have any ideas, I'm all ears!

davidegraff commented 3 years ago

I just pushed a new change that seems to fix the pip install . issue as well. I'm still not entirely sure why it worked, though...

rafalbachorz commented 3 years ago

Hello, thanks a lot. I think it helped; I no longer see the problem. To open another discussion: quite often the current 3D molecule preparation (i.e., simple preparation with UFF optimization) produces bad structures. The results look better with RDKit, e.g.:

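```python
from rdkit import Chem
from rdkit.Chem import AllChem

smi = "CCO"  # placeholder: any SMILES string
mol = Chem.MolFromSmiles(smi)
mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry
AllChem.EmbedMolecule(mol, randomSeed=0xf00d)  # fixed seed for reproducible coordinates
mol.GetNumConformers()  # 1 if the embedding succeeded, 0 otherwise
AllChem.UFFOptimizeMolecule(mol, maxIters=3000)
```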

Even though this also uses UFF, my experience with it has been much better than with the openbabel/pybel implementation. Recently I came across xtb (https://github.com/grimme-lab/xtb), a tight-binding program that can produce much higher-quality 3D ligand geometries. There is also a Python wrapper for it: https://github.com/ppqm/ppqm. I think these two could do a good job of preparing ligands for docking. The optimization is of course more expensive: for a moderately sized molecule it takes a minute or two. Let me know if extending pyscreener toward a different "molecular embedder" interests you; I can contribute to the development.

davidegraff commented 3 years ago

This response pertains to the new feature branch.

We currently support RDKit-optimized geometries via the input_file parameter of CalculationData objects. You can generate these on the fly by creating a LigandSupply object with the optimize flag set to True. This writes an RDKit-optimized geometry for each molecule in the libraries passed to the object (e.g., a CSV of SMILES strings or a list of SMILES strings). You can then pass the LigandSupply.ligands attribute to a VirtualScreen object by specifying smiles=False in the call, like so: VirtualScreen(*ligands, smiles=False)

I've heard similar complaints that openbabel geometries are bad, but I don't quite know the best way to get around that. I'm hesitant to adopt RDKit optimization by default in the Runner.prepare_ligand methods because that would involve two file writes during preparation (writing the RDKit geometry and then writing the pdbqt file with openbabel). That seems inefficient for the base workflow, but the option is there should a user want it.

I'm also hesitant to add direct xtb support to pyscreener. While we give users the flexibility to run high-quality docking simulations, that's not the intention of the package. Currently, a user can create xtb-optimized geometries for a set of molecules and feed them (via their prepared files) to a VirtualScreen as above, by specifying smiles=False. When an input file is detected, no optimization is performed, so the high-quality geometry should be retained when initializing a docking simulation.
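The rule described above (a supplied input file is used as-is, and only SMILES-only ligands are prepared on the fly) can be sketched as follows. Ligand, prepare, and from_smiles are hypothetical names standing in for the input_file handling of CalculationData; this is an illustration of the dispatch, not pyscreener's code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Ligand:
    """Hypothetical ligand record: a SMILES string plus an optional prepared file."""

    smiles: str
    input_file: Optional[Path] = None


def prepare(lig: Ligand, from_smiles: Callable[[str], Path]) -> Path:
    """Return the file to dock, preparing from SMILES only when no input file exists."""
    if lig.input_file is not None:
        # e.g. an xtb-optimized geometry the user prepared; it is not re-optimized
        return lig.input_file
    return from_smiles(lig.smiles)
```

Passing the preparation step in as a callable keeps the expensive optimizer (RDKit, xtb, or anything else) outside the core dispatch logic.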

What do you think?

davidegraff commented 3 years ago

I just ran some timing tests and found a workaround for this problem:

**In `.prepare_from_smi`:**

  1. create a Chem.Mol from the SMILES string
  2. add Hs, embed the molecule, and MMFF-optimize the Mol
  3. write the Mol to a MOL block and read that string in via pybel.readstring("mol", Chem.MolToMolBlock(mol))
  4. write the resulting pybel.Molecule to the appropriate file.
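A minimal sketch of the four steps above, assuming RDKit and Open Babel 3.x (whose bindings are imported as from openbabel import pybel) are installed. The function names are hypothetical, and this is not the actual .prepare_from_smi implementation. Step 3 hands the MOL block over as a string, so the RDKit geometry is never written to disk before the final file.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smi_to_molblock(smi: str, seed: int = 0xF00D, max_iters: int = 200) -> str:
    """Steps 1-3: SMILES -> Mol with explicit Hs -> 3D embedding -> MMFF -> MOL block."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smi!r}")
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError(f"embedding failed for {smi!r}")
    # returns 0 on convergence, 1 if more iterations were needed, -1 if MMFF setup failed
    AllChem.MMFFOptimizeMolecule(mol, maxIters=max_iters)
    return Chem.MolToMolBlock(mol)


def molblock_to_file(molblock: str, path: str, fmt: str = "pdbqt") -> None:
    """Step 4: parse the MOL block in memory with pybel and write the docking input file."""
    from openbabel import pybel  # Open Babel 3.x import path; imported lazily

    mol = pybel.readstring("mol", molblock)
    mol.write(fmt, path, overwrite=True)
```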

With baseline settings this approach takes about 2.5-3 ms/mol, versus only about 2 ms/mol for the original approach on small molecules. Part of that difference comes from the lighter optimization in pybel.Molecule.make3d(): it runs only 50 optimization steps by default, whereas RDKit allows up to 200. Optimizing in pybel for 200 steps ends up at about 8 ms/mol, so I think this slight drop in performance is worth it given the relative boost in docking quality.
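To put those per-molecule timings in perspective, here is the arithmetic for a hypothetical library of one million ligands, using the rates quoted above (taking the upper 3 ms/mol figure for the RDKit route) and assuming serial, single-core preparation. It covers ligand preparation only, not docking.

```python
RATES_MS_PER_MOL = {
    "pybel make3d, default 50 steps": 2.0,
    "RDKit embed + MMFF, up to 200 steps": 3.0,
    "pybel make3d, 200 steps": 8.0,
}


def prep_hours(n_mols: int, ms_per_mol: float) -> float:
    """Total serial preparation time in hours for n_mols at ms_per_mol each."""
    return n_mols * ms_per_mol / 1000 / 3600


N_MOLS = 1_000_000  # hypothetical library size
for label, rate in RATES_MS_PER_MOL.items():
    print(f"{label}: {prep_hours(N_MOLS, rate):.2f} h")
```

That works out to roughly 0.56 h, 0.83 h, and 2.22 h respectively: the RDKit route adds about 17 minutes per million molecules over the original, while saving well over an hour compared with running pybel for the same number of steps.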

rafalbachorz commented 3 years ago

Hello David, apologies for the late answer. Yes, I agree with you: custom preparation of ligand structures should not be pyscreener's responsibility. Thanks for the commit that allows passing the boolean smiles argument to virtual_screen; it almost worked for me. To make it work, I had to make a small change to line 137 of vina/runner.py:

```python
# pybel's write() needs the filename as a str, so convert the pathlib.Path first
mol.write(format="pdbqt", filename=pdbqt.absolute().as_posix(), overwrite=True, opt={"h": None})
```

Otherwise there was an obvious type mismatch, because mol.write expects the filename argument as a string rather than a pathlib.Path. Thanks a lot again for your prompt support.
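The mismatch above is the usual pathlib-versus-str issue: some APIs, including (per this thread) pybel's Molecule.write, expect a plain string filename. A small hypothetical helper mirroring the .absolute().as_posix() fix can normalize either kind of input before such a call; str(path) or os.fspath(path) would also yield a string, just not necessarily an absolute, forward-slash one.

```python
import os
from pathlib import Path
from typing import Union


def to_filename(path: Union[str, os.PathLike]) -> str:
    """Return path as an absolute, forward-slash string, as the fix above does."""
    return Path(path).absolute().as_posix()


pdbqt = Path("ligand.pdbqt")  # hypothetical output path
filename = to_filename(pdbqt)  # a str, safe to pass to string-only APIs
```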

davidegraff commented 3 years ago

No, really, thank you! It's great to have someone helping identify issues and areas of improvement for the package. I'll add that hotfix ASAP. For now, I'll close the issue, but please let me know if anything else comes up on this thread!