coleygroup / pyscreener

pythonic interface to virtual screening software
MIT License

[JOSS review] Installation instructions #21

Closed rvhonorato closed 2 years ago

rvhonorato commented 2 years ago

Companion of openjournals/joss-reviews/issues/3950

The install instructions are not very clear to me, but after following them step by step I tried to check the setup with:

$ pyscreener-check SCREEN_TYPE METADATA_TEMPLATE

This seems like a nice feature, but it's not clear what SCREEN_TYPE and METADATA_TEMPLATE are, and there is no usage message:

$ pyscreener-check -h
Traceback (most recent call last):
  File "/Users/rodrigo/software/anaconda3/envs/pyscreener_env/bin/pyscreener-check", line 8, in <module>
    sys.exit(check())
  File "/Users/rodrigo/repos/pyscreener/pyscreener/main.py", line 13, in check
    ps.check_env(sys.argv[1], json.loads(sys.argv[2]))
IndexError: list index out of range

Could you please clarify this step?
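
For illustration, the missing usage message could be provided by parsing the two positional arguments with argparse instead of indexing sys.argv directly. This is only a minimal sketch, not the project's actual entry point: the function names build_parser and parse_check_args and the help strings are assumptions. The one detail taken from the traceback above is that METADATA_TEMPLATE is a JSON string (it is passed to json.loads).

```python
import argparse
import json
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical pyscreener-check entry point."""
    parser = argparse.ArgumentParser(
        prog="pyscreener-check",
        description="Check that the environment is set up for a screen type.",
    )
    # Help strings below are illustrative wording, not taken from pyscreener.
    parser.add_argument(
        "screen_type",
        metavar="SCREEN_TYPE",
        help="the type of docking screen to check, e.g. vina",
    )
    parser.add_argument(
        "metadata_template",
        metavar="METADATA_TEMPLATE",
        type=json.loads,  # invalid JSON becomes a usage error, not a traceback
        help="a JSON string describing the screen metadata",
    )
    return parser


def parse_check_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments; -h prints usage and missing arguments give an error."""
    return build_parser().parse_args(argv)
```

With a parser like this, running the command with -h prints a usage message and exits cleanly, and running it with too few arguments reports which ones are missing instead of raising IndexError. The parsed values could then be handed to ps.check_env as in the original code.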

An additional note: to install the packages you first need to add the conda-forge channel with $ conda config --append channels conda-forge

rvhonorato commented 2 years ago

I ended up changing the environment.yml to

name: pyscreener_env

channels:
  - conda-forge
  - defaults

dependencies:
  - pip
  - python=3.8
  - openbabel
  - openmm
  - rdkit
  - pip:
      - colorama
      - configargparse
      - git+https://github.com/openmm/pdbfixer.git
      - h5py
      - numpy
      - ray
      - pandas
      - pytest
      - scikit_learn
      - scipy
      - seaborn
      - tqdm

and then:

$ conda env create -f environment.yml
$ conda activate pyscreener_env
$ pip install .

Seems to have done the trick:

$ pyscreener -h
usage: pyscreener [-h] [--config CONFIG] [--version] [--smoke-test] [-o OUTPUT_DIR] [--no-sort] [--collect-all] [-v] [--preprocessing-options {pdbfix,filter} [{pdbfix,filter} ...]] [--pH PH] [-s SMIS [SMIS ...]] [-i INPUT_FILES [INPUT_FILES ...]] [--input-filetypes INPUT_FILETYPES [INPUT_FILETYPES ...]]
                  [--no-title-line] [--smiles-col SMILES_COL] [--name-col NAME_COL] [--id-property ID_PROPERTY] [--use-3d] [--optimize] --screen-type {dock,dock6,ucsfdock,vina,qvina,smina,psovina} [--receptors RECEPTORS [RECEPTORS ...]] [--center CENTER_X CENTER_Y CENTER_Z] [--size SIZE_X SIZE_Y SIZE_Z]
                  [--metadata-template METADATA_TEMPLATE] [--pdbids PDBIDS [PDBIDS ...]] [--docked-ligand-file DOCKED_LIGAND_FILE] [--buffer BUFFER] [-nc NCPU] [--base-name BASE_NAME] [--score-mode {best,avg,boltzmann,top-k}] [--repeat-score-mode {best,avg,boltzmann,top-k}]
                  [--ensemble-score-mode {best,avg,boltzmann,top-k}] [--repeats REPEATS] [-k K] [--postprocessing-options {visualize} [{visualize} ...]] [--hist-mode {image,text}]

Automate virtual screening of compound libraries.

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       filepath of a configuration file to use
  --version             show program's version number and exit
  --smoke-test          whether to perform a smoke test by checking if the environment is set up properly
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        the path of the output directory
  --no-sort             do not sort the output scores CSV file by score
  --collect-all         whether all prepared input files and generated output files should be collected to the final output directory. By default, these files are all stored in a node-local temporary directory that is inaccessible after program completion.
  -v, --verbose         the level of output this program should print
  --preprocessing-options {pdbfix,filter} [{pdbfix,filter} ...]
                        the preprocessing options to apply
  --pH PH               the pH for which to calculate protonation state for protein and ligand residues
  -s SMIS [SMIS ...], --smis SMIS [SMIS ...]
                        the SMILES strings of the ligands to dock
  -i INPUT_FILES [INPUT_FILES ...], --input-files INPUT_FILES [INPUT_FILES ...]
                        the filenames containing ligands to dock
  --input-filetypes INPUT_FILETYPES [INPUT_FILETYPES ...]
                        the filetype of each input ligand. If unspecified, will attempt to determine the filetype for each file.
  --no-title-line       whether there is no title line in the ligands CSV file
  --smiles-col SMILES_COL
                        the column containing the SMILES strings in the CSV file.
  --name-col NAME_COL   UNUSED the column containing the molecule names/IDs in the CSV file. Molecules will be labeled as ligand_<i> otherwise.
  --id-property ID_PROPERTY
                        UNUSED the name of the property containing the molecule names/IDs in a SMI or SDF file (e.g., "CatalogID", "Chemspace_ID", "Name", etc.). Molecules will be labeled as ligand_<i> otherwise.
  --use-3d              whether to use the input 3D geometry of each molecule. Note that, in principle, initial geometry of input molecules to flexible docking simulations is statisically insignificant. This option is useful for presevering tautomeric information about input molecules.
  --optimize            whether the geometry of each molecule should be optimized using the RDKit MMFF94 forcefield first. Note that, in principle, initial geometry of input molecules to flexible docking simulations is statisically insignificant.
  --screen-type {dock,dock6,ucsfdock,vina,qvina,smina,psovina}
                        the type of docking screen to perform
  --receptors RECEPTORS [RECEPTORS ...]
                        the filenames of the receptors
  --center CENTER_X CENTER_Y CENTER_Z
                        the x-, y-, and z-coordinates of the center of the docking box
  --size SIZE_X SIZE_Y SIZE_Z
                        the x-, y-, and z-radii of the docking box
  --metadata-template METADATA_TEMPLATE
  --pdbids PDBIDS [PDBIDS ...]
                        the PDB IDs of the crystal structures to dock against
  --docked-ligand-file DOCKED_LIGAND_FILE
                        the filepath of a PDB file containing the docked pose of a ligand from which to automatically construct a docking box
  --buffer BUFFER       the amount of buffer space to add around the docked ligand when calculating the docking box
  -nc NCPU, --ncpu NCPU
  --base-name BASE_NAME
  --score-mode {best,avg,boltzmann,top-k}
                        The method used to calculate the score of a single docking run on a single receptor
  --repeat-score-mode {best,avg,boltzmann,top-k}
                        The method used to calculate the overall score from repeated docking runs
  --ensemble-score-mode {best,avg,boltzmann,top-k}
                        The method used to calculate the overall score from an ensemble of docking runs
  --repeats REPEATS     the number of times to repeat each docking run
  -k K                  the number of top scores to average if using a top-k score mode
  --postprocessing-options {visualize} [{visualize} ...]
                        the postprocessing options to apply
  --hist-mode {image,text}
                        the type of histogram to generate. "image" makes a histogram that is output as a PNG file and "text" generates a histogram using terminal output.

Args that start with '--' (eg. --version) can also be set in a config file (specified via --config). Config file syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg is specified in more than one place, then commandline values override config file
values which override defaults

Shouldn't it simply be pip install pyscreener to get it from PyPI, or python setup.py install to build it from the cloned repository?

mikemhenry commented 2 years ago

I too found $ pyscreener-check SCREEN_TYPE METADATA_TEMPLATE a bit confusing given where it appears in the documentation: at that point I wasn't sure what a SCREEN_TYPE or METADATA_TEMPLATE was, although that is explained later.

It may be better to direct users to test their setup with something like

$ pyscreener --config integration-tests/configs/test_vina.ini --smoke-test
Checking environment and metadata for "vina" screen
  Checking PATH and environment variables ... PASS
  Validating metadata ...  PASS
Environment is properly set up!

That seems to check everything, and --smoke-test is a pretty good description.
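
For illustration, a PATH check of the kind reported in the output above could be sketched as follows. This is not pyscreener's implementation: the function name missing_executables is an assumption, and so is any executable name passed to it.

```python
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current PATH.

    Hypothetical helper for illustration only; a real check would also
    need to validate environment variables and the screen metadata.
    """
    return [name for name in names if shutil.which(name) is None]
```

Calling missing_executables(["vina"]), for example, would return an empty list when an executable named vina is on the PATH, assuming that is the name the docking program is installed under.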

davidegraff commented 2 years ago

@mikemhenry good suggestion re: --smoke-test. I'll move towards that approach!

@rvhonorato I would love to support pip install pyscreener, but I was running into issues getting pdbfixer installed. I was unaware of the conda distribution, so I was requiring users to build it from source. Regardless, to my knowledge, you can't include a git dependency in a PyPI package. I'm open to feedback on how to make the setup process more streamlined. That's been a consistent critique, but I just can't think of a good approach to handling all the conflicting requirements/dependencies. Please let me know what you think would be better (I'll defer to both of your opinions here!)

davidegraff commented 2 years ago

hi @rvhonorato and @mikemhenry,

this issue has been addressed in the recent commits to main: c7494f9fb44d0e13d6a8f53a32aa8550cad46ae3..fb4b0e090468937e6bf7f8274b0e173b43fd0de1

I'm closing this for now, but feel free to reopen if it's not sufficient!