biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

write_default_configs.sh missing #20

Closed nick-youngblut closed 4 years ago

nick-youngblut commented 4 years ago

I've forked phylophlan and I'm planning to add some unit tests, since those seem to be absent from this repo. I'm working on a shortened version of Example 01, but am running into this error:

[e] "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan_configs/" folder does not exists
Creating folder "output_references"
Creating folder "output_references/tmp"
"low-fast" preset
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/bin/phylophlan", line 11, in <module>
    load_entry_point('PhyloPhlAn==3.0', 'console_scripts', 'phylophlan')()
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 3169, in phylophlan_main
    project_name = check_args(args, sys.argv, verbose=args.verbose)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/site-packages/PhyloPhlAn-3.0-py3.7.egg/phylophlan/phylophlan.py", line 492, in check_args
    elif os.path.isfile(os.path.join(args.configs_folder, args.config_file)):
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)

It appears that phylophlan is looking for a config file that doesn't exist in the bioconda install. The phylophlan CLI help doc states:

                      The configuration file to load, four ready-to-use
                        configuration files can be generated using the
                        "write_default_configs.sh" script present in the
                        "configs" folder (default: None)

...but write_default_configs.sh doesn't exist.
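
The traceback above is cut off before the final exception line, so the exact error is not shown. One plausible reading, and it is only an assumption, is that one of the arguments reaching os.path.join was None, which makes os.path.join raise a TypeError. A minimal sketch of a guard for that case follows; resolve_config_path is a hypothetical helper, not a function in phylophlan:

```python
import os


def resolve_config_path(configs_folder, config_file):
    """Return the path to an existing config file, or None.

    Hypothetical helper (not part of phylophlan): it validates both
    arguments before calling os.path.join, which raises TypeError
    when it is given None.
    """
    if configs_folder is None or config_file is None:
        return None

    candidate = os.path.join(configs_folder, config_file)
    return candidate if os.path.isfile(candidate) else None
```

With a check like this, the caller can print a readable "config not found" message instead of surfacing a traceback from posixpath.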

A couple of general comments about the phylophlan code:

  1. The bash scripts in the examples use the *.py extensions for the phylophlan executables, but the executables, once installed, don't have the *.py extension.
  2. Why not use the logging package instead of creating custom info() and error() functions?
  3. Do you see any problem with adding a --max_proteins param to phylophlan_setup_database in order to reduce the execution time for testing (e.g., max of 100 core proteins)?
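
To make point 2 concrete, here is a minimal sketch of what the standard logging module could look like in place of custom info() and error() functions. The logger name, the make_logger helper, and the message format are assumptions for illustration, not phylophlan code:

```python
import logging
import sys


def make_logger(name="phylophlan_sketch", verbose=False):
    """Build a logger that writes to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    # INFO messages are shown only when verbose is requested.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_logger(verbose=True)
log.info('"low-fast" preset')
log.error('configuration file "isolates_config.cfg" not found')
```

One benefit of this design is that verbosity becomes a log level rather than a flag passed to every printing call.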

fasnicar commented 4 years ago

Dear Nick,

thanks for reporting this. I think you've installed an old PhyloPhlAn package that had the above issue, which has been fixed in the latest package available in Bioconda. So, can you re-try after updating the package to the latest one?

Many thanks for your comments. Regarding your points:

  1. This should be fixed in the new package
  2. I started re-writing PhyloPhlAn a few years ago and simply didn't like logging at that time, so I ended up defining two printing functions, nothing more than that
  3. I don't see any problem in adding this parameter, and I think it is quite easy to add
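
As a rough illustration of point 3, the proposed --max_proteins parameter could be sketched with argparse as below. This is a hypothetical, standalone example: the parser, the positive_int and limit_proteins helpers, and the truncation behavior are all assumptions, not the actual phylophlan_setup_database code.

```python
import argparse


def positive_int(value):
    """argparse type: accept only integers >= 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return number


def build_parser():
    """Hypothetical parser showing only the proposed option."""
    parser = argparse.ArgumentParser(description="Sketch of a database setup CLI")
    parser.add_argument(
        "--max_proteins",
        type=positive_int,
        default=None,
        help="Keep at most this many core proteins (default: no limit)",
    )
    return parser


def limit_proteins(proteins, max_proteins=None):
    """Return at most max_proteins items; None means no limit."""
    proteins = list(proteins)
    return proteins if max_proteins is None else proteins[:max_proteins]
```

A test run could then pass --max_proteins 100 to keep only the first 100 core proteins, while the default leaves the full set untouched.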

Many thanks, Francesco

nick-youngblut commented 4 years ago

I am using the most up-to-date bioconda build. The folder does not exists error may be due to re-installing phylophlan via python setup.py install into the conda env in order to incorporate my code edits (a --max_proteins param for phylophlan_setup_database). Maybe adding the default config files to package_data in setup.py would make this easier?

I searched through the repo, but I can't find write_default_configs.sh. I'm guessing that the CLI help doc was referring to phylophlan_write_default_configs.sh. I ran this, but the example 01 phylophlan job returns the error:

[e] configuration file "isolates_config.cfg" not found
Available configuration files in "configs":
    configs/supermatrix_aa.cfg
    configs/supermatrix_nt.cfg
    configs/supertree_aa.cfg
    configs/supertree_nt.cfg

It appears that phylophlan_write_default_configs.sh doesn't generate the "isolates_config.cfg" file specified in Example 01. I can't find any other reference to "isolates_config.cfg" in the repo.

fasnicar commented 4 years ago

> I am using the most up-to-date bioconda build. The folder does not exists error may be due to re-installing phylophlan via python setup.py install into the conda env in order to incorporate my code edits (a --max_proteins param for phylophlan_setup_database). Maybe adding the default config files to package_data in setup.py would make this easier?

Strange. If you look at the setup.py in the repo (https://github.com/biobakery/phylophlan/blob/master/setup.py), we removed the generation of the default configs from there and now provide phylophlan_write_default_configs.sh as a script. Accordingly, we updated the Installation section of the user manual (https://github.com/biobakery/phylophlan/wiki#conda-package-easy), which now states explicitly that the user should run the phylophlan_write_default_configs.sh script. We can't provide default configs through package_data in setup.py, as the configs are generated based on which executables are found on the system at run time. Maybe I am missing something here.
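
To illustrate why run-time generation does not fit package_data, here is a minimal sketch of building a config from whatever executables are found on PATH. The step names, candidate tools, and the program_name key are assumptions chosen for illustration; this is not how phylophlan_write_default_configs.sh is actually implemented.

```python
import configparser
import shutil

# Hypothetical mapping: pipeline step -> candidate executables, in order of preference.
CANDIDATES = {
    "msa": ["mafft", "muscle"],
    "tree": ["FastTree", "raxmlHPC"],
}


def detect_tools(candidates, which=shutil.which):
    """Pick, for each step, the first candidate found on PATH."""
    found = {}
    for step, names in candidates.items():
        for name in names:
            path = which(name)
            if path:
                found[step] = path
                break
    return found


def write_config(found, out_path):
    """Write one section per step, recording the executable that was found."""
    config = configparser.ConfigParser()
    for step, path in found.items():
        config[step] = {"program_name": path}
    with open(out_path, "w") as handle:
        config.write(handle)
```

Because the result depends on the machine it runs on, a file produced this way cannot be shipped as a static asset inside the package.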

> I searched through the repo, but I can't find write_default_configs.sh. I'm guessing that the CLI help doc was referring to phylophlan_write_default_configs.sh. I ran this, but the example 01 phylophlan job returns the error:

Thanks, I corrected the typo in the code.

> It appears that phylophlan_write_default_configs.sh doesn't generate the "isolates_config.cfg" file specified in Example 01. I can't find any other reference to "isolates_config.cfg" in the repo.

It doesn't, as phylophlan_write_default_configs.sh is a utility that easily generates 4 general configuration files for the user. However, the tutorial explains how to generate the config files used in the first example with the phylophlan_write_config_file command (end of Steps 3 and 4, respectively): https://github.com/biobakery/biobakery/wiki/PhyloPhlAn-3.0:-Example-01:-S.-aureus.

I hope this helps.