biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

Default for --min_num_markers #22

Closed adityabandla closed 4 years ago

adityabandla commented 4 years ago

What is the default for the --min_num_markers parameter? The wiki says it's 100 and 34 for the phylophlan and amphora2 marker sets, respectively. However, phylophlan -h shows that it's set to 1 by default.

When running with the phylophlan database using -d phylophlan, --min_num_markers still seems to default to 1.

fasnicar commented 4 years ago

Hi, as written in the wiki, if you specify -d phylophlan or -d amphora2 as the database, then --min_num_markers will be automatically set to 100 or 34, respectively. Note that you need to specify --verbose to make PhyloPhlAn print that one of the above choices was made.

The above automatic values apply unless you specify --min_num_markers on the command line. The command line has higher priority and overrides the default values.
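The precedence described above (an explicit command-line value wins, otherwise a database-specific default applies) can be sketched roughly as follows. This is a minimal illustration and not PhyloPhlAn's actual code: the function name `resolve_min_num_markers`, the `DB_DEFAULTS` table, and the use of `None` as a "not given" sentinel are all assumptions; only the values 100, 34, and 1 come from this thread.

```python
import argparse

# Hypothetical lookup table; the values are the ones quoted in this thread.
DB_DEFAULTS = {"phylophlan": 100, "amphora2": 34}
FALLBACK_DEFAULT = 1  # the generic default reported by `phylophlan -h`


def resolve_min_num_markers(argv):
    """Parse argv and return the min_num_markers value that would be used."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--database", default=None)
    # default=None lets us tell "not given" apart from an explicit value,
    # including an explicit "--min_num_markers 1".
    parser.add_argument("--min_num_markers", type=int, default=None)
    args = parser.parse_args(argv)

    if args.min_num_markers is not None:
        # The command line has the highest priority.
        return args.min_num_markers
    # Otherwise fall back to the database-specific default, if there is one.
    return DB_DEFAULTS.get(args.database, FALLBACK_DEFAULT)


if __name__ == "__main__":
    print(resolve_min_num_markers(["-d", "phylophlan"]))
    print(resolve_min_num_markers(["-d", "phylophlan", "--min_num_markers", "10"]))
```

If the parser instead used `default=1` directly, an omitted flag would be indistinguishable from an explicit `--min_num_markers 1`, which is one plausible way for an automatic value to be skipped.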

You said that PhyloPhlAn still seems to use 1 with -d phylophlan. This is very strange, but if you can give an example I can have a look.

adityabandla commented 4 years ago

Hi Francesco, yes, I did specify --verbose. I installed phylophlan v3 from bioconda; phylophlan --version gives me PhyloPhlAn version 3.0.51 (11 May 2020).

fasnicar commented 4 years ago

Ok, so do you see Setting "min_num_markers=100" [..] from the log of PhyloPhlAn?

adityabandla commented 4 years ago

Does phylophlan produce a separate log file? My observations are based on the messages that get printed in STDOUT with --verbose which lists all the parameter values.

fasnicar commented 4 years ago

Sorry, no there is no separate log, the stdout is what I'm referring to.

It should print the full command line and then the progress of the analysis. Between the command line and the progress of the analysis you should see the line I wrote above. If you want to attach the output, I can have a look at it.
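For anyone who wants to check a saved copy of the --verbose output for that line, a small sketch is given below. It assumes the output has been captured as text (for example by redirecting stdout to a file); the function name `find_min_num_markers` is hypothetical, and the two patterns are based only on the `Setting "min_num_markers=..."` message quoted above and on the `'min_num_markers': ...` entry of the `Arguments:` dictionary that --verbose prints.

```python
import re


def find_min_num_markers(stdout_text):
    """Return (auto_set_value, value_in_arguments) found in captured output.

    Each element is None when the corresponding pattern is not present.
    """
    # The message announcing that a database-specific value was applied.
    auto = re.search(r'Setting "min_num_markers=(\d+)"', stdout_text)
    # The value listed in the printed Arguments dictionary.
    listed = re.search(r"'min_num_markers': (\d+)", stdout_text)
    return (
        int(auto.group(1)) if auto else None,
        int(listed.group(1)) if listed else None,
    )
```

A result whose first element is `None` means the automatic setting was never announced in the captured text; the second element is whatever value the `Arguments:` line reports.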

adityabandla commented 4 years ago

PhyloPhlAn version 3.0.51 (11 May 2020)

Command line: /home/projects/11001755/miniconda3/envs/phylophlan/bin/phylophlan -i sp_faa -d phylophlan --diversity high --accurate -f supermatrix_aa.cfg --verbose -o phylo_tree --nproc 24

Automatically setting "input=sp_faa" and "input_folder=/home/projects/11001713/datasets/glv_mg/data/processed/5_binning/99_final_bin_set_renamed/10_phylophlan"
Creating folder "phylo_tree"
Creating folder "phylo_tree/tmp"
"high-accurate" preset
Setting "sort=True" because "database=phylophlan"
Arguments: {'input': 'sp_faa', 'clean': None, 'output': 'phylo_tree', 'database': 'phylophlan', 'db_type': None, 'config_file': 'phylophlan_configs/supermatrix_aa.cfg', 'diversity': 'high', 'accurate': True, 'fast': False, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 24, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 1, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.95, 'subsample': <function twentyfive at 0x2aaae5693680>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x2aaae56938c0>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.75, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/projects/11001713/datasets/glv_mg/data/processed/5_binning/99_final_bin_set_renamed/10_phylophlan/sp_faa', 'data_folder': 'phylo_tree/tmp', 'databases_folder': 'phylophlan_databases/', 'submat_folder': '/home/projects/11001755/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/projects/11001755/miniconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': 'phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "phylophlan_configs/supermatrix_aa.cfg"
Checking configuration file

fasnicar commented 4 years ago

Hi, thanks for reporting this. It is now fixed with commit ID 30117ce0e71a176d67f0c5d21f55f6a07fa89da8. This version is not yet available in Bioconda, so for the moment you should get PhyloPhlAn directly from the repository.

Many thanks, Francesco