Closed Hocnonsense closed 4 years ago
Hi!
The scripts in the examples folders include the .py extension because they assume you got PhyloPhlAn by cloning the repo and not through the conda package (which does not provide the examples).
Apologies. The script had not been updated to match the latest version of the phylophlan_metagenomic.py script. It is updated now (136f2f12f2902d5e71de6450b8f7558683ddd9df) in the repo. Basically, -d is now a required parameter, to ensure that different database versions are not used to process batches of samples from the same project.
This shouldn't happen if you already have the phylophlan_metagenomic.txt file. Is the file in the correct folder?
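On the now-required -d parameter mentioned above, here is a minimal sketch of what "required" means at the argument-parsing level. It is an illustration only, not PhyloPhlAn's actual parser: the `build_parser` function, the program name, and the `--database` long option are assumptions (the log only shows `-d` and a `database` entry in the parsed arguments).

```python
import argparse


def build_parser():
    # Hypothetical parser: only the -d option is modeled here.
    parser = argparse.ArgumentParser(prog="phylophlan_metagenomic_sketch")
    parser.add_argument(
        "-d", "--database",  # the long name is an assumption
        required=True,
        help="database version to use, e.g. SGB.Jul20",
    )
    return parser


if __name__ == "__main__":
    # With -d given, the value is available as args.database.
    args = build_parser().parse_args(["-d", "SGB.Jul20"])
    print(args.database)

    # Without -d, argparse prints a usage error and raises SystemExit.
    try:
        build_parser().parse_args([])
    except SystemExit:
        print("missing -d is rejected")
```

Making the option mandatory forces every run in a project to name its database version explicitly instead of falling back to whatever default is current.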
As we all know (laugh-cry), we cannot download from Dropbox directly.
And my bash history is like this:
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$phylophlan \
> -i ${bin_dir} \
> -d phylophlan --diversity high -f supertree_aa.cfg \
> --genome_extension .fa \
> --nproc 4 \
> --maas phylophlan_substitution_models/phylophlan.tsv \
> --verbose\
> 2>&1 | tee tmp2.log
PhyloPhlAn version 3.0.59 (10 November 2020)
Command line: /lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/phylophlan -i /lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/ -d phylophlan --diversity high -f supertree_aa.cfg --genome_extension .fa --nproc 4 --maas phylophlan_substitution_models/phylophlan.tsv --verbose
Automatically setting "input=7_final" and "input_folder=/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify"
[e] "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/lib/python3.9/site-packages/PhyloPhlAn-3.0.1-py3.9.egg/phylophlan/phylophlan_configs/" folder does not exists
Creating folder "7_final_phylophlan"
Creating folder "7_final_phylophlan/tmp"
"high-accurate" preset
Setting "sort=True" because "database=phylophlan"
Setting "min_num_markers=100" since no value has been specified and the "database=phylophlan"
Arguments: {'input': '7_final', 'clean': None, 'output': '7_final_phylophlan', 'database': 'phylophlan', 'db_type': None, 'config_file': 'supertree_aa.cfg', 'diversity': 'high', 'accurate': True,
'fast': False, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 4, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 100, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.95, 'subsample': <function twentyfive at 0x2b6080061f70>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x2b60800621f0>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.75, 'min_num_entries': 4, 'maas': 'phylophlan_substitution_models/phylophlan.tsv', 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder':
'/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final', 'data_folder': '7_final_phylophlan/tmp', 'databases_folder': 'phylophlan_databases/', 'submat_folder': '/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/lib/python3.9/site-packages/PhyloPhlAn-3.0.1-py3.9.egg/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': 'phylophlan_substitution_models/', 'configs_folder': None, 'output_folder': '', 'genome_extension': '.fa', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "supertree_aa.cfg"
Checking configuration file
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/diamond"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/mafft"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/trimal"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/FastTree"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/raxmlHPC"
Checking "java"
File "phylophlan_databases/phylophlan_databases.txt" present
Downloading "https://zenodo.org/record/4005620/files/phylophlan.tar?download=1" to "phylophlan_databases/phylophlan.tar"
Downloading file of size: 64.05 MB
^C03 MB 4.73 % 0.37 MB/sec 2 min 47 sec
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$ls -l phylophlan_databases/
total 3112
-rw-rw-r-- 1 clsxx clsxx 323 Nov 13 16:47 phylophlan_databases.txt
-rw-rw-r-- 1 clsxx clsxx 3027 Nov 13 16:47 phylophlan_metagenomic.txt
-rw-rw-r-- 1 clsxx clsxx 3178496 Nov 13 16:51 phylophlan.tar
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$phylophlan_metagenomic \
> -i ${bin_dir} -d SGB.Jul20 --database_folder ~/software/phylophlan/phylophlan_databases \
> --nproc 4 --verbose 2>&1 | tee tmp1.log
phylophlan_metagenomic.py version 3.0.34 (18 August 2020)
Command line: /lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/phylophlan_metagenomic -i /lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/ -d SGB.Jul20 --database_folder /lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases --nproc 4 --verbose
Setting --database_folder to "/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases"
Setting input extension to ".fa"
Setting output prefix to "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final"
Output prefix is a folder, setting it to "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final"
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_sketches" already present
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_sketches/inputs" already present
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_dists" already present
Arguments: {'input': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/', 'output_prefix': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final', 'database': 'SGB.Jul20', 'database_list': False, 'database_update': False, 'input_extension': '.fa', 'how_many': 10, 'nproc': 4, 'database_folder': '/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases', 'only_input': False, 'add_ggb': False, 'add_fgb': False, 'overwrite': False, 'verbose': True, 'mapping': 'SGB.Jul20.txt.bz2'}
Checking "mash"
Downloading "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1" to "phylophlan_metagenomic.txt"
[e] unable to download "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1"
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$
Hi and thanks for the log.
As you can see, phylophlan_metagenomic.py set database_folder to the path /lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases:
Arguments: {'input': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/', 'output_prefix': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final', 'database': 'SGB.Jul20', 'database_list': False, 'database_update': False, 'input_extension': '.fa', 'how_many': 10, 'nproc': 4, 'database_folder': '/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases', 'only_input': False, 'add_ggb': False, 'add_fgb': False, 'overwrite': False, 'verbose': True, 'mapping': 'SGB.Jul20.txt.bz2'}
which is not the location where you downloaded the phylophlan_metagenomic.txt file (it appears to be ~/Work/2020-09-MgAffect/Analyze/phylophlan/phylophlan_databases/).
So, when running phylophlan_metagenomic.py, you can use the --database_folder param to point to the folder containing the phylophlan_metagenomic.txt file you downloaded, and this should solve the download issue.
Many thanks, Francesco
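The suggestion above can be sketched as a small pre-flight check: find which folder actually holds phylophlan_metagenomic.txt, then pass that folder to --database_folder. This is only an illustration under stated assumptions: `find_database_folder` and `build_command` are hypothetical helpers, `path/to/bins` is a placeholder for the input folder, and the command uses only the options that appear in the log.

```python
import os
import shlex

MAPPING_FILE = "phylophlan_metagenomic.txt"  # file name taken from the log


def find_database_folder(candidates, filename=MAPPING_FILE):
    """Hypothetical helper: return the first folder containing filename, else None."""
    for folder in candidates:
        folder = os.path.expanduser(folder)
        if os.path.isfile(os.path.join(folder, filename)):
            return folder
    return None


def build_command(input_dir, database, database_folder, nproc=4):
    """Hypothetical helper: assemble the command from options seen in the log."""
    return [
        "phylophlan_metagenomic",
        "-i", input_dir,
        "-d", database,
        "--database_folder", database_folder,
        "--nproc", str(nproc),
        "--verbose",
    ]


if __name__ == "__main__":
    # The two folders mentioned in the thread.
    candidates = [
        "~/software/phylophlan/phylophlan_databases",
        "~/Work/2020-09-MgAffect/Analyze/phylophlan/phylophlan_databases",
    ]
    folder = find_database_folder(candidates)
    if folder is None:
        print(f"{MAPPING_FILE} not found in any candidate folder")
    else:
        # "path/to/bins" is a placeholder for the real input folder.
        print(shlex.join(build_command("path/to/bins", "SGB.Jul20", folder)))
```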
Now I found the bug:
In phylophlan.py, you modified database_download:
https://github.com/biobakery/phylophlan/blob/136f2f12f2902d5e71de6450b8f7558683ddd9df/phylophlan/phylophlan.py#L3206-L3207
However, in phylophlan_metagenomic.py, the file will be downloaded to the current path:
https://github.com/biobakery/phylophlan/blob/136f2f12f2902d5e71de6450b8f7558683ddd9df/phylophlan/phylophlan_metagenomic.py#L715-L716
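The mismatch described above can be illustrated with a minimal sketch of the expected behavior: resolve the file inside the database folder rather than the current directory, and download only when it is missing there. This is not the actual PhyloPhlAn code. `resolve_mapping_path` and `needs_download` are hypothetical names, and the only details taken from the thread are the file name and the notion of a database folder.

```python
import os
import tempfile

MAPPING_FILE = "phylophlan_metagenomic.txt"  # file name taken from the log


def resolve_mapping_path(database_folder, filename=MAPPING_FILE):
    """Hypothetical helper: place the file inside database_folder, not the CWD."""
    folder = os.path.abspath(os.path.expanduser(database_folder))
    return os.path.join(folder, filename)


def needs_download(database_folder, filename=MAPPING_FILE):
    """Hypothetical helper: download only if the file is missing from database_folder."""
    return not os.path.isfile(resolve_mapping_path(database_folder, filename))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db_folder:
        # File absent: a download would be attempted.
        print(needs_download(db_folder))
        # Simulate a manually downloaded file placed in the database folder.
        open(resolve_mapping_path(db_folder), "w").close()
        # File present: the download step is skipped.
        print(needs_download(db_folder))
```

With a check like this, a file placed by hand in the database folder would be picked up even on machines that cannot reach Dropbox.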
Now I'm trying to run phylophlan_metagenomic and found 3 bugs:
1. Scripts in samples/ have commands like phylophlan_metagenomic.py \. However, there should not be a .py.
2. Can -d be dismissed?
3. Even with phylophlan_metagenomic.txt added to --database_folder, the program is still doing Downloading "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1" to "phylophlan_metagenomic.txt", and then [e] unable to download "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1".
Thanks!