biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

[e] unable to download "https://www.dropbox.com ... in phylophlan_metagenomic #50

Closed Hocnonsense closed 4 years ago

Hocnonsense commented 4 years ago

I'm trying to run phylophlan_metagenomic and have run into 3 problems:

  1. All the bash files in your samples/ folder have commands like phylophlan_metagenomic.py \. However, there should not be a .py.
  2. In this version, can -d be omitted?
  3. Although I already downloaded phylophlan_metagenomic.txt to --database_folder, the program still reports Downloading "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1" to "phylophlan_metagenomic.txt", and then [e] unable to download "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1"

Thanks!

fasnicar commented 4 years ago

Hi!

  1. The scripts in the examples folders contain the .py because they assume you got PhyloPhlAn by cloning the repo rather than through the conda package (which does not provide the examples).

  2. Apologies. The example script had not been updated to match the latest version of the phylophlan_metagenomic.py script. It is updated now (136f2f12f2902d5e71de6450b8f7558683ddd9df) in the repo. Basically, -d is now a required parameter, to ensure that different database versions are not used to process batches of samples from the same project.

  3. This shouldn't happen if you already have the phylophlan_metagenomic.txt file. Is the file in the correct folder?
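The "required parameter" behaviour described in point 2 can be illustrated with a minimal argparse sketch. This is an assumption-laden illustration, not PhyloPhlAn's actual parser: the function name build_parser, the program name, and the help strings are hypothetical, and only the -i and -d flags are taken from this thread.

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration only; not PhyloPhlAn's actual code.
    parser = argparse.ArgumentParser(prog="example_metagenomic")
    parser.add_argument("-i", "--input", required=True,
                        help="folder containing the input genomes")
    # required=True makes argparse exit with a usage error when -d is omitted,
    # so the database version always has to be stated explicitly.
    parser.add_argument("-d", "--database", required=True,
                        help="database version, e.g. SGB.Jul20")
    return parser


if __name__ == "__main__":
    # An explicit argument list is passed so the example does not depend on sys.argv.
    args = build_parser().parse_args(["-i", "bins/", "-d", "SGB.Jul20"])
    print(args.database)
```

With a parser like this, leaving out -d stops the run before any sample is processed, which is one way to keep a whole project on a single database version.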

Hocnonsense commented 4 years ago

As we all know (laugh cry), we cannot download from Dropbox directly.

My bash history looks like this:

(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$phylophlan \
>     -i ${bin_dir} \
>     -d phylophlan --diversity high -f supertree_aa.cfg \
>     --genome_extension .fa \
>     --nproc 4 \
>     --maas phylophlan_substitution_models/phylophlan.tsv \
>     --verbose\
>     2>&1 | tee tmp2.log
PhyloPhlAn version 3.0.59 (10 November 2020)

Command line: /lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/phylophlan -i /lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/ -d phylophlan --diversity high -f supertree_aa.cfg --genome_extension .fa --nproc 4 --maas phylophlan_substitution_models/phylophlan.tsv --verbose

Automatically setting "input=7_final" and "input_folder=/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify"
[e] "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/lib/python3.9/site-packages/PhyloPhlAn-3.0.1-py3.9.egg/phylophlan/phylophlan_configs/" folder does not exists
Creating folder "7_final_phylophlan"
Creating folder "7_final_phylophlan/tmp"
"high-accurate" preset
Setting "sort=True" because "database=phylophlan"
Setting "min_num_markers=100" since no value has been specified and the "database=phylophlan"     
Arguments: {'input': '7_final', 'clean': None, 'output': '7_final_phylophlan', 'database': 'phylophlan', 'db_type': None, 'config_file': 'supertree_aa.cfg', 'diversity': 'high', 'accurate': True, 'fast': False, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 4, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 100, 'trim': 'greedy', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.95, 'subsample': <function twentyfive at 0x2b6080061f70>, 'unknown_fraction': 0.3, 'scoring_function': <function trident at 0x2b60800621f0>, 'sort': True, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.75, 'min_num_entries': 4, 'maas': 'phylophlan_substitution_models/phylophlan.tsv', 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final', 'data_folder': '7_final_phylophlan/tmp', 'databases_folder': 'phylophlan_databases/', 'submat_folder': '/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/lib/python3.9/site-packages/PhyloPhlAn-3.0.1-py3.9.egg/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': 'phylophlan_substitution_models/', 'configs_folder': None, 'output_folder': '', 'genome_extension': '.fa', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "supertree_aa.cfg"
Checking configuration file
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/diamond"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/mafft"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/trimal"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/FastTree"
Checking "/lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/raxmlHPC"
Checking "java"
File "phylophlan_databases/phylophlan_databases.txt" present
Downloading "https://zenodo.org/record/4005620/files/phylophlan.tar?download=1" to "phylophlan_databases/phylophlan.tar"
Downloading file of size: 64.05 MB
^C03 MB 4.73 %   0.37 MB/sec  2 min 47 sec
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$ls -l phylophlan_databases/
total 3112
-rw-rw-r-- 1 clsxx clsxx     323 Nov 13 16:47 phylophlan_databases.txt  
-rw-rw-r-- 1 clsxx clsxx    3027 Nov 13 16:47 phylophlan_metagenomic.txt
-rw-rw-r-- 1 clsxx clsxx 3178496 Nov 13 16:51 phylophlan.tar
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$phylophlan_metagenomic -i ${bin_dir} -d SGB.Jul20 --database_folder ~/software/phylophlan/phylophlan_databases --nproc 4 --verbose 2>&1 | tee tmp1.log
phylophlan_metagenomic.py version 3.0.34 (18 August 2020)

Command line: /lustre/home/acct-clsxx/clsxx/software/anaconda3/envs/phylophlan/bin/phylophlan_metagenomic -i /lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/ -d SGB.Jul20 --database_folder /lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases --nproc 4 --verbose

Setting --database_folder to "/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases"
Setting input extension to ".fa"
Setting output prefix to "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final"
Output prefix is a folder, setting it to "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final"
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_sketches" already present
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_sketches/inputs" already present
Folder "/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final_dists" already present

Arguments: {'input': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/', 'output_prefix': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final', 'database': 'SGB.Jul20', 'database_list': False, 'database_update': False, 'input_extension': '.fa', 'how_many': 10, 'nproc': 4, 'database_folder': '/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases', 'only_input': False, 'add_ggb': False, 'add_fgb': False, 'overwrite': False, 'verbose': True, 'mapping': 'SGB.Jul20.txt.bz2'}

Checking "mash"
Downloading "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1" to "phylophlan_metagenomic.txt"
[e] unable to download "https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt?dl=1"
(phylophlan) [clsxx@cas556 ~/Work/2020-09-MgAffect/Analyze/phylophlan]$

fasnicar commented 4 years ago

Hi and thanks for the log.

As you can see, phylophlan_metagenomic.py set database_folder to the path /lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases:

Arguments: {'input': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect//F-06-MAG/03_modify/7_final/', 'output_prefix': '/lustre/home/acct-clsxx/clsxx/Work/2020-09-MgAffect/F-06-MAG/03_modify/7_final/7_final', 'database': 'SGB.Jul20', 'database_list': False, 'database_update': False, 'input_extension': '.fa', 'how_many': 10, 'nproc': 4, 'database_folder': '/lustre/home/acct-clsxx/clsxx/software/phylophlan/phylophlan_databases', 'only_input': False, 'add_ggb': False, 'add_fgb': False, 'overwrite': False, 'verbose': True, 'mapping': 'SGB.Jul20.txt.bz2'}

This is not the location where you downloaded the phylophlan_metagenomic.txt file, which appears to be ~/Work/2020-09-MgAffect/Analyze/phylophlan/phylophlan_databases/.

So, when running phylophlan_metagenomic.py, you can use the --database_folder parameter to specify the folder containing the phylophlan_metagenomic.txt file you downloaded, and this should solve the download issue.
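A quick way to confirm which folder actually holds the file before passing it to --database_folder is a small check like the one below. This is only a sketch: the helper name mapping_file_in_folder is hypothetical and is not part of PhyloPhlAn, and the two paths are the ones that appear in the log above.

```python
import os


def mapping_file_in_folder(database_folder, filename="phylophlan_metagenomic.txt"):
    """Return True if `filename` is a regular file inside `database_folder`."""
    # expanduser resolves a leading "~"; abspath anchors relative paths
    # to the current working directory.
    folder = os.path.abspath(os.path.expanduser(database_folder))
    return os.path.isfile(os.path.join(folder, filename))


if __name__ == "__main__":
    # Paths taken from the log in this thread; adjust them to your own setup.
    for folder in (
        "~/software/phylophlan/phylophlan_databases",
        "~/Work/2020-09-MgAffect/Analyze/phylophlan/phylophlan_databases",
    ):
        print(folder, mapping_file_in_folder(folder))
```

The folder for which this reports True is the one to pass as --database_folder.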

Many thanks, Francesco

Hocnonsense commented 4 years ago

I have now found the bug. In phylophlan.py, you modified database_download: https://github.com/biobakery/phylophlan/blob/136f2f12f2902d5e71de6450b8f7558683ddd9df/phylophlan/phylophlan.py#L3206-L3207

However, in phylophlan_metagenomic.py, the file is still downloaded to the current path: https://github.com/biobakery/phylophlan/blob/136f2f12f2902d5e71de6450b8f7558683ddd9df/phylophlan/phylophlan_metagenomic.py#L715-L716
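
The difference between downloading to the current path and downloading into the database folder can be sketched as follows. This is not the code from either PhyloPhlAn script: the function fetch_into_folder and its downloader parameter are hypothetical names used for illustration, and the sketch assumes the intended behaviour is "reuse the file if it is already in the database folder, otherwise download it there".

```python
import os
from urllib.request import urlretrieve


def fetch_into_folder(url, database_folder, filename, downloader=urlretrieve):
    """Download `url` to `database_folder/filename` unless it already exists.

    Hypothetical helper for illustration; `downloader` is any callable
    taking (url, target_path), which makes the function testable offline.
    """
    os.makedirs(database_folder, exist_ok=True)
    # Joining with the database folder is the key step: a bare filename
    # would resolve against the current working directory instead.
    target = os.path.join(database_folder, filename)
    if os.path.isfile(target):
        # File already present (e.g. copied in manually): skip the download.
        return target
    downloader(url, target)
    return target
```

Calling it with the Dropbox URL from the log, the folder given to --database_folder, and "phylophlan_metagenomic.txt" would skip the network request whenever the file had already been placed in that folder by hand.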