EricDeveaud opened this issue 3 years ago
Hello Eric, and thanks for using PhyloPhlAn!
You can use the --databases_folder parameter to specify the path where the database(s) are located.
Many thanks, Francesco
My concern is what to download in order to provide the two databases, phylophlan and amphora2.
Let's say I want the databases hosted in /opt/data/phylophlan/3.02.
If I understood correctly, I have to download the following files to this directory:
https://www.dropbox.com/s/xdqm836d2w22npb/phylophlan_metagenomic.txt
https://www.dropbox.com/s/l73jvga66ql4ows/SGB.Dec19.md5
https://www.dropbox.com/s/djm9thsykn9h63s/SGB.Dec19.tar
https://www.dropbox.com/s/dw947euykyjeee7/SGB.Dec19.txt.bz2
Is that correct?
regards
Eric
Hi Eric,
Got it! The links you provided are for the phylophlan_metagenomic database, not for the phylophlan and amphora2 databases.
I think the easiest thing to do is to create a fake input folder with 4 genomes in it and run phylophlan twice from a machine with an internet connection: the first time specifying the phylophlan database and the second time the amphora2 database. At startup, PhyloPhlAn checks whether the database is present in the --databases_folder and automatically downloads it if it is not.
phylophlan [mandatory_params] -d phylophlan --databases_folder /opt/data/phylophlan/3.02 --verbose
phylophlan [mandatory_params] -d amphora2 --databases_folder /opt/data/phylophlan/3.02 --verbose
Note: You can kill the runs above as soon as the databases are downloaded.
Alternatively, you can download this file: http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan_databases.txt
and then download the two files for each database:
amphora2
http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/amphora2.tar
http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/amphora2.md5
phylophlan
http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.tar
http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.md5
Then store them in the folder you want to use for the databases.
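A minimal sketch of this manual download route, assuming a POSIX shell and wget on the machine with internet access, and that the cmprod1.cibio.unitn.it URLs above are still served. db_urls and download_phylophlan_dbs are hypothetical helpers, not part of PhyloPhlAn.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PhyloPhlAn) for the manual download route.
BASE_URL="http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn"

# Print the index file URL, then the .tar and .md5 URLs for each database name given.
db_urls() {
    echo "${BASE_URL}/phylophlan_databases.txt"
    local db
    for db in "$@"; do
        echo "${BASE_URL}/${db}.tar"
        echo "${BASE_URL}/${db}.md5"
    done
}

# Download every file into <dest>; wget -nc skips files already present there.
download_phylophlan_dbs() {
    local dest="$1"
    shift
    mkdir -p "$dest" || return 1
    local url
    # The URLs contain no spaces, so plain word splitting is safe here.
    for url in $(db_urls "$@"); do
        wget -nc -P "$dest" "$url" || return 1
    done
}
```

For example, download_phylophlan_dbs /opt/data/phylophlan/3.02 amphora2 phylophlan would fetch the index file plus the .tar and .md5 files of both databases into that folder, which can then be copied to the cluster.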
Please, let me know if something is not clear.
Many thanks, Francesco
Done. I untarred both archives, and now phylophlan --database_list shows me the DBs:
[gensoft@db6b0d05cdf9 inst]$ phylophlan --databases_folder /opt/gensoft/data/phylophlan/3.0.2/ --database_list --diversity high
Available databases in "/opt/gensoft/data/phylophlan/3.0.2/":
amphora2
phylophlan
NB: having this procedure in the installation instructions would be a plus.
Also, it would be nice to have DATABASES_FOLDER defined via an environment variable, something like this in phylophlan.py:
DATABASES_FOLDER = os.environ.get('PHYLOPHLAN_DATABASE_DIR', 'phylophlan_databases')
Then one can export PHYLOPHLAN_DATABASE_DIR pointing to the databases directory and have phylophlan find the DBs without having to use the --databases_folder option.
What do you think about that?
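Until phylophlan reads such a variable itself, the same fallback logic can be sketched in shell, for example in a wrapper shipped with the module. PHYLOPHLAN_DATABASE_DIR is the name proposed above; phylophlan_db_folder and phylophlan_with_env_db are hypothetical helpers that simply pass the value through --databases_folder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: phylophlan itself does not read PHYLOPHLAN_DATABASE_DIR here,
# so the value is passed explicitly through --databases_folder.

# Print $PHYLOPHLAN_DATABASE_DIR if set and non-empty, else "phylophlan_databases".
# Unlike Python's os.environ.get(), ':-' also falls back when the variable is set but empty.
phylophlan_db_folder() {
    echo "${PHYLOPHLAN_DATABASE_DIR:-phylophlan_databases}"
}

# Run phylophlan with the resolved databases folder plus any other arguments.
phylophlan_with_env_db() {
    phylophlan --databases_folder "$(phylophlan_db_folder)" "$@"
}
```

With Environment Modules, the modulefile could set the variable (e.g. setenv PHYLOPHLAN_DATABASE_DIR /opt/gensoft/data/phylophlan/3.0.2) so that users never have to type the path.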
Also, may I have some information about the reference folder? (Keep in mind I'm not a biologist at all, just in charge of installing and maintaining software on our cluster, so excuse some silly questions ;-))
Can I run phylophlan_get_references -g all -o some_dir and provide that data to our users?
Again, having an env var would be nice.
Regards,
Eric
Eric
Great!
Yes, I'll add this to the wiki.
About the env variable, I'll implement it in the next release. Although I believe that the parameter is easier to use for non-computational people, it should not be too complicated to have both options working.
About phylophlan_get_references -g all -o some_dir: yes, the genomes retrieved are all publicly available, so there is no problem in downloading them and then allowing users to access the resource.
Many thanks, Francesco
Hello,
the wiki may need two more pieces of information.
The .bz2 files must be decompressed and concatenated into phylophlan.faa and amphora2.faa, respectively, and then indexed with diamond.
Note: we provide software and data to our users on a read-only file system, so downloading the DB files is not enough; we also need to process them, to avoid 'write permission' errors when users run phylophlan for the first time.
So the full instructions would be:
DB_DIR=/whatever/you/want/to/host/databases
mkdir $DB_DIR
cd $DB_DIR
wget http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan_databases.txt
wget http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/amphora2.tar
wget http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/amphora2.md5
wget http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.tar
wget http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/phylophlan.md5
tar xf amphora2.tar
tar xf phylophlan.tar
bzcat amphora2/*.bz2 > amphora2/amphora2.faa
bzcat phylophlan/*.bz2 > phylophlan/phylophlan.faa
diamond makedb --threads <N> --in $DB_DIR/amphora2/amphora2.faa --db $DB_DIR/amphora2/amphora2
diamond makedb --threads <N> --in $DB_DIR/phylophlan/phylophlan.faa --db $DB_DIR/phylophlan/phylophlan
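The decompress-and-concatenate step can also be wrapped in a small reusable function. This is a minimal sketch assuming, as in the commands above, that each archive extracts to a folder named after the database containing the .bz2 files; prepare_faa is a hypothetical helper, and the diamond makedb calls stay as written above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build <db_dir>/<name>.faa from all <db_dir>/*.bz2 files,
# where <name> is the folder name (e.g. amphora2 or phylophlan).
prepare_faa() {
    local db_dir name out
    db_dir="$1"
    name=$(basename "$db_dir")
    out="$db_dir/$name.faa"

    # Expand the glob into the positional parameters; if nothing matches,
    # the unexpanded pattern does not exist as a file and we stop.
    set -- "$db_dir"/*.bz2
    if [ ! -e "$1" ]; then
        echo "no .bz2 files found in $db_dir" >&2
        return 1
    fi

    # '>' rather than '>>' so re-running the step does not duplicate sequences;
    # the order of the files does not matter for a protein database.
    bzcat "$@" > "$out"
}
```

For example, prepare_faa "$DB_DIR/amphora2" would write $DB_DIR/amphora2/amphora2.faa, ready for diamond makedb.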
NB: I skipped the md5 check, which is pretty obvious ;-)
Should the *.bz2 files be kept?
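For completeness, the skipped md5 check can be sketched as follows. It assumes GNU coreutils md5sum (on macOS the equivalent tool is md5) and that the first whitespace-separated field of each .md5 file is the expected checksum, which holds whether the file contains only the hash or the "hash  filename" format written by md5sum. check_md5 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the md5 of <tar_file> with the first field of <md5_file>.
check_md5() {
    local tar_file="$1" md5_file="$2"
    local expected actual
    # Take the first field of the first line and lowercase it, to match md5sum output.
    expected=$(awk '{print $1; exit}' "$md5_file" | tr 'A-F' 'a-f')
    actual=$(md5sum "$tar_file" | awk '{print $1}')
    if [ "$expected" != "$actual" ]; then
        echo "md5 mismatch for $tar_file" >&2
        return 1
    fi
}
```

For example, check_md5 amphora2.tar amphora2.md5 && tar xf amphora2.tar only extracts the archive if the checksum matches.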
regards
Eric
One more question... maybe not... ;-)
When using phylophlan_setup_database, e.g. phylophlan_setup_database.py -g s__Staphylococcus_aureus, shouldn't the s__Staphylococcus_aureus database directory be generated by default in phylophlan_databases?
And why is the diamond indexing not carried out by phylophlan_setup_database?
Eric
Hi Eric,
I see, and you're right: if the file system is read-only, then one also has to perform the decompression and the indexing. I'll add this, thank you. An important thing to remember here, though, is that different diamond versions can produce indexed databases that are not compatible with each other. So, one has to make sure that the very same diamond version used for indexing is also the one set in the config file when running PhyloPhlAn.
For the reason above, phylophlan_setup_database only downloads and prepares the database but does not carry out the indexing: that would also require configuring the tool used for indexing, which depends more on phylophlan itself.
Many thanks, Francesco
I will provide phylophlan as a module via Environment Modules. The diamond version will be fixed in our case (2.6), but I understand the point.
But I would say that diamond versions from v0.9.25 up to the current one produce database format version 3 and accept format versions 2 and 3 (see: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics).
So requiring diamond >= 0.9.25 as a dependency would solve the problem.
Nowadays, the latest diamond version is 2.10.
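Such a requirement could be checked before indexing with a sketch like the one below. It assumes GNU sort (for -V) and that diamond version prints a line ending with the version number (e.g. "diamond version 2.0.6"; treat that output format as an assumption). version_ge and diamond_version are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the minimum diamond version.

# Succeed if version $1 >= version $2: after a version sort, the smaller
# of the two comes first, so $1 >= $2 exactly when that first line is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Print the installed diamond version (assumed to be the last word of the output).
diamond_version() {
    diamond version | awk '{print $NF}'
}
```

For instance, version_ge "$(diamond_version)" 0.9.25 || echo "diamond is older than 0.9.25" >&2. This only checks the lower bound; as noted above, the safest option remains indexing and running with the very same diamond build.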
regards
Eric
Hello,
How can I set up the required phylophlan databases for use on a cluster where the compute nodes do not have network access?
I would like to install phylophlan and provide the DBs in a shared folder.
How can I achieve this?
regards
Eric