gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

Trouble running PhyloSift on a server #475

Open dong5600 opened 8 years ago

dong5600 commented 8 years ago

Hello,

I was able to run PhyloSift successfully on my desktop. However, it is too slow there (at least 3-5 hours per sample) to process my >50 samples, so I tried to install the program on our institute's server, so far without success.

After unzipping the PhyloSift archive on the server, I did a test run with data that had worked on my desktop, but it gave an error and the marker database was not downloaded during the first run. I searched online and followed the protocol from an earlier report (https://groups.google.com/forum/#!topic/phylosift/6DkF-rzKbdw): I manually downloaded the databases and uncompressed them. It still does not work.

My error message and the --debug output are listed below. Any help is highly appreciated!

Results for the taxasummary.txt:

Taxon_ID Taxon_Rank Taxon_Name Probability_Mass

Unclassifiable Unknown Unknown 0

Error information:

PhyloSift -- Phylogenetic analysis of genomes and metagenomes (c) 2011, 2012 Aaron Darling and Guillaume Jospin

CITATION: PhyloSift. A. E. Darling, G. Jospin, E. Lowe, F. A. Matsen, H. M. Bik, J. A. Eisen. Submitted to PeerJ

PhyloSift incorporates several other software packages, please consider also citing the following papers:

    pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.
    Frederick A Matsen, Robin B Kodner, and E Virginia Armbrust
    BMC Bioinformatics 2010, 11:538

    Adaptive seeds tame genomic sequence comparison.
    SM Kielbasa, R Wan, K Sato, P Horton, MC Frith
    Genome Research 2011.

    Infernal 1.0: Inference of RNA alignments
    E. P. Nawrocki, D. L. Kolbe, and S. R. Eddy
    Bioinformatics 25:1335-1337 (2009)

    Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
    Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 10:R25.

    HMMER 3.0 (March 2010); http://hmmer.org/
    Copyright (C) 2010 Howard Hughes Medical Institute.
    Freely distributed under the GNU General Public License (GPLv3).

    Phylogenetic Diversity within Seconds.
    Bui Quang Minh, Steffen Klaere and Arndt von Haeseler
    Syst Biol (2006) 55 (5): 769-773.

rm: cannot remove `/home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/.aa.1_': No such file or directory

Debug info:

    All systems are good to go, continuing the screening
    deleting an old run /home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa
    MODE : all
    Using updated markers
    Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
    Before runBlast 2016-01-22 16:53:07
    USING 0
    Input type is dna, fasta
    Making fifos
    Launching search process 1
    Running /home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/bin/lastal -F15 -e75 -f0 /home/a-m/dong5600/share/phylosift/markers/replast "/home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/last_0.pipe" |
    Opening /home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/reads.fasta.1
    Octopus is handing out sequences
    Octopus handed out 172 sequences
    Writing candidates from process 1
    ReadsFile: ACF_bin1.fa
    .lastal Got 0 markers with hits
    .lastal Got 0 nucleotide markers with hits
    After runBlast 2016-01-22 16:53:07
    Before runAlign 2016-01-22 16:53:07
    after marker prep
    AFTER ALIGN and MASK
    Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
    Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
    AFTER concatenateALI
    After runAlign 2016-01-22 16:53:07
    Before runPplacer 2016-01-22 16:53:07
    After runPplacer 2016-01-22 16:53:07
    Before runSummarize 2016-01-22 16:53:07
    **STARTING SUMMARY
    Writing sequences
    Total classifiable probability mass is 0
    Before runKrona 2016-01-22 16:53:07
    Generating krona
    After runKrona 2016-01-22 16:53:07
    Debug lvl : 1
    After runBlast 2016-01-22 16:53:07
    MODE :: all

dong5600 commented 8 years ago

BTW, my command was: ./phylosift all --isolate ACF_bin1.fa

gjospin commented 8 years ago

Did you index the database before trying to run phylosift?

phylosift index --debug

should do the trick. If you edited your phylosiftrc file to tell PS where to look for the database, then you might also need to add the following flag: --config new_phylosiftrc

This line is suspect because it should end with list.txt and not listtxt: "Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_listtxt". Not sure what happened there.

I'd try the indexing first. It is usually done automatically when the database is downloaded, but since the download didn't work, I would think the indexing didn't happen either.
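The check and the indexing call described in this comment could be sketched as a small Python helper. This is an illustration only and not part of PhyloSift: the function names are hypothetical, the marker directory is the one shown in the debug log, and the only PhyloSift arguments used are the `index` mode and the `--debug` and `--config` flags mentioned in this thread.

```python
from pathlib import Path
from typing import List, Optional


def marker_list_present(marker_dir: Path) -> bool:
    """Return True if marker_list.txt (file name taken from the debug log) exists."""
    return (marker_dir / "marker_list.txt").is_file()


def build_index_command(phylosift: str = "./phylosift",
                        config: Optional[str] = None) -> List[str]:
    """Build the argument list for `phylosift index --debug`.

    `config` is the path to an edited phylosiftrc; it is only passed
    (via --config) when one is given.
    """
    cmd = [phylosift, "index", "--debug"]
    if config is not None:
        cmd += ["--config", config]
    return cmd


if __name__ == "__main__":
    # Assumed location, matching the path printed in the debug log.
    markers = Path.home() / "share" / "phylosift" / "markers"
    if not marker_list_present(markers):
        print(f"marker_list.txt not found in {markers}")
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_index_command(config="new_phylosiftrc")))
```

The returned list can be handed to `subprocess.run` once the printed command looks right for the local installation.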


dong5600 commented 8 years ago

Thank you for the reply.

I have not indexed the database yet. When reading other posts, I also noticed that the automatically downloaded database is stored in a folder named .XXX/shared/phylosift (/home/a-m/dong5600/share/phylosift/ in my case). I am wondering whether I should download the marker files directly into this path, which is different from where PhyloSift is installed on the server.

I will try the index command too.

I also have another question: since I have >50 samples, can I run multiple samples at the same time? I tried two on my desktop, but it did not work. Please advise!

Thank you, I will update the status!

gjospin commented 8 years ago

If you have a lot of memory at your disposal, you can run multiple instances at once. Keep in mind that to run efficiently you might need around 24 GB of RAM per instance, because of the pplacer step. The search step can run on multiple CPUs, but it is the only part of the pipeline that can. If you have less memory, pplacer is engineered to use temporary files written to disk in order to operate; disk I/O then becomes the limiting step.

We have the luxury of a computer cluster, so I launch one PhyloSift instance per machine, using as many CPUs as the machine allows. That way I can get through about 20-50 samples per day, depending on the availability of the cluster.

I hope this helps.
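As a rough illustration of the memory arithmetic in this comment, the following Python sketch estimates how many instances fit on one machine and how long a batch would then take. The 24 GB default is the approximate per-instance figure quoted above. The machine size and per-sample runtime in the example are made-up inputs, the function names are hypothetical, and the estimate assumes the per-sample runtime does not change when instances run side by side.

```python
import math


def max_instances(total_ram_gb: float, ram_per_instance_gb: float = 24.0) -> int:
    """Number of concurrent instances that fit in RAM without spilling to disk."""
    return int(total_ram_gb // ram_per_instance_gb)


def batch_hours(n_samples: int, hours_per_sample: float, instances: int) -> float:
    """Wall-clock estimate: samples are processed in rounds of `instances` at a time."""
    if instances < 1:
        raise ValueError("at least one instance is required")
    rounds = math.ceil(n_samples / instances)
    return rounds * hours_per_sample


if __name__ == "__main__":
    # Hypothetical machine with 128 GB of RAM and 4 hours per sample.
    n = max_instances(128)
    print(n, "instances,", batch_hours(50, 4.0, n), "hours for 50 samples")
```

With less than 24 GB available, `max_instances` returns 0, which corresponds to the case described above where pplacer falls back to temporary files and disk I/O becomes the bottleneck.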


dong5600 commented 8 years ago

Sounds good. Let me figure out the index issue first. Will update the status. Thank you!

dong5600 commented 8 years ago

Thank you very much for the suggestion; the program worked.

To follow up on the question about running samples in batch: could you suggest how to write a command that includes multiple files? I did not find related information in the tutorial. An alternative would be to write individual scripts and submit them as a batch. Please advise.

Thanks a lot!

gjospin commented 8 years ago

I write an external wrapper script that writes and executes a job script for each file in my sample pool.
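A minimal sketch of such a wrapper, in Python, might look like the following. It is not the maintainer's actual script: the function name, the directory layout, and the `*.fa` file pattern are assumptions, and the only PhyloSift command used is the `./phylosift all --isolate <file>` invocation quoted earlier in this thread. Submitting the scripts is left out because the scheduler command is site-specific.

```python
import shlex
import stat
from pathlib import Path
from typing import List


def write_job_scripts(sample_dir: Path, job_dir: Path,
                      phylosift: str = "./phylosift",
                      pattern: str = "*.fa") -> List[Path]:
    """Write one shell job script per sample file and return the script paths."""
    job_dir.mkdir(parents=True, exist_ok=True)
    scripts = []
    for sample in sorted(sample_dir.glob(pattern)):
        script = job_dir / f"{sample.stem}.sh"
        # Quote every argument so paths with spaces survive the shell.
        command = " ".join(
            shlex.quote(part)
            for part in [phylosift, "all", "--isolate", str(sample.resolve())]
        )
        script.write_text(f"#!/bin/sh\n{command}\n")
        # Make the script executable for its owner.
        script.chmod(script.stat().st_mode | stat.S_IXUSR)
        scripts.append(script)
    return scripts


if __name__ == "__main__":
    # Hypothetical directories; adjust to the local layout.
    for path in write_job_scripts(Path("samples"), Path("jobs")):
        print(path)
```

Each generated script can then be run directly or passed to whatever submission command the cluster provides. Whether `--isolate` is appropriate depends on the input; it is kept here only because it appears in the command from this thread.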
