gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

Why is it croaking? #149

Closed dbrami closed 12 years ago

dbrami commented 12 years ago

Hi,

I had successfully started the pipeline with the following command:

    DATA="/home/dbrami/tmp/metAmos"
    /tmp/PhyloSift/phylosift_20120412/bin/phylosift all --output=results --threaded=20 --debug --paired ${DATA}/SGI_BS27.1.fastq ${DATA}/SGI_BS27.2.fastq 1>PhyloSoft.out 2>PhyloSoft.err &

This created the following structure:

    results
    |-- alignDir
    |-- blastDir
    `-- treeDir

and I believe the BLAST searches in blastDir have completed successfully. The pipeline then proceeded to create PMPROK00003.tmpout.fifo inside the alignDir folder, with the following listing:

    0 prw-rw-r-- 1 dbrami employees 0 Apr 13 16:23 results/alignDir/PMPROK00003.tmpout.fifo

but the pipeline has been hanging for 48 hours with no CPU activity.

I therefore restarted the pipeline with the following command:

    DATA="/home/dbrami/tmp/metAmos"
    /tmp/PhyloSift/phylosift_20120412/bin/phylosift --continue --output=results --threaded=20 --debug --paired ${DATA}/SGI_BS27.1.fastq ${DATA}/SGI_BS27.2.fastq 1>PhyloSift2.out 2>PhyloSift2.err &

and got this error: (usage) ...

at /tmp/PhyloSift/phylosift_20120412/bin/phylosift line 119

Any suggestions on how to push it through?
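The leading p in the mode string prw-rw-r-- marks PMPROK00003.tmpout.fifo as a named pipe (FIFO). Opening a FIFO normally blocks until its other end is opened too, so a leftover pipe with no process on the other side is one possible reason for a step to sit idle with no CPU use; this thread does not confirm that as the cause. A minimal shell sketch for listing any FIFOs under the output directory before resuming, assuming the results/ layout shown above (list_fifos is a hypothetical helper, not part of PhyloSift):

```shell
# Hypothetical helper (not part of PhyloSift): list named pipes (FIFOs)
# left under an output directory such as results/alignDir.
list_fifos() {
    dir=${1:-results}   # defaults to the results/ directory used in this thread
    # -type p matches named pipes only; sort gives stable output
    find "$dir" -type p | sort
}

# Usage:
#   list_fifos results
#   ls -l results/alignDir/*.fifo   # a leading 'p' in the mode marks a FIFO
```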

gjospin commented 12 years ago

    /tmp/PhyloSift/phylosift_20120412/bin/phylosift --continue --output=results --threaded=20 --debug --paired ${DATA}/SGI_BS27.1.fastq ${DATA}/SGI_BS27.2.fastq 1>PhyloSift2.out 2>PhyloSift2.err &

There is no mode selected between phylosift and --continue. You can try:

    /tmp/PhyloSift/phylosift_20120412/bin/phylosift align --continue --output=results --threaded=20 --debug --paired ${DATA}/SGI_BS27.1.fastq ${DATA}/SGI_BS27.2.fastq 1>PhyloSift2.out 2>PhyloSift2.err &
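Because the first argument is read as the mode, a small guard can catch a missing mode before a long run is launched. A minimal sketch: check_phylosift_mode is a hypothetical helper, and the mode names in it (all, search, align, placer, summary) are an assumption taken from later PhyloSift documentation, so compare them with the usage text your version prints.

```shell
# Hypothetical helper: verify that the first argument to phylosift is a
# mode name before launching a long run. The mode list is an assumption;
# confirm it against the usage text printed by your phylosift version.
check_phylosift_mode() {
    case "$1" in
        all|search|align|placer|summary)
            return 0 ;;
        *)
            echo "error: first argument must be a mode, got '$1'" >&2
            return 1 ;;
    esac
}

# Usage (mode first, then options such as --continue):
#   set -- align --continue --output=results --threaded=20 --debug --paired \
#       "$DATA/SGI_BS27.1.fastq" "$DATA/SGI_BS27.2.fastq"
#   check_phylosift_mode "$@" && /tmp/PhyloSift/phylosift_20120412/bin/phylosift "$@"
```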

gjospin commented 12 years ago

Actually, depending on the size of your data and the memory available on your machine, it is possible that the machine ran out of memory.

This is a known issue that we have developed a fix for but it has not yet been pushed to the released archive. We are in the process of testing that feature.
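On a Linux host, one quick check before restarting is how much memory is actually available. A minimal sketch, assuming /proc/meminfo is present (available_mib is a hypothetical helper; it falls back to MemFree on kernels that do not report MemAvailable):

```shell
# Hypothetical helper: print available memory in MiB from a
# /proc/meminfo-style file (Linux). Uses MemAvailable when the kernel
# reports it, otherwise falls back to MemFree.
available_mib() {
    meminfo=${1:-/proc/meminfo}
    awk '
        /^MemAvailable:/ { avail = $2 }   # value is in kB
        /^MemFree:/      { free = $2 }
        END {
            kb = (avail != "") ? avail : free
            printf "%d\n", kb / 1024
        }
    ' "$meminfo"
}

# Usage:
#   available_mib            # reads /proc/meminfo
```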

dbrami commented 12 years ago

Your command seems to have kick-started it, but the much-awaited treeDir is still empty. Here are the logs:

    cmd-> cat PhyloSift2.out
    align /home/dbrami/tmp/metAmos/SGI_BS27.1.fastq /home/dbrami/tmp/metAmos/SGI_BS27.2.fastq
    PAIR : 1
    INSIDE paired
    READSFILE /home/dbrami/tmp/metAmos/SGI_BS27.1.fastq
    FORCE: 0
    Continue : 1
    force : 0
    PhyloSift -- Phylogenetic analysis of genomes and metagenomes
    (c) 2011, 2012 Aaron Darling and Guillaume Jospin

CITATION: PhyloSift. A. Darling, H. Bik, G. Jospin, J. A. Eisen. Manuscript in preparation

PhyloSift incorporates several other software packages, please consider also citing the following papers:

            pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.
            Frederick A Matsen, Robin B Kodner, and E Virginia Armbrust
            BMC Bioinformatics 2010, 11:538

            Adaptive seeds tame genomic sequence comparison.
            SM Kielbasa, R Wan, K Sato, P Horton, MC Frith
            Genome Research 2011.

            Infernal 1.0: Inference of RNA alignments
            E. P. Nawrocki, D. L. Kolbe, and S. R. Eddy
            Bioinformatics 25:1335-1337 (2009)

            Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
            Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 10:R25.

            HMMER 3.0 (March 2010); http://hmmer.org/
            Copyright (C) 2010 Howard Hughes Medical Institute.
            Freely distributed under the GNU General Public License (GPLv3).

            Phylogenetic Diversity within Seconds.
            Bui Quang Minh, Steffen Klaere and Arndt von Haeseler
            Syst Biol (2006) 55 (5): 769-773.

    Before START 2012-04-16 11:18:32
    All systems are good to go, continuing the screening
    MARKER_PATH : /home/dbrami/share/phylosift/markers
    TEST LOCAL :Fri Apr 13 14:01:45 2012
    Using updated markers
    CUSTOM =
    MODE :: align
    MODE :: align
    Before Alignments 2012-04-16 11:18:34
    beforeDirprepClean PMPROK00003 PMPROK00014 PMPROK00015 PMPROK00019 PMPROK00020 PMPROK00022 PMPROK00024 PMPROK00025 PMPROK00028 PMPROK00029 PMPROK00031 PMPROK00034 PMPROK00041 PMPROK00047 PMPROK00048 PMPROK00050 PMPROK00051 PMPROK00052 PMPROK00053 PMPROK00054 PMPROK00060 PMPROK00064 PMPROK00067 PMPROK00068 PMPROK00069 PMPROK00071 PMPROK00074 PMPROK00075 PMPROK00081 PMPROK00086 PMPROK00087 PMPROK00092 PMPROK00093 PMPROK00094 PMPROK00097 PMPROK00106 PMPROK00123 PMPROK00126 18s_reps 16s_reps_arc 16s_reps_bac
    AFTERdirprepclean PMPROK00003 PMPROK00014 PMPROK00015 PMPROK00019 PMPROK00020 PMPROK00022 PMPROK00024 PMPROK00025 PMPROK00028 PMPROK00029 PMPROK00031 PMPROK00034 PMPROK00041 PMPROK00047 PMPROK00048 PMPROK00050 PMPROK00051 PMPROK00052 PMPROK00053 PMPROK00054 PMPROK00060 PMPROK00064 PMPROK00067 PMPROK00068 PMPROK00069 PMPROK00071 PMPROK00074 PMPROK00075 PMPROK00081 PMPROK00086 PMPROK00087 PMPROK00092 PMPROK00093 PMPROK00094 PMPROK00097 PMPROK00106 PMPROK00123 PMPROK00126 18s_reps 16s_reps_arc 16s_reps_bac
    ALIGNDIR : results/alignDir
    after HMMSEARCH PARSE
    Setting up cmalign for marker 18s_reps
    Setting up cmalign for marker 16s_reps_arc
    Setting up cmalign for marker 16s_reps_bac
    AFTER ALIGN and MASK
    AFTER concatenateALI
    After Alignments 2012-04-16 11:18:34
    PPLACER MARKS PMPROK00003 PMPROK00014 PMPROK00015 PMPROK00019 PMPROK00020 PMPROK00022 PMPROK00024 PMPROK00025 PMPROK00028 PMPROK00029 PMPROK00031 PMPROK00034 PMPROK00041 PMPROK00047 PMPROK00048 PMPROK00050 PMPROK00051 PMPROK00052 PMPROK00053 PMPROK00054 PMPROK00060 PMPROK00064 PMPROK00067 PMPROK00068 PMPROK00069 PMPROK00071 PMPROK00074 PMPROK00075 PMPROK00081 PMPROK00086 PMPROK00087 PMPROK00092 PMPROK00093 PMPROK00094 PMPROK00097 PMPROK00106 PMPROK00123 PMPROK00126 18s_reps 16s_reps_arc 16s_reps_bac
    Before runPPlacer 2012-04-16 11:18:34
    Running /bioinformatics/asm/bio_bin/Amphora2/amphora2_2012029/bin/pplacer --groups 10 --verbosity 0 -j 20 -c /home/dbrami/share/phylosift/markers/concat.updated results/alignDir/concat.trim.fasta
    After runPPlacer 2012-04-16 11:18:34
    Before taxonomy assignments 2012-04-16 11:18:34
    Reading NCBI taxonomy
    Total reads are 0
    Generating krona init xml
    parse ncbi visitor
    Root node id 1
    Root node read count
    done visiting!
    After taxonomy assignments 2012-04-16 11:21:14


    cmd-> cat PhyloSift2.err
    Warning : a different version of HMMER was found. PhyloSift was tested with HMMER 3.0rc1 at /tmp/PhyloSift/phylosift_20120412/bin/../lib/Phylosift/Phylosift.pm line 274
    /bioinformatics/asm/bio_bin/Amphora2/amphora2_2012029/bin/pplacer: unknown option `--groups'.
    pplacer [options] [alignment]
      -c Specify the path to the reference package.
      -t Specify the reference tree filename.
      -r Specify the reference alignment filename.
      -s Supply a phyml stats.txt or a RAxML info file giving the model parameters.
      -d Specify the directory containing the reference information.
      -p Calculate posterior probabilities.
      -m Substitution model. Protein: are LG, WAG, or JTT. Nucleotides: GTR.
      --model-freqs Use model frequencies instead of reference alignment frequencies.
      --gamma-cats Number of categories for discrete gamma model.
      --gamma-alpha Specify the shape parameter for a discrete gamma model.
      --ml-tolerance 1st stage branch len optimization tolerance (2nd stage to 1e-5). Default: 0.01.
      --pp-rel-err Relative error for the posterior probability calculation. Default is 0.01.
      --unif-prior Use a uniform prior rather than exponential.
      --inform-prior Use an informative exponential prior based on rooted distance to leaves.
      --prior-lower Lower bound for the informative prior mean. Default is 0.
      --start-pend Starting pendant branch length. Default is 0.1.
      --max-pend Set the maximum ML pendant branch length. Default is 2.
      --fig-cutoff The cutoff for determining figs. Default is 0; specify 0 to disable.
      --fig-eval-all Evaluate all likelihoods to ensure that the best location was selected.
      --fig-eval-discrepancy-tree Write out a tree showing the discrepancies between the best complete and observed locations.
      --fig-tree Write out a tree showing the figs on the tree.
      --max-strikes Maximum number of strikes for baseball. 0 -> no ball playing. Default is 6.
      --strike-box Set the size of the strike box in log likelihood units. Default is 3.
      --max-pitches Set the maximum number of pitches for baseball. Default is 40.
      --fantasy Desired likelihood cutoff for fantasy baseball mode. 0 -> no fantasy.
      --fantasy-frac Fraction of fragments to use when running fantasy baseball. Default is 0.1.
      --write-masked Write alignment masked to the region without gaps in the query.
      --verbosity Set verbosity level. 0 is silent, and 2 is quite a lot. Default is 1.
      --out-dir Specify the directory to write place files to.
      --pretend Only check out the files then report. Do not run the analysis.
      --check-like Write out the likelihood of the reference tree, calculated two ways.
      -j The number of child processes to spawn when doing placements. Default is 2.
      --timing Display timing information after the pplacer run finishes.
      --no-pre-mask Don't pre-mask sequences before placement.
      --write-pre-masked Write out the pre-masked sequences to the specified fasta file and exit.
      --map-mrca Specify a file to write out MAP sequences for MRCAs and corresponding placements.
      --map-mrca-min Specify cutoff for inclusion in MAP sequence file. Default is 0.8.
      --map-identity Add the percent identity of the query sequence to the nearest MAP sequence to each placement.
      --keep-at-most The maximum number of placements we keep. Default is 7.
      --keep-factor Throw away anything that has ml_ratio below keep_factor times (best ml_ratio). Default is 0.01.
      --mrca-class Classify with MRCAs instead of a painted tree.
      --version Write out the version number and exit.
      -help Display this list of options
      --help Display this list of options
    Use of uninitialized value in concatenation (.) or string at /tmp/PhyloSift/phylosift_20120412/bin/../lib/Phylosift/Summarize.pm line 425, line 1.

dbrami commented 12 years ago

I just realized that I must have run the command from one directory level too high. Running it again from the proper directory causes the program to stay in memory without doing anything:

    dbrami 14741 0.1 0.0 139608 31388 pts/4 S 11:36 0:00 perl /tmp/PhyloSift/phylosift_20120412/bin/phylosift align --continue --output=results --threaded=20 --debug --paired /home/dbrami/tmp/metAmos/SGI_BS27.1.fastq /home/dbrami/tmp/metAmos/SGI_BS27.2.fastq

Here are the logs again:

    dbrami@asm04.c01:/home/dbrami/tmp/PhyloSift cmd-> cat PhyloSift2.err
    Warning : a different version of HMMER was found. PhyloSift was tested with HMMER 3.0rc1 at /tmp/PhyloSift/phylosift_20120412/bin/../lib/Phylosift/Phylosift.pm line 274

    FATAL: No such option "--max".
    Usage: hmmsearch [-options]
    Available options are:
      -h : help; print brief help on version and usage
      -A : sets alignment output limit to best domain alignments
      -E : sets E value cutoff (globE) to <= x
      -T : sets T bit threshold (globT) to >= x
      -Z : sets Z (# seqs) for E-value calculation


    dbrami@asm04.c01:/home/dbrami/tmp/PhyloSift cmd-> cat PhyloSift2.out
    align /home/dbrami/tmp/metAmos/SGI_BS27.1.fastq /home/dbrami/tmp/metAmos/SGI_BS27.2.fastq
    PAIR : 1
    INSIDE paired
    READSFILE /home/dbrami/tmp/metAmos/SGI_BS27.1.fastq
    FORCE: 0
    Continue : 1
    force : 0
    PhyloSift -- Phylogenetic analysis of genomes and metagenomes
    (c) 2011, 2012 Aaron Darling and Guillaume Jospin

    Before START 2012-04-16 11:36:14
    All systems are good to go, continuing the screening
    MARKER_PATH : /home/dbrami/share/phylosift/markers
    TEST LOCAL :Fri Apr 13 14:01:45 2012
    Using updated markers
    CUSTOM =
    MODE :: align
    MODE :: align
    Before Alignments 2012-04-16 11:36:15
    beforeDirprepClean PMPROK00003 PMPROK00014 PMPROK00015 PMPROK00019 PMPROK00020 PMPROK00022 PMPROK00024 PMPROK00025 PMPROK00028 PMPROK00029 PMPROK00031 PMPROK00034 PMPROK00041 PMPROK00047 PMPROK00048 PMPROK00050 PMPROK00051 PMPROK00052 PMPROK00053 PMPROK00054 PMPROK00060 PMPROK00064 PMPROK00067 PMPROK00068 PMPROK00069 PMPROK00071 PMPROK00074 PMPROK00075 PMPROK00081 PMPROK00086 PMPROK00087 PMPROK00092 PMPROK00093 PMPROK00094 PMPROK00097 PMPROK00106 PMPROK00123 PMPROK00126 18s_reps 16s_reps_arc 16s_reps_bac
    AFTERdirprepclean PMPROK00003 PMPROK00014 PMPROK00015 PMPROK00019 PMPROK00020 PMPROK00022 PMPROK00024 PMPROK00025 PMPROK00028 PMPROK00029 PMPROK00031 PMPROK00034 PMPROK00041 PMPROK00047 PMPROK00048 PMPROK00050 PMPROK00051 PMPROK00052 PMPROK00053 PMPROK00054 PMPROK00060 PMPROK00064 PMPROK00067 PMPROK00068 PMPROK00069 PMPROK00071 PMPROK00074 PMPROK00075 PMPROK00081 PMPROK00086 PMPROK00087 PMPROK00092 PMPROK00093 PMPROK00094 PMPROK00097 PMPROK00106 PMPROK00123 PMPROK00126 18s_reps 16s_reps_arc 16s_reps_bac
    ALIGNDIR : results/alignDir

koadman commented 12 years ago

Hi dbrami:

In the error logs, it looks like PhyloSift has been picking up the wrong versions of HMMER and pplacer in your runs. PhyloSift is packaged with the versions it requires, and you probably have other versions of these programs installed on your machine that take precedence over the ones packaged with PhyloSift. A short-term workaround might be to put the PhyloSift "bin" directory at the front of your PATH, like this:

    export PATH="/tmp/PhyloSift/phylosift_20120412/bin/:$PATH"

We will look into a better long-term fix for this issue.

Note that there may be other problems; these are just the first and most obvious ones your runs have been hitting. I would suggest restarting the run from the beginning ("phylosift all ...") once you've changed the path.
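After changing PATH, it is worth confirming which copies of the external tools the shell now finds first. The hmmsearch usage in the later error log (-A, -E, -T, -Z, and no --max option) looks like an older HMMER 2 build, and the pplacer under /bioinformatics/asm/bio_bin/Amphora2/ rejects --groups, so both appear to come from outside the PhyloSift bundle. A minimal sketch (which_tools is a hypothetical helper, not part of PhyloSift):

```shell
# Hypothetical helper: for each tool name, print the path the shell
# would run first given the current PATH, or "(not found)".
which_tools() {
    for tool in "$@"; do
        found=$(command -v "$tool") || found="(not found)"
        printf '%s\t%s\n' "$tool" "$found"
    done
}

# Usage:
#   export PATH="/tmp/PhyloSift/phylosift_20120412/bin/:$PATH"
#   which_tools hmmsearch pplacer cmalign
#   hmmsearch -h | head -n 3   # the header names the HMMER version
#   pplacer --version          # listed in the pplacer help output above
```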

dbrami commented 12 years ago

Any idea why it's still failing, or what this means?

    cmd-> cat ../PhyloSoft.err
    Fatal error: exception Failure("hd")
    Use of uninitialized value in concatenation (.) or string at /tmp/PhyloSift/phylosift_20120412/bin/../lib/Phylosift/Summarize.pm line 425, line 1.

koadman commented 12 years ago

Somehow we missed this when you reopened it. The first error message is generated by pplacer when it fails to place reads; it would be helpful to see some context, such as the stdout produced by --debug. The 2nd and 3rd lines are generated by PhyloSift and could result from one of your reads mapping to an NCBI taxon ID that NCBI has since removed from or merged within its taxonomy. The master branch has a known bug where it does not re-map merged or deleted NCBI IDs during the summary step; this was fixed in the devel branch, which awaits a few more fixes before we push it over to master.
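For readers curious what re-mapping merged IDs involves: NCBI's taxdump archive ships a merged.dmp file whose lines have the form old_tax_id | new_tax_id |, with tab-padded | separators. A minimal sketch of the lookup, assuming you have a local copy of merged.dmp; this is not PhyloSift's own fix (which is not shown in this thread), current_taxid is a hypothetical helper, and deleted IDs (listed separately in delnodes.dmp) are not handled.

```shell
# Hypothetical helper (not PhyloSift's fix): map a possibly merged NCBI
# taxon ID to its current ID using merged.dmp from the NCBI taxdump.
# Lines look like "old_id<TAB>|<TAB>new_id<TAB>|", so with a tab
# separator the old ID is field 1 and the new ID is field 3.
# IDs not listed in merged.dmp are printed unchanged.
current_taxid() {
    merged_dmp=$1
    taxid=$2
    awk -F '\t' -v id="$taxid" '
        $1 == id { print $3; found = 1; exit }
        END { if (!found) print id }
    ' "$merged_dmp"
}

# Usage:
#   current_taxid /path/to/merged.dmp 12345
```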

koadman commented 12 years ago

Closing because this has apparently been superseded by a newer issue. Please reopen if still relevant.