bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

error in ./paprica-run.sh test.bacteria bacteria #40

Closed (chassenr closed this issue 7 years ago)

chassenr commented 7 years ago

Hi, I am trying to run PAPRICA on a Linux server. I have installed all the dependencies, but when I try to run ./paprica-run.sh test.bacteria bacteria I get the following error:

pplacer: unknown option `-o'.
Unknown guppy command: to_csv

Do you know if this error is because of a bug in PAPRICA or because my system was not set up correctly?

Thank you!

Cheers, Christiane

bowmanjeffs commented 7 years ago

Christiane, did you recently download via git clone, or did you download the last stable release? If the former, I'm currently tweaking some things and probably broke the paprica-place_it.py script. Probably an easy fix, so let me know...

Jeff

chassenr commented 7 years ago

Hi Jeff, I used git clone. Should I rather download the version paprica_v0.21, which, if I didn't miss anything, is listed as the last stable release? Are instructions on how to compile the source code included in the release?

Christiane

bowmanjeffs commented 7 years ago

Christiane, I need to update the docs... the last stable release is 0.3.1d: https://github.com/bowmanjeffs/paprica/releases/latest. No compiling required, it's all just Python/bash scripts. Let me know if you have any trouble!

Cheers, Jeff

chassenr commented 7 years ago

Hi Jeff, sorry to bother you again, but I get the same error with the version of paprica that you specified. Here is the full error message:

paprica-run.sh test.bacteria bacteria
# cmalign :: align sequences to a CM
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# CM file:                                     /opt/bio/dset/paprica_v0.3.1d/models/bacteria_ssu.cm
# sequence file:                               /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.clean.fasta
# CM name:                                     SSU_rRNA_bacteria
# saving alignment to file:                    /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.clean.align.sto
# output alignment format specified as:        Pfam
# output alignment alphabet:                   DNA
# number of worker threads:                    40
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#                                                                                running time (s)                 
#                                                                         -------------------------------          
# idx  seq name        length  cm from    cm to  trunc    bit sc  avg pp  band calc  alignment      total  mem (Mb)
# ---  --------------  ------  -------  -------  -----  --------  ------  ---------  ---------  ---------  --------
    1  SRR953432.50        84      968     1054     3'     39.93   0.974       0.05       0.22       0.27      2.91
    2  SRR953432.90        84      968     1054     3'     39.93   0.974       0.05       0.19       0.24      2.91
    3  SRR953432.97        84      968     1054     3'     39.93   0.974       0.05       0.22       0.27      2.91
    4  SRR953432.307       84      968     1054     3'     39.93   0.974       0.06       0.23       0.29      2.91
    5  SRR953432.386       84      968     1054     3'     39.93   0.974       0.05       0.18       0.23      2.91
    6  SRR953432.641       84      968     1054     3'     39.93   0.974       0.07       0.21       0.28      2.91
    7  SRR953432.836       84      968     1054     3'     39.93   0.974       0.07       0.21       0.28      2.91
    8  SRR953432.860       84      968     1054     3'     39.93   0.974       0.07       0.21       0.28      2.91
    9  SRR953432.903       84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   10  SRR953432.1226      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   11  SRR953432.1336      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   12  SRR953432.1341      84      968     1054     3'     39.93   0.974       0.06       0.23       0.29      2.91
   13  SRR953432.1426      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   14  SRR953432.1527      84      968     1054     3'     39.93   0.974       0.07       0.21       0.28      2.91
   15  SRR953432.1647      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   16  SRR953432.2076      84      968     1054     3'     39.93   0.974       0.05       0.17       0.22      2.91
   17  SRR953432.2160      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   18  SRR953432.2164      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   19  SRR953432.2192      84      968     1054     3'     39.93   0.974       0.07       0.20       0.27      2.91
   20  SRR953432.2374      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   21  SRR953432.2713      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   22  SRR953432.2934      84      968     1054     3'     39.93   0.974       0.07       0.20       0.27      2.91
   23  SRR953432.3261      84      968     1054     3'     39.93   0.974       0.05       0.24       0.29      2.91
   24  SRR953432.3998      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
   25  SRR953432.4090      84      968     1054     3'     39.93   0.974       0.05       0.18       0.23      2.91
   26  SRR953432.4248      84      968     1054     3'     39.93   0.974       0.07       0.21       0.28      2.91
   27  SRR953432.4378      84      968     1054     3'     39.93   0.974       0.05       0.23       0.28      2.91
   28  SRR953432.4573      84      968     1054     3'     39.93   0.974       0.07       0.22       0.29      2.91
#
# CPU time: 4.43u 0.47s 00:00:04.89 Elapsed: 00:00:00.87
# Saving alignment to file /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.combined_16S.bacteria.tax.clean.align.sto ... done
#
# CPU time: 0.06u 0.01s 00:00:00.06 Elapsed: 00:00:00.07
pplacer: unknown option `-o'.
pplacer [options] [alignment]
  -c Specify the path to the reference package.
  -t Specify the reference tree filename.
  -r Specify the reference alignment filename.
  -s Supply a phyml stats.txt or a RAxML info file giving the model parameters.
  -d Specify the directory containing the reference information.
  -p Calculate posterior probabilities.
  -m Substitution model. Protein: are LG, WAG, or JTT. Nucleotides: GTR.
  --model-freqs Use model frequencies instead of reference alignment frequencies.
  --gamma-cats Number of categories for discrete gamma model.
  --gamma-alpha Specify the shape parameter for a discrete gamma model.
  --ml-tolerance 1st stage branch len optimization tolerance (2nd stage to 1e-5). Default: 0.01.
  --pp-rel-err Relative error for the posterior probability calculation. Default is 0.01.
  --unif-prior Use a uniform prior rather than exponential.
  --inform-prior Use an informative exponential prior based on rooted distance to leaves.
  --prior-lower Lower bound for the informative prior mean. Default is 0.
  --start-pend Starting pendant branch length. Default is 0.1.
  --max-pend Set the maximum ML pendant branch length. Default is 2.
  --fig-cutoff The cutoff for determining figs. Default is 0; specify 0 to disable.
  --fig-eval-all Evaluate all likelihoods to ensure that the best location was selected.
  --fig-eval-discrepancy-tree Write out a tree showing the discrepancies between the best complete and observed locations.
  --fig-tree Write out a tree showing the figs on the tree.
  --max-strikes Maximum number of strikes for baseball. 0 -> no ball playing. Default is 6.
  --strike-box Set the size of the strike box in log likelihood units. Default is 3.
  --max-pitches Set the maximum number of pitches for baseball. Default is 40.
  --fantasy Desired likelihood cutoff for fantasy baseball mode. 0 -> no fantasy.
  --fantasy-frac Fraction of fragments to use when running fantasy baseball. Default is 0.1.
  --write-masked Write alignment masked to the region without gaps in the query.
  --verbosity Set verbosity level. 0 is silent, and 2 is quite a lot. Default is 1.
  --out-dir Specify the directory to write place files to.
  --pretend Only check out the files then report. Do not run the analysis.
  --check-like Write out the likelihood of the reference tree, calculated two ways.
  -j The number of child processes to spawn when doing placements. Default is 2.
  --timing Display timing information after the pplacer run finishes.
  --no-pre-mask Don't pre-mask sequences before placement.
  --write-pre-masked Write out the pre-masked sequences to the specified fasta file and exit.
  --map-mrca Specify a file to write out MAP sequences for MRCAs and corresponding placements.
  --map-mrca-min Specify cutoff for inclusion in MAP sequence file. Default is 0.8.
  --map-identity Add the percent identity of the query sequence to the nearest MAP sequence to each placement.
  --keep-at-most The maximum number of placements we keep. Default is 7.
  --keep-factor Throw away anything that has ml_ratio below keep_factor times (best ml_ratio). Default is 0.01.
  --mrca-class Classify with MRCAs instead of a painted tree.
  --version Write out the version number and exit.
  -help  Display this list of options
  --help  Display this list of options
Unknown guppy command: to_csv
Here is a list of commands available using this interface:
  visualization
    fat                     makes trees with edges fattened in proportion to the number of reads
    heat                    maps an an arbitrary vector of the correct length to the tree
    sing                    makes one tree for each query sequence, showing uncertainty
    tog                     makes a tree with each of the reads represented as a pendant edge

  statistical comparison
    bary                    draws the barycenter of a placement collection on the reference tree
    edpl                    calculates the EDPL uncertainty values for a collection of pqueries
    kr                      calculates the Kantorovich-Rubinstein distance and corresponding p-values
    kr_heat                 makes a heat tree
    pca                     performs edge principal components
    pd                      calculate phylogenetic diversity
    rarefact                calculates phylogenetic rarefaction curves
    splitify                writes out differences of masses for the splits of the tree
    squash                  performs squash clustering
    wpd                     calculate weighted phylogenetic diversity of placefiles

  classification
    classify                outputs classification information in a tabular or SQLite format

  utilities
    compress                compress a placefile's pqueries
    demulti                 splits apart placements with multiplicity, undoing a round procedure
    diplac                  find the most DIstant PLACements from the leaves
    distmat                 prints out a pairwise distance matrix between the edges
    filter                  filters one or more placefiles by placement name
    info                    writes the number of leaves of the reference tree and the number of pqueries
    islands                 find the mass islands of one or more pqueries
    merge                   merges placefiles together
    mft                     Multi-Filter and Transform placefiles
    redup                   restores duplicates to deduped placefiles
    round                   clusters the placements by rounding branch lengths
    to_json                 converts old-style .place files to .jplace placement files

To get more help about a given command, type guppy COMMAND --help
fat: unknown option `--point-mass'.
usage: fat [options] placefile[s]
  -o Specify the filename to write to.
  --out-dir Specify the directory to write files to.
  --prefix Specify a string to be prepended to filenames.
  --unweighted Treat every placement as a point mass concentrated on the highest-weight placement.
  --pp Use posterior probability for the weight.
  -c Reference package path.
  --min-fat The minimum branch length for fattened edges (to increase their visibility). To turn off set to 0. Default: 0.01
  --total-width Set the total pixel width for all of the branches of the tree. Default: 300
  --width-factor Override total-width by directly setting the number of pixels per unit of thing displayed.
  --node-numbers Put the node numbers in where the bootstraps usually go.
  --average Average all input placefiles together.
  -help  Display this list of options
  --help  Display this list of options
Traceback (most recent call last):
  File "/opt/bio/dset/paprica_v0.3.1d/paprica-tally_pathways.py", line 145, in <module>
    query_csv = pd.DataFrame.from_csv(cwd + query, header = 0)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1138, in from_csv
    infer_datetime_format=infer_datetime_format)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 268, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 583, in __init__
    self._make_engine(self.engine)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 724, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1093, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3229)
  File "pandas/parser.pyx", line 583, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6042)
IOError: File /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.combined_16S.bacteria.tax.clean.align.csv does not exist

bowmanjeffs commented 7 years ago

For some reason pplacer isn't recognizing the -o flag. Is this a new installation of pplacer, or do you have an old version? Check that you have the most recent version of pplacer and guppy.
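A quick way to see which versions are actually on the PATH is sketched below. This is only an illustration, not part of paprica: the helper name `tool_version` is made up, it is written for Python 3, and it assumes guppy accepts `--version` the same way the pplacer usage text above shows pplacer does.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return the output of `<tool> --version`, or None if the tool is not on PATH.

    `tool_version` is a hypothetical helper for this thread, not a paprica function.
    """
    path = shutil.which(tool)
    if path is None:
        return None
    # Some tools print the version to stderr, so capture both streams together.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumption: guppy understands --version like pplacer does.
    for tool in ("pplacer", "guppy"):
        version = tool_version(tool)
        print("{}: {}".format(tool, version if version is not None else "not found on PATH"))
```

If the two reported versions differ, or either one is much older than the current pplacer release, that would be consistent with the errors in the log.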

chassenr commented 7 years ago

I am using pplacer version v1.1.alpha10, which (although it was installed this year) seems to be quite old. If you think that this version of pplacer is too old, I will check with our IT department whether they really need that old version or whether they would be willing to update it. It seems that on our system guppy is part of the pplacer distribution.

bowmanjeffs commented 7 years ago

I think this is the problem. Older versions of guppy don't have the to_csv output option, and apparently older versions of pplacer don't recognize the -o flag. I suggest updating both. You can probably update locally even if your administrators are reluctant to update the system version. In the meantime, if you just want to check out paprica to see if it will work for you, note that a VirtualBox appliance and Amazon ECS machine instance are available. Let me know if you have any issues with either, or if updating pplacer/guppy doesn't solve the problem!
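The diagnosis can be confirmed from the usage text itself: neither `-o` nor `to_csv` appears in the listings the old binaries printed. The sketch below does that check on a few lines excerpted from the log in this thread. The helper `usage_lists` is a made-up name for illustration and is not part of paprica, pplacer, or guppy.

```python
def usage_lists(usage_text, name):
    """Return True if any line of the usage text has `name` as its first token."""
    for line in usage_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == name:
            return True
    return False


# Excerpts copied from the output earlier in this thread (pplacer v1.1.alpha10).
PPLACER_USAGE = """\
pplacer [options] [alignment]
  -c Specify the path to the reference package.
  --out-dir Specify the directory to write place files to.
  -j The number of child processes to spawn when doing placements. Default is 2.
"""

GUPPY_COMMANDS = """\
    merge                   merges placefiles together
    to_json                 converts old-style .place files to .jplace placement files
"""

print(usage_lists(PPLACER_USAGE, "-o"))         # False: this pplacer has no -o
print(usage_lists(PPLACER_USAGE, "--out-dir"))  # True
print(usage_lists(GUPPY_COMMANDS, "to_csv"))    # False: this guppy has no to_csv
print(usage_lists(GUPPY_COMMANDS, "to_json"))   # True
```

Running the same check against the usage text of a freshly installed pplacer and guppy should show whether the update brought in the option and command that paprica calls.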