chassenr closed this issue 7 years ago
Christiane, did you download via git clone, or from the last stable release? If the former, I'm currently tweaking some things and probably broke the paprica-place_it.py script. It's probably an easy fix, so let me know...
Jeff
Hi Jeff, I used git clone. Should I download version paprica_v0.21 instead, which, if I didn't miss anything, is listed as the last stable release? Are instructions on how to compile the source code included in the release?
Christiane
Christiane, I need to update the docs... last stable release is 0.3.1d https://github.com/bowmanjeffs/paprica/releases/latest. No compiling required, all just python/bash scripts. Let me know if you have any trouble!
Cheers, Jeff
Hi Jeff, sorry to bother you again, but I get the same error with the version of paprica that you specified. Here is the full error message:
paprica-run.sh test.bacteria bacteria
# cmalign :: align sequences to a CM
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# CM file: /opt/bio/dset/paprica_v0.3.1d/models/bacteria_ssu.cm
# sequence file: /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.clean.fasta
# CM name: SSU_rRNA_bacteria
# saving alignment to file: /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.clean.align.sto
# output alignment format specified as: Pfam
# output alignment alphabet: DNA
# number of worker threads: 40
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# running time (s)
# -------------------------------
# idx seq name length cm from cm to trunc bit sc avg pp band calc alignment total mem (Mb)
# --- -------------- ------ ------- ------- ----- -------- ------ --------- --------- --------- --------
1 SRR953432.50 84 968 1054 3' 39.93 0.974 0.05 0.22 0.27 2.91
2 SRR953432.90 84 968 1054 3' 39.93 0.974 0.05 0.19 0.24 2.91
3 SRR953432.97 84 968 1054 3' 39.93 0.974 0.05 0.22 0.27 2.91
4 SRR953432.307 84 968 1054 3' 39.93 0.974 0.06 0.23 0.29 2.91
5 SRR953432.386 84 968 1054 3' 39.93 0.974 0.05 0.18 0.23 2.91
6 SRR953432.641 84 968 1054 3' 39.93 0.974 0.07 0.21 0.28 2.91
7 SRR953432.836 84 968 1054 3' 39.93 0.974 0.07 0.21 0.28 2.91
8 SRR953432.860 84 968 1054 3' 39.93 0.974 0.07 0.21 0.28 2.91
9 SRR953432.903 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
10 SRR953432.1226 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
11 SRR953432.1336 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
12 SRR953432.1341 84 968 1054 3' 39.93 0.974 0.06 0.23 0.29 2.91
13 SRR953432.1426 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
14 SRR953432.1527 84 968 1054 3' 39.93 0.974 0.07 0.21 0.28 2.91
15 SRR953432.1647 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
16 SRR953432.2076 84 968 1054 3' 39.93 0.974 0.05 0.17 0.22 2.91
17 SRR953432.2160 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
18 SRR953432.2164 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
19 SRR953432.2192 84 968 1054 3' 39.93 0.974 0.07 0.20 0.27 2.91
20 SRR953432.2374 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
21 SRR953432.2713 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
22 SRR953432.2934 84 968 1054 3' 39.93 0.974 0.07 0.20 0.27 2.91
23 SRR953432.3261 84 968 1054 3' 39.93 0.974 0.05 0.24 0.29 2.91
24 SRR953432.3998 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
25 SRR953432.4090 84 968 1054 3' 39.93 0.974 0.05 0.18 0.23 2.91
26 SRR953432.4248 84 968 1054 3' 39.93 0.974 0.07 0.21 0.28 2.91
27 SRR953432.4378 84 968 1054 3' 39.93 0.974 0.05 0.23 0.28 2.91
28 SRR953432.4573 84 968 1054 3' 39.93 0.974 0.07 0.22 0.29 2.91
#
# CPU time: 4.43u 0.47s 00:00:04.89 Elapsed: 00:00:00.87
# Saving alignment to file /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.combined_16S.bacteria.tax.clean.align.sto ... done
#
# CPU time: 0.06u 0.01s 00:00:00.06 Elapsed: 00:00:00.07
pplacer: unknown option `-o'.
pplacer [options] [alignment]
-c Specify the path to the reference package.
-t Specify the reference tree filename.
-r Specify the reference alignment filename.
-s Supply a phyml stats.txt or a RAxML info file giving the model parameters.
-d Specify the directory containing the reference information.
-p Calculate posterior probabilities.
-m Substitution model. Protein: are LG, WAG, or JTT. Nucleotides: GTR.
--model-freqs Use model frequencies instead of reference alignment frequencies.
--gamma-cats Number of categories for discrete gamma model.
--gamma-alpha Specify the shape parameter for a discrete gamma model.
--ml-tolerance 1st stage branch len optimization tolerance (2nd stage to 1e-5). Default: 0.01.
--pp-rel-err Relative error for the posterior probability calculation. Default is 0.01.
--unif-prior Use a uniform prior rather than exponential.
--inform-prior Use an informative exponential prior based on rooted distance to leaves.
--prior-lower Lower bound for the informative prior mean. Default is 0.
--start-pend Starting pendant branch length. Default is 0.1.
--max-pend Set the maximum ML pendant branch length. Default is 2.
--fig-cutoff The cutoff for determining figs. Default is 0; specify 0 to disable.
--fig-eval-all Evaluate all likelihoods to ensure that the best location was selected.
--fig-eval-discrepancy-tree Write out a tree showing the discrepancies between the best complete and observed locations.
--fig-tree Write out a tree showing the figs on the tree.
--max-strikes Maximum number of strikes for baseball. 0 -> no ball playing. Default is 6.
--strike-box Set the size of the strike box in log likelihood units. Default is 3.
--max-pitches Set the maximum number of pitches for baseball. Default is 40.
--fantasy Desired likelihood cutoff for fantasy baseball mode. 0 -> no fantasy.
--fantasy-frac Fraction of fragments to use when running fantasy baseball. Default is 0.1.
--write-masked Write alignment masked to the region without gaps in the query.
--verbosity Set verbosity level. 0 is silent, and 2 is quite a lot. Default is 1.
--out-dir Specify the directory to write place files to.
--pretend Only check out the files then report. Do not run the analysis.
--check-like Write out the likelihood of the reference tree, calculated two ways.
-j The number of child processes to spawn when doing placements. Default is 2.
--timing Display timing information after the pplacer run finishes.
--no-pre-mask Don't pre-mask sequences before placement.
--write-pre-masked Write out the pre-masked sequences to the specified fasta file and exit.
--map-mrca Specify a file to write out MAP sequences for MRCAs and corresponding placements.
--map-mrca-min Specify cutoff for inclusion in MAP sequence file. Default is 0.8.
--map-identity Add the percent identity of the query sequence to the nearest MAP sequence to each placement.
--keep-at-most The maximum number of placements we keep. Default is 7.
--keep-factor Throw away anything that has ml_ratio below keep_factor times (best ml_ratio). Default is 0.01.
--mrca-class Classify with MRCAs instead of a painted tree.
--version Write out the version number and exit.
-help Display this list of options
--help Display this list of options
Unknown guppy command: to_csv
Here is a list of commands available using this interface:
visualization
fat makes trees with edges fattened in proportion to the number of reads
heat maps an an arbitrary vector of the correct length to the tree
sing makes one tree for each query sequence, showing uncertainty
tog makes a tree with each of the reads represented as a pendant edge
statistical comparison
bary draws the barycenter of a placement collection on the reference tree
edpl calculates the EDPL uncertainty values for a collection of pqueries
kr calculates the Kantorovich-Rubinstein distance and corresponding p-values
kr_heat makes a heat tree
pca performs edge principal components
pd calculate phylogenetic diversity
rarefact calculates phylogenetic rarefaction curves
splitify writes out differences of masses for the splits of the tree
squash performs squash clustering
wpd calculate weighted phylogenetic diversity of placefiles
classification
classify outputs classification information in a tabular or SQLite format
utilities
compress compress a placefile's pqueries
demulti splits apart placements with multiplicity, undoing a round procedure
diplac find the most DIstant PLACements from the leaves
distmat prints out a pairwise distance matrix between the edges
filter filters one or more placefiles by placement name
info writes the number of leaves of the reference tree and the number of pqueries
islands find the mass islands of one or more pqueries
merge merges placefiles together
mft Multi-Filter and Transform placefiles
redup restores duplicates to deduped placefiles
round clusters the placements by rounding branch lengths
to_json converts old-style .place files to .jplace placement files
To get more help about a given command, type guppy COMMAND --help
fat: unknown option `--point-mass'.
usage: fat [options] placefile[s]
-o Specify the filename to write to.
--out-dir Specify the directory to write files to.
--prefix Specify a string to be prepended to filenames.
--unweighted Treat every placement as a point mass concentrated on the highest-weight placement.
--pp Use posterior probability for the weight.
-c Reference package path.
--min-fat The minimum branch length for fattened edges (to increase their visibility). To turn off set to 0. Default: 0.01
--total-width Set the total pixel width for all of the branches of the tree. Default: 300
--width-factor Override total-width by directly setting the number of pixels per unit of thing displayed.
--node-numbers Put the node numbers in where the bootstraps usually go.
--average Average all input placefiles together.
-help Display this list of options
--help Display this list of options
Traceback (most recent call last):
File "/opt/bio/dset/paprica_v0.3.1d/paprica-tally_pathways.py", line 145, in <module>
query_csv = pd.DataFrame.from_csv(cwd + query, header = 0)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1138, in from_csv
infer_datetime_format=infer_datetime_format)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
return _read(filepath_or_buffer, kwds)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 268, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 583, in __init__
self._make_engine(self.engine)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 724, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/opt/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1093, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3229)
File "pandas/parser.pyx", line 583, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6042)
IOError: File /scratch2/chassenr/MetabolicInference/Paprica/test.bacteria.combined_16S.bacteria.tax.clean.align.csv does not exist
For some reason pplacer isn't recognizing the -o flag. Is this a new installation of pplacer, or do you have an old version? Check that you have the most recent version of pplacer and guppy.
I am using pplacer version v1.1.alpha10, which (although it was installed this year) seems to be quite old. If you think that this version of pplacer is too old, I will check with our IT whether they really need that old version or would be willing to update it. It seems that on our system guppy is part of the pplacer distribution.
I think this is the problem. Older versions of guppy don't have the to_csv output option, and apparently older versions of pplacer don't recognize the -o flag. I suggest updating both. You can probably install newer versions locally even if your administrators are reluctant to update the system version. In the meantime, if you just want to check out paprica to see if it will work for you, note that a VirtualBox appliance and Amazon ECS machine instance are available. Let me know if you have any issues with either, or if updating pplacer/guppy doesn't solve the problem!
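One way to confirm this diagnosis before and after updating is to inspect the help listings that pplacer and guppy print. The following is a minimal Python 3 sketch, not part of paprica. The function names (`help_lists_option`, `help_lists_command`, `capture_help`, `report`) are hypothetical, and it assumes that `pplacer --help` and `guppy --help` print listings in the same one-entry-per-line layout as the error output in this thread.

```python
import re
import shutil
import subprocess


def help_lists_option(help_text, option):
    """Return True if some line of the help listing starts with `option`."""
    pattern = r"^\s*" + re.escape(option) + r"(\s|$)"
    return re.search(pattern, help_text, flags=re.MULTILINE) is not None


def help_lists_command(help_text, command):
    """Return True if a guppy-style command listing has an entry for `command`."""
    # Command entries use the same "name description" layout as options.
    return help_lists_option(help_text, command)


def capture_help(executable):
    """Run `executable --help` and return its combined stdout/stderr text.

    Returns None if the executable is not found on PATH.
    """
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def report():
    """Print whether the installed tools appear to support what this thread needs."""
    checks = [
        ("pplacer", "-o", help_lists_option),
        ("guppy", "to_csv", help_lists_command),
    ]
    for executable, feature, check in checks:
        text = capture_help(executable)
        if text is None:
            print(executable + ": not found on PATH")
        elif check(text, feature):
            print(executable + ": lists " + feature)
        else:
            print(executable + ": does not list " + feature + " (likely too old)")
```

Calling `report()` on the affected machine should show both features as missing with pplacer v1.1.alpha10. The IOError at the end of the traceback follows from the same cause: guppy never wrote the .csv file that paprica-tally_pathways.py then tries to read.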
Hi, I am trying to run PAPRICA on a Linux server. I have installed all the dependencies, but when I try to run
./paprica-run.sh test.bacteria bacteria
I get the following error:
Do you know if this error is because of a bug in PAPRICA or because my system was not set up correctly?
Thank you!
Cheers, Christiane