bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

RAxML_distances.dist is missing #21

Closed by karoraw1 8 years ago

karoraw1 commented 8 years ago

I can't figure out why the RAxML_distances.dist file isn't being created.

The message rm: cannot remove '/home/login/Desktop/genome_finder/ref_genome_database/*dist': No such file or directory was produced by lines 277/278, so no old *dist files were present and a no-clobber error is unlikely.

Any hints on how to go about properly diagnosing the issue and/or solving it are very welcome. I included everything I thought might be relevant, but if any other info is required, please let me know.

Error Message

writing vector matrix 2680 of 2682
writing vector matrix 2681 of 2682
writing vector matrix 2682 of 2682
Traceback (most recent call last):
  File "paprica_make_ref_v0.21.py", line 440, in <module>
    dist_16S = pd.read_table(variables['ref_dir'] + 'RAxML_distances.dist', names = ['taxa1', 'taxa2', 'distance'], delim_whitespace = True)
  File "/home/login/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 491, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/login/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 268, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/login/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 583, in __init__
    self._make_engine(self.engine)
  File "/home/login/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 724, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/login/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1093, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3229)
  File "pandas/parser.pyx", line 583, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6042)
IOError: File /home/login/Desktop/genome_finder/ref_genome_database/RAxML_distances.dist does not exist

RAxML_info.dist content

Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" 

This is RAxML version 8.2.4 released by Alexandros Stamatakis on October 02 2015.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)

Alignment has 1918 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 39.16%

RAxML Computation of pairwise distances

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 1918
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR

RAxML was called as follows:

raxmlHPC-PTHREADS-AVX2 -T 2 -f x -p 12345 -s /home/login/Desktop/genome_finder/ref_genome_database/combined_16S.align.fasta -m GTRGAMMA -n dist

genome_finder directory contents

[Screenshots: selection_037, selection_038]

bowmanjeffs commented 8 years ago

Interesting... if that is the complete content of RAxML_info.dist, then RAxML either hasn't completed or failed without reporting any specific error. Can you confirm that RAxML isn't running in the background? Python should wait for it to finish, but perhaps something prevented this. If RAxML isn't running in the background, can you please try running that command outside of paprica_make_ref? i.e.:

raxmlHPC-PTHREADS-AVX2 -T 2 -f x -p 12345 -s /home/login/Desktop/genome_finder/ref_genome_database/combined_16S.align.fasta -m GTRGAMMA -n dist

...and let me know what happens? I haven't seen this error before but am trying to replicate.
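
If it is easier to test from Python than from the shell, the same check can be sketched as below. This is only an illustration, not code from paprica_make_ref: the helper name run_and_check is hypothetical, and the commented usage assumes RAxML writes RAxML_distances.dist into the reference directory as in the traceback above.

```python
import os
import subprocess


def run_and_check(cmd, expected_output):
    """Run cmd to completion; return (exit status, whether expected_output exists).

    The exit status is None if the executable could not be started at all.
    """
    try:
        # subprocess.call blocks until the child process has exited.
        status = subprocess.call(cmd)
    except OSError:
        # Raised when the binary is missing or not executable.
        status = None
    return status, os.path.isfile(expected_output)


# Hypothetical usage with the command from this thread (paths are the reporter's):
# ref_dir = '/home/login/Desktop/genome_finder/ref_genome_database/'
# cmd = ['raxmlHPC-PTHREADS-AVX2', '-T', '2', '-f', 'x', '-p', '12345',
#        '-s', ref_dir + 'combined_16S.align.fasta', '-m', 'GTRGAMMA', '-n', 'dist']
# print(run_and_check(cmd, ref_dir + 'RAxML_distances.dist'))
```

A status of 0 with no output file, or a non-zero status, would both be worth reporting. On POSIX systems a negative status means the process was killed by a signal rather than exiting on its own.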

bowmanjeffs commented 8 years ago

Haven't been able to replicate so far. If RAxML completes the RAxML_info.dist file should look like:

Using BFGS method to optimize GTR rate parameters, to disable this specify "--no-bfgs" 

This is RAxML version 8.1.15 released by Alexandros Stamatakis on December 25 2014.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)

Alignment has 1914 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 39.03%

RAxML Computation of pairwise distances

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 1914
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR

RAxML was called as follows:

raxmlHPC-PTHREADS-AVX2 -T 2 -f x -p 12345 -s /volumes/hd1/paprica_test/paprica/ref_genome_database/combined_16S.align.fasta -m GTRGAMMA -n dist 

Log Likelihood Score after parameter optimization: -508255.488499

Computing pairwise ML-distances ...

Time for pair-wise ML distance computation of 3549780 distances: 413.144713 seconds

Distances written to file: /volumes/hd1/paprica_test/paprica/ref_genome_database/RAxML_distances.dist
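
A quick way to tell a complete run from an incomplete one is to look for that final "Distances written to file:" line. A minimal sketch, assuming the info file has the layout shown above; the function name is hypothetical and not part of paprica:

```python
def raxml_distances_path(info_text):
    """Return the path from the 'Distances written to file:' line, or None.

    None means the line is absent, i.e. the run did not get as far as
    writing the distance file.
    """
    marker = 'Distances written to file:'
    for line in info_text.splitlines():
        if line.startswith(marker):
            return line[len(marker):].strip()
    return None


# Hypothetical usage:
# with open('RAxML_info.dist') as handle:
#     print(raxml_distances_path(handle.read()))
```

For the info file posted in the original report this would return None, since the output stops right after the "RAxML was called as follows:" section.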

karoraw1 commented 8 years ago

I think I got it.

$ cat /proc/cpuinfo | grep -c avx2
0
$ cat /proc/cpuinfo | grep -c avx
8
$ cat /proc/cpuinfo | grep -c sse3
8

I can recompile the appropriate version of RAxML and modify the make_ref script to call it properly. Sorry for the run-around. Thanks for your help.
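
For anyone hitting the same thing, here is a rough sketch of how a script could pick the binary from the CPU flags instead of hard-coding the AVX2 build. It is not what paprica_make_ref does; the function names are hypothetical, and the binary names are assumed to match the ones produced by the standard RAxML makefiles. Note that Linux lists SSE3 as "pni" in /proc/cpuinfo, while "ssse3" is a separate extension, so the sketch matches whole flag names rather than substrings.

```python
def cpu_flags(cpuinfo_text):
    """Collect the set of CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(':')
        if key.strip() == 'flags':
            flags.update(value.split())
    return flags


def pick_raxml_binary(flags):
    """Return the most capable RAxML PTHREADS build the flags allow."""
    # Ordered from most to least demanding instruction set.
    # 'pni' is how Linux reports SSE3.
    candidates = [('avx2', 'raxmlHPC-PTHREADS-AVX2'),
                  ('avx', 'raxmlHPC-PTHREADS-AVX'),
                  ('pni', 'raxmlHPC-PTHREADS-SSE3')]
    for flag, binary in candidates:
        if flag in flags:
            return binary
    # Fall back to the plain build with no SIMD requirement.
    return 'raxmlHPC-PTHREADS'


# Hypothetical usage on Linux:
# with open('/proc/cpuinfo') as handle:
#     print(pick_raxml_binary(cpu_flags(handle.read())))
```

On a machine with avx but no avx2, like the one above, this would select the AVX build.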

bowmanjeffs commented 8 years ago

Glad you got it. Be advised that the previous issue, which I thought was fixed, is not. Apparently wget managed to download all the faa files during the test, so the test didn't fail. I have a new fix in place that is being tested now, and will hopefully have a working version up later tonight.