bowmanjeffs / paprica

paprica - PAthway PRediction by phylogenetIC plAcement

"[Name].combined_16S.bacteria.tax.clean.align.csv does not exist" error #60

Closed sturne29 closed 5 years ago

sturne29 commented 5 years ago

Hello! I'm running the VirtualBox paprica appliance. To be sure that the program was at the most current version, I did a fresh clone of the git repository before starting. I also increased the amount of RAM allocated to the VM from the default, because I have plenty available on this computer. Because the VM was running out of disk space before the analysis could finish, I've also been keeping all of my files in a folder shared with the host machine and running the analysis there.

I am getting an error whenever I try to run the program:

Traceback (most recent call last):
  File "/home/demo/paprica/paprica-tally_pathways.py", line 150, in <module>
    query_csv = pd.read_csv(cwd + query, header = 0)
  File "/home/demo/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/demo/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 275, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/home/demo/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 590, in __init__
    self._make_engine(self.engine)
  File "/home/demo/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 731, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/home/demo/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1103, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 353, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3246)
  File "pandas/parser.pyx", line 591, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6111)
IOError: File /media/sf_shared/try2/sample_file1.combined_16S.bacteria.tax.clean.align.csv does not exist

Any ideas why I might be getting this error and what I can do about it?

bowmanjeffs commented 5 years ago

So it looks to me like something is broken well upstream of this particular error. First, try running the test file. Next, in paprica-run.sh comment out the paprica-tally_pathways... line to make sure that the first command completes successfully. Probably cmalign is not completing.

sturne29 commented 5 years ago

The test completed successfully, as did running paprica-run.sh with the tally_pathways line commented out.

bowmanjeffs commented 5 years ago

So to be clear, the test file completes successfully but your analysis file still does not? With the tally_pathways line commented out does the first command complete without error for the analysis file?

sturne29 commented 5 years ago

Yes, that's correct. The first command does complete without error for my own data, at least for the file that I picked for a test candidate.

bowmanjeffs commented 5 years ago

Were you using the subsampling option (i.e. the -n flag)?

sturne29 commented 5 years ago

I didn't.

bowmanjeffs commented 5 years ago

Okay, I'm a little mystified if the first command didn't error out. You'll notice that the second command couldn't find /media/sf_shared/try2/sample_file1.combined_16S.bacteria.tax.clean.align.csv. Can you list the files that paprica did create?
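One way to produce that listing is a short script. This is only a sketch: `list_outputs` is a hypothetical helper, not part of paprica, and it assumes Python 3 (the appliance in this thread runs Python 2.7, where `glob.escape` is unavailable).

```python
import glob
import os


def list_outputs(directory, name):
    """Return the sorted base names of files in `directory` that start with `name`.

    Hypothetical helper for illustration; paprica does not provide it.
    """
    # glob.escape keeps special characters in the path or name from
    # being interpreted as wildcards.
    pattern = os.path.join(glob.escape(directory), glob.escape(name) + "*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

Calling `list_outputs("/media/sf_shared/try2", "sample_file1")` would return every output whose name begins with the sample name, which makes an unexpected infix in the file names easy to spot.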

sturne29 commented 5 years ago

Here's the list of output files from running paprica-run.sh with the tally_pathways line commented out:

sample_file1.sub.bacteria.unique.seqs.csv
sample_file1.sub.clean.align.sto
sample_file1.sub.clean.fasta
sample_file1.sub.combined_16S.bacteria.tax.clean.align.csv
sample_file1.sub.combined_16S.bacteria.tax.clean.align.db
sample_file1.sub.combined_16S.bacteria.tax.clean.align.fasta
sample_file1.sub.combined_16S.bacteria.tax.clean.align.jplace
sample_file1.sub.combined_16S.bacteria.tax.clean.align.phyloxml
sample_file1.sub.combined_16S.bacteria.tax.clean.align.sto
sample_file1.sub.fasta

So... it does look like the file isn't there! But the command finished with the "Thanks for using paprica!" line, with no apparent errors.

bowmanjeffs commented 5 years ago

So the "sub" in the file names indicates that you do have the -n flag specified in paprica-run.sh. You'll note that in your original error paprica was looking for: sample_file1.combined_16S.bacteria.tax.clean.align.csv

Instead the file is: sample_file1.sub.combined_16S.bacteria.tax.clean.align.csv

Inside the paprica-run.sh script you'll see a comment describing how you need to change the naming convention if you use the -n flag. At some point in the future I'll change this behavior, as it tends to trip people up. Let me know if things still aren't clear or aren't working.
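The naming difference can be sketched in a few lines of Python. Both functions are hypothetical helpers written for illustration, not paprica code; the suffix and the ".sub" infix are taken from the file names quoted in this thread, and nothing here reflects how paprica builds the names internally.

```python
import os

# Suffix copied from the file names quoted in this thread.
SUFFIX = ".combined_16S.bacteria.tax.clean.align.csv"


def expected_query_csv(name, subsampled):
    """Return the placement CSV name for a sample (hypothetical helper).

    With subsampling (-n), ".sub" is inserted after the sample name.
    """
    stem = name + ".sub" if subsampled else name
    return stem + SUFFIX


def find_query_csv(directory, name):
    """Return the path of whichever candidate CSV exists in `directory`, or None."""
    for subsampled in (False, True):
        candidate = os.path.join(directory, expected_query_csv(name, subsampled))
        if os.path.isfile(candidate):
            return candidate
    return None
```

A check like `find_query_csv` before the tally step would show right away which of the two names is on disk, rather than surfacing the mismatch as an IOError from pandas.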

sturne29 commented 5 years ago

Oh no, I'm sorry for being unintentionally misleading. I guess I must have changed it at some point and then forgotten about doing that. Thanks for your help, I really appreciate it.

bowmanjeffs commented 5 years ago

No worries, let me know if you run into any other problems.