Open csmiller opened 11 years ago
Hi @csmiller, I think I'm encountering an identical, or very related, issue. If there are fewer than 50 reads aligning to the SSU database (I'm using the default downloaded by emirge) I get:
Clustering sequences for iteration 0 at Thu Apr 7 10:17:19 2016...
cluster threshold = 0.970
[fai_load] build FASTA index.
[fai_load] fail to open FASTA index.
Traceback (most recent call last):
File "/usr/local/bin/emirge.py", line 4, in <module>
__import__('pkg_resources').run_script('EMIRGE==0.60.3', 'emirge.py')
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 719, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 1490, in run_script
exec(script_code, namespace, namespace)
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1697, in <module>
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1688, in main
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1439, in do_iterations
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 500, in do_iteration
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 807, in cluster_sequences
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 829, in cluster_sequences2
File "/Library/Python/2.7/site-packages/EMIRGE-0.60.3-py2.7-macosx-10.8-x86_64.egg/EGG-INFO/scripts/emirge.py", line 769, in write_consensus_with_mask
File "pysam/cfaidx.pyx", line 113, in pysam.cfaidx.FastaFile.__cinit__ (pysam/cfaidx.c:2025)
File "pysam/cfaidx.pyx", line 149, in pysam.cfaidx.FastaFile._open (pysam/cfaidx.c:2845)
IOError: could not open file `/Users/cmorganlang/EMIRGE_16S_prediction/BioOHT_76.AD-318_B19_Frak/iter.00/iter.00.cons.fasta`
It looks like this stems from the pysam call within write_consensus_with_mask
judging by the location in the run log at which the pipeline breaks.
I am trying to assemble the 16S genes from SAGs and EMIRGE is working successfully for about half the SAGs. I guess it can't fully reconstruct a single 16S from the 30-45 reads that do align. Would it be worth decreasing the MIN_DEPTH parameter to 1 for these samples or could this produce an erroneous 16S assembly?
If you send or post the full output I can verify, but this looks like you are correct: no single 16S is reconstructed.
My first question is why you are using EMIRGE for a SAG? Does the 16S not pop out of the assembly?
Your approach to lower the MIN_DEPTH param. might work, though I would try lowering incrementally. With MIN_DEPTH=1, I worry the large number of erroneous mappings might make the task harder for EMIRGE. You could also try running emirge_amplicon.py if number of reads / SAG is not too high. Or preselect the 16S reads and try emirge_amplicon.py.
Could you post this on emirge google group? You might get other feedback, and I bet others would be curious to find out how these various options worked out for you.
Chris
Until recently, as far as I'm aware, several institutions experienced contamination issues during SAG processing where cells from one well would be carried over into other wells prior to amplification. I'm using EMIRGE to screen the SAGs and see if there are multiple 16S genes present, and it seems to be doing a good job so far. The 16S sequences would otherwise become convoluted during assembly normal DBG assembly. EMIRGE also provides the candidate taxa ID to make it even easier :)
Sure, I'll post my findings to the google group.
If EMIRGE cannot reconstruct any sequences due to low read coverage, it currently gets to the clustering stage where usearch throws a segfault and emirge exits without an informative error message.