hollybik closed this issue 12 years ago
Using the most recent Devel (pulled from GitHub) and Devel markers, I am getting 0 hits to 18S. I BLASTed a few sequences from the input files and found that they hit the 18S region well.
It seems that bowtie2 cannot align 18S sequences.
Will test the pipeline tomorrow to see if 18S is working after Guillaume pushes the fix.
Reads aren't disappearing into a dark hole. PhyloSift (PS) filters out a lot of the hits by checking the alignment coverage on the read or the marker. This could be due to low-quality data, or it could be a general problem with 18S variable regions being too variable to align well enough to make it past our thresholds.
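To make the coverage check described above concrete, here is a minimal Python sketch of that kind of filter. It is not PhyloSift's actual code: the function names, the hit fields, and the 0.5 thresholds are all hypothetical, chosen only to illustrate how a hit with a short alignment can be dropped even though the read itself is fine.

```python
def coverage(aln_start, aln_end, length):
    """Fraction of a sequence covered by the half-open alignment interval [aln_start, aln_end)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (aln_end - aln_start) / length


def passes_coverage_filter(hit, min_read_cov=0.5, min_marker_cov=0.5):
    """Keep a hit if the alignment covers enough of the read OR enough of the marker.

    The 0.5 defaults are illustrative assumptions, not PhyloSift's real thresholds.
    """
    read_cov = coverage(hit["read_start"], hit["read_end"], hit["read_len"])
    marker_cov = coverage(hit["marker_start"], hit["marker_end"], hit["marker_len"])
    return read_cov >= min_read_cov or marker_cov >= min_marker_cov


# Hypothetical 400 bp read where only 120 bp align to an 1800 bp marker:
# it covers 30% of the read and under 7% of the marker, so it is filtered out.
short_hit = {"read_start": 0, "read_end": 120, "read_len": 400,
             "marker_start": 500, "marker_end": 620, "marker_len": 1800}

# Hypothetical 400 bp read where 380 bp align: 95% read coverage, so it is kept.
long_hit = {"read_start": 10, "read_end": 390, "read_len": 400,
            "marker_start": 500, "marker_end": 880, "marker_len": 1800}

print(passes_coverage_filter(short_hit))
print(passes_coverage_filter(long_hit))
```

If the variable regions of 18S break alignments into short pieces, most hits would look like the first case and never reach the alignment step.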
Can you point me to an 18S data set that is known to be of good quality?
Yes - I have a bunch of 454 amplicon OTUs that are long (~400bp) and definitely get long alignments via BLAST. The file is on Edhar:
/home/hollybik/TestData/OTU_data/octulist_Deepsea_uclust99_F04NF1.fasta
Running 18S OTUs. Out of 44585 input sequences, 2391 pass our threshold and move on to the alignment step.
When running the same code using 16S OTUs, 1918 input sequences get filtered down to 1727 hits moving on to the alignment step.
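For comparison, the two runs above can be expressed as pass rates. This is only arithmetic on the counts reported in this thread; the `pass_rate` helper is a hypothetical name used for illustration.

```python
def pass_rate(passed, total):
    """Fraction of input sequences that pass the threshold and move on to alignment."""
    return passed / total


# Counts as reported above: (sequences passing, input sequences)
counts = {"18S": (2391, 44585), "16S": (1727, 1918)}

for marker, (passed, total) in counts.items():
    print(f"{marker}: {passed}/{total} = {pass_rate(passed, total):.1%}")
```

That works out to roughly 5.4% of 18S OTUs passing versus roughly 90.0% of 16S OTUs.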
Fixed with issue #329
For 18S rRNA data, it seems like PhyloSift is not pulling down all the sequences -- I used two PE input files, each about 1.4GB in size, and only got 6 chunks of 18S sequences, where the alignDir files were ~3MB each. These file sizes seem a bit small for such big input files of 18S amplicon data, no?
Output files are on Edhar:
/home/hollybik/phylosfit_devel_20121005/PS_temp/1926-KO-1_1_trimmed.txt