gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

Files of 18S data not being fully analyzed in PhyloSift? #320

Closed by hollybik 12 years ago

hollybik commented 12 years ago

For 18S rRNA data, it seems like PhyloSift is not pulling down all the sequences: I used two PE input files, each about 1.4GB in size, and only got 6 chunks of 18S sequences, where the alignDir files were ~3MB each. These file sizes seem a bit small for such large input files of 18S amplicon data, no?

Output files are on Edhar:

/home/hollybik/phylosfit_devel_20121005/PS_temp/1926-KO-1_1_trimmed.txt

gjospin commented 12 years ago

Using the most recent Devel (pulled from GitHub) and Devel markers, I am getting 0 hits to 18S. I BLASTed a few sequences from the input files and they hit the 18S region well.

koadman commented 12 years ago

It seems that bowtie2 cannot align 18S sequences.

hollybik commented 12 years ago

Will test the pipeline tomorrow to see whether 18S is working once Guillaume pushes the fix.

gjospin commented 12 years ago

Reads aren't disappearing into a black hole. PhyloSift filters out a lot of the hits by checking the coverage on the read or the marker. The low yield could be due to low-quality data, or it could be a general problem with the 18S variable regions being too variable to align well enough to make it past our thresholds.

Can you point me to an 18S data set that is known to be of good quality?
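To make the coverage check described above concrete, here is a minimal Python sketch. It is not PhyloSift's code: the function names, the 0.5 threshold, and the "pass if either the read or the marker is covered well enough" rule are all assumptions made for illustration.

```python
def coverage(aligned_len: int, total_len: int) -> float:
    """Fraction of a sequence covered by the aligned region, capped at 1.0."""
    if total_len <= 0:
        raise ValueError("total_len must be positive")
    return min(aligned_len, total_len) / total_len


def passes_filter(aligned_len: int, read_len: int, marker_len: int,
                  min_coverage: float = 0.5) -> bool:
    """Hypothetical hit filter.

    Keeps a hit when the aligned region covers at least `min_coverage`
    of the read OR of the marker. The threshold and the OR rule are
    assumptions, not values taken from PhyloSift.
    """
    return (coverage(aligned_len, read_len) >= min_coverage
            or coverage(aligned_len, marker_len) >= min_coverage)


# A 400 bp amplicon read against a hypothetical 1800 bp 18S marker.
long_hit = passes_filter(aligned_len=300, read_len=400, marker_len=1800)
short_hit = passes_filter(aligned_len=100, read_len=400, marker_len=1800)
print(long_hit, short_hit)
```

Under these assumptions, a read whose alignment is cut short by a hypervariable region (the second case) would be dropped even though part of it matches the marker, which is the failure mode suspected for 18S.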

hollybik commented 12 years ago

Yes, I have a bunch of 454 amplicon OTUs that are long (~400bp) and definitely get long alignments via BLAST. The file is on Edhar:

/home/hollybik/TestData/OTU_data/octulist_Deepsea_uclust99_F04NF1.fasta

gjospin commented 12 years ago

Running 18S OTUs. Out of 44585 input sequences, 2391 (about 5%) pass our threshold and move on to the alignment step.

gjospin commented 12 years ago

When running the same code using 16S OTUs, 1918 input sequences get filtered down to 1727 hits (about 90%) moving on to the alignment step.
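For a side-by-side comparison, the pass rates implied by the two runs above can be worked out with a few lines of Python. The counts are the ones reported in this thread; the `pass_rate` helper is only an illustrative name.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of input sequences that survive the filter."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


# (sequences passing the threshold, input sequences), as reported above.
counts = {
    "18S": (2391, 44585),
    "16S": (1727, 1918),
}

rates = {name: pass_rate(passed, total) for name, (passed, total) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of input sequences reach the alignment step")
```

The gap (roughly 5% for 18S versus roughly 90% for 16S) suggests the filter is far stricter on the 18S data than on 16S when run with the same code.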

gjospin commented 12 years ago

Fixed with issue #329