almlab / SmileTrain

a 16s pipeline
MIT License

USEARCH chimera checking output seems wrong #69

Open spacocha opened 9 years ago

spacocha commented 9 years ago

The output from using UCHIME for the dbotu_chimera check seems to be wrong. The summary that uchime prints for the mock community samples reports 36 nonchimeric sequences, but the output file contains only 34. I'm not sure what happened to the other 2 sequences; usearch doesn't throw an error, and it's not clear where the discrepancy comes from. This is obviously not a SmileTrain problem, but it is something to try to figure out.
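One way to narrow this down would be to compare the sequence labels in the file given to the chimera check against the labels in the files it produced. The snippet below is only a rough sketch of that idea: the function names `fasta_labels` and `compare` are made up for illustration, it assumes all files are plain FASTA, and it assumes labels are not rewritten between input and output (for example by size annotations), which may not hold here.

```python
from collections import Counter


def fasta_labels(lines):
    """Return the label of every FASTA record, in order.

    The label is the header text after '>' up to the first whitespace.
    """
    labels = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            labels.append(fields[0] if fields else "")
    return labels


def compare(input_lines, nonchimera_lines, chimera_lines=()):
    """Summarize which input records are not accounted for in the outputs."""
    inp = fasta_labels(input_lines)
    kept = fasta_labels(nonchimera_lines)
    flagged = fasta_labels(chimera_lines)
    return {
        "n_input": len(inp),
        "n_nonchimeric": len(kept),
        "n_chimeric": len(flagged),
        # Labels that occur more than once in the nonchimera output.
        "duplicated": sorted(l for l, n in Counter(kept).items() if n > 1),
        # Input labels that appear in neither output file.
        "missing": sorted(set(inp) - set(kept) - set(flagged)),
    }


# Tiny in-memory demo; with real data, pass open file handles instead.
demo_input = [">a", "ACGT", ">b", "ACGT", ">c", "GGGG"]
demo_kept = [">a", "ACGT"]
demo_chimeras = [">b", "ACGT"]
print(compare(demo_input, demo_kept, demo_chimeras))
```

If `missing` comes back empty while the counts still disagree, the difference is probably in how the sequences are counted (for example, duplicated labels) rather than in records being dropped.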

swo commented 9 years ago

can't do anything about this without any example data...

spacocha commented 9 years ago

/net/radiodurans/alm/spacocha/raw_data/mock_mixes/dbOTU_dir

python ~/lib/SmileTrain/otu_caller.py -f all_R1_2file.fastq -r all_R2_2file.fastq -b vanja_59sam_mapping_rc.txt2 --split -n 1 --merge --demultiplex --qfilter --dereplicate --index --dbotu --maxee 0.5 --alignref /home/spacocha/tmpdir/silva.bacteria.m22534.m6428.filter.fasta --k_fold 10 --dbotu_chimeras --dbotu_split --JS_cutoff 0.02 --dbotu_id 0.08 --local

Although that command has the --JS_cutoff in it...