Open sagarutturkar opened 3 years ago
Dear Sagarutturkar,
If your data is miRNA sequencing data, then after the QC module the read length distribution should look like this:
[read length distribution figure not preserved]
So I suspect the adapter was not removed from the original reads. Please check the library preparation kit used in the experiment and provide the correct adapter sequence on the command line.
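A quick way to check whether adapters were removed is to tabulate read lengths after QC. The following is only a minimal sketch and is not part of COMPSRA; `length_distribution` is a hypothetical helper, and it assumes the read sequences have already been extracted from the FASTQ file into a list of strings.

```python
from collections import Counter


def length_distribution(sequences):
    """Return {read_length: fraction_of_reads}, ordered by read length.

    `sequences` is any iterable of read strings (an assumption of this
    sketch; how they are loaded from FASTQ is left to the caller).
    """
    counts = Counter(len(seq) for seq in sequences)
    total = sum(counts.values())
    # sorted() keeps the output ordered from shortest to longest read
    return {length: counts[length] / total for length in sorted(counts)}


if __name__ == "__main__":
    # Toy input: two 22-nt reads, one 23-nt read, one untrimmed 75-nt read
    reads = ["A" * 22, "C" * 22, "G" * 23, "T" * 75]
    for length, fraction in length_distribution(reads).items():
        print(f"{length}\t{fraction:.2f}")
```

If most reads remain close to the original sequencing length, that is consistent with the adapter not having been trimmed.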
Best Wishes, Jiang Li
Thank you for your quick response. This data is a mix of multiple types of small RNAs (microRNA, snoRNA, snRNA, tRNA, etc.).
After I specified the correct adapter sequence, the "too short" problem was resolved. However, for several samples a very high percentage of reads is reported as "mapped to multiple loci". I suspect these are mostly short reads (<30 bp) belonging to miRNAs.
I had previously run miRDeep2 for miRNA detection; it performs a read-collapsing step, which in turn helps keep the number of multi-mapping reads low.
From miRDeep2:
Option '-m' will collapse the reads to remove redundancy and decrease the file size.
A sequencing read seen 10 times in your raw file will occur only once in the collapsed
file and have a _x10 in its identifier.
I was wondering whether COMPSRA performs any such collapsing step during miRNA analysis?
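For illustration, the collapsing idea quoted above can be sketched as follows. This is not miRDeep2's or COMPSRA's code; `read_fastq_sequences` and `collapse_reads` are hypothetical helpers, and the `seq_<index>` part of the identifier is an assumption (the quoted text only specifies the `_x10`-style count suffix).

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # line 2 of every record holds the bases
            yield line.strip()


def collapse_reads(sequences, prefix="seq"):
    """Collapse identical reads into (identifier, sequence) pairs.

    Each identifier ends in _x<count>, where <count> is the number of
    times the sequence occurred in the input. The prefix and running
    index are assumptions made for this sketch.
    """
    counts = Counter(sequences)
    return [
        (f"{prefix}_{index}_x{count}", seq)
        for index, (seq, count) in enumerate(counts.most_common())
    ]


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "TTTT", "+", "IIII",
    ]
    # Write the collapsed reads in FASTA format
    for identifier, seq in collapse_reads(read_fastq_sequences(fastq)):
        print(f">{identifier}\n{seq}")
```

Note that collapsing only removes duplicate copies of a read; a collapsed sequence that matches several genomic locations would still be reported as multi-mapping.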
Dear sagarutturkar,
COMPSRA doesn't have a collapsing step. The reason for "mapped to multiple loci", to my knowledge, may be that:
In particular, miRNAs with the prefix "let-" often cause a lot of trouble, so you may want to focus on those. I hope this helps.
Best Wishes, Jiang Li
I ran the QC and alignment modules on my own data with hg38. My original reads are 75 bp single-end. After QC, most (>90%) of the reads are longer than 60 bp.
However, 88% of the reads are reported as "% of reads unmapped: too short". Do you have any suggestions?