~95% of the kmer from RNA-seq data failed to match the theoretical kmer profile

JY-Zhou / FreePSI

An alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome

GNU General Public License v3.0


~95% of the kmer from RNA-seq data failed to match the theoretical kmer profile #8

Open ArashDepp opened 2 years ago

ArashDepp commented 2 years ago

Hi.

In my analysis of single-end RNA-seq data, ~95% of k-mers are discarded for not matching the theoretical k-mer profile. Could you suggest possible reasons for such a low match rate? Secondly, how reliable would the resulting PSI values be when so few k-mers match? Here is a snapshot for reference:

Please see the sixth-to-last line in the snapshot.

Thank you

JY-Zhou commented 2 years ago

This is unusual. Normally at least 80% of the k-mers should be retained as matched (in the example case, 95% are retained). One possible reason is that the value of k is inconsistent across the pipeline. FreePSI uses Jellyfish to count k-mers in reads (see this script), and the k used in Jellyfish must equal the k used in the build and quant steps of FreePSI. The strand of the reads should also be taken into account when preprocessing them.
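To illustrate why both points matter, here is a minimal Python sketch. It is not FreePSI's code, and every name in it (`revcomp`, `kmers`, `matched_fraction`) is hypothetical. It computes the fraction of read k-mers found in a reference k-mer set. If the two steps use different k, nothing can match, because the strings have different lengths. If reads come from the opposite strand and k-mers are not canonicalized, the match rate can also drop. In Jellyfish, `-m` sets k and `-C` counts canonical k-mers; whether `-C` is appropriate for a given FreePSI run is an assumption to check against the script.

```python
# Illustrative sketch only; not FreePSI's implementation.

def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T, N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq))


def kmers(seq, k, canonical=False):
    """Yield every k-mer of seq; if canonical, yield the lexicographically
    smaller of the k-mer and its reverse complement."""
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        yield min(km, revcomp(km)) if canonical else km


def matched_fraction(reads, reference, k_ref, k_reads, canonical=False):
    """Fraction of read k-mers (length k_reads) that occur in the set of
    reference k-mers (length k_ref)."""
    ref_set = set(kmers(reference, k_ref, canonical))
    total = matched = 0
    for read in reads:
        for km in kmers(read, k_reads, canonical):
            total += 1
            matched += km in ref_set
    return matched / total if total else 0.0


# Toy data (made up): one forward-strand read, one reverse-strand read.
reference = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
reads = [reference[0:12], revcomp(reference[10:22])]

same_k_stranded = matched_fraction(reads, reference, 5, 5, canonical=False)
same_k_canonical = matched_fraction(reads, reference, 5, 5, canonical=True)
mismatched_k = matched_fraction(reads, reference, 5, 7, canonical=True)
```

With this toy input, `mismatched_k` is 0.0 regardless of strand handling, and `same_k_canonical` is 1.0. A near-total loss of k-mers on real data is therefore consistent with a k mismatch, although it does not prove one.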