Cufflinks produces very strange output when a microRNA is annotated inside an intronic region of a gene

I first posted this problem on the Cufflinks mailing list but have had no replies yet ( https://groups.google.com/forum/#!topic/tuxedo-tools-users/_E94jkdvMak )

The problem we are experiencing is that in the GRCh38 annotation from RefSeq, the gene HLA-B is annotated on chromosome 6, and between its exons 4 and 5 a miRNA called MIR6891 is annotated. Quantification with Cuffquant goes terribly wrong here, as the table below shows (one column per sample):

HLA-B 1340.16 994.534 923.688 1650.58 1266.27 2167.43 2692.21
MIR6891 329936 167527 113865 399491 282248 82857.6 114646

As you can see, I get counts for the miRNA that go through the roof, while HLA-B appears to be expressed at a relatively low level by comparison. When I check in IGV or the UCSC Genome Browser, I see that not a single read aligns to the miRNA itself, but a lot of split reads span the region. Our current guess is that these split reads are all assigned to the miRNA as well, even though their actual bases align to exons 4 and 5 of HLA-B.

I know this is probably not easy to fix, but it might be a good idea to distribute a GTF file containing only mRNAs and lincRNAs, or something like that. I wonder whether other users have experienced the same issue and how they worked around it. Thanks already!
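To make the "GTF with only mRNAs and lincRNAs" idea concrete, here is a minimal sketch of how such a file could be produced before running Cuffquant. It is not part of Cufflinks. The function name `filter_gtf_lines` is made up for illustration, and the sketch assumes that each record carries a `gene_biotype "..."` attribute and that the wanted values are `protein_coding` and `lincRNA`. Both assumptions need to be checked against the actual annotation file, since attribute names and biotype labels differ between annotation sources.

```python
import re

# Assumption: the biotype is stored as `gene_biotype "<value>"` in column 9.
# Adjust the attribute name if your GTF uses a different one.
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_lines(lines, keep_biotypes=("protein_coding", "lincRNA")):
    """Yield comment lines and GTF records whose gene_biotype is in keep_biotypes.

    Records without a gene_biotype attribute are dropped.
    """
    keep = set(keep_biotypes)
    for line in lines:
        if line.startswith("#"):
            # Pass header/comment lines through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # Not a well-formed GTF record; skip it.
            continue
        match = BIOTYPE_RE.search(fields[8])
        if match and match.group(1) in keep:
            yield line


if __name__ == "__main__":
    # Toy records with made-up coordinates, only to show the behaviour.
    toy = [
        "#!toy header\n",
        'chr6\ttoy\texon\t100\t200\t.\t-\t.\tgene_id "HLA-B"; gene_biotype "protein_coding";\n',
        'chr6\ttoy\texon\t250\t300\t.\t-\t.\tgene_id "MIR6891"; gene_biotype "miRNA";\n',
    ]
    for kept in filter_gtf_lines(toy):
        print(kept, end="")
```

For a real annotation, the same generator can be fed an open file handle and its output written to a new GTF, which is then passed to Cuffquant in place of the full annotation. Whether this actually removes the inflated MIR6891 values is untested here; it only removes the miRNA record from the set of features being quantified.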
