DimmestP / chimera-quantseq

Apache License 2.0

Check why only 1/5 of reads are counted #8

Closed DimmestP closed 3 years ago

DimmestP commented 3 years ago

The majority of reads are either ignored because they do not overlap with a feature or they are multi-mapped.

Check strand used to count reads.

Use the abundant 3'UTR annotation rather than the abundant annotation (which covers the whole transcript, not just the 3'UTR).

Look at the raw BAM files in IGB and pick out 5-10 read IDs (on each strand, in a 3'UTR, overlapping features, etc.). Then extract these reads and run featureCounts on them to see how it behaves.

Check which genes are ignored due to multi-mapping or no features.
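To put numbers on the first point, the per-category totals can be read from the `.summary` file that featureCounts writes next to its counts table. The following is a minimal sketch, assuming the usual tab-delimited layout (a `Status` header row, then one row per category); the function name and the example numbers are made up for illustration and are not results from this project.

```python
def summarise_counts(summary_text):
    """Return {status: fraction of reads} for the first sample column.

    Assumes a featureCounts-style summary: a header line followed by
    tab-separated 'Status<TAB>count' rows. Categories with zero reads
    are dropped to keep the output short.
    """
    counts = {}
    for line in summary_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split("\t")
        counts[fields[0]] = int(fields[1])
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items() if n > 0}


# Made-up example data, NOT taken from this experiment.
example = (
    "Status\tsample.bam\n"
    "Assigned\t200\n"
    "Unassigned_MultiMapping\t300\n"
    "Unassigned_NoFeatures\t500\n"
    "Unassigned_Ambiguity\t0\n"
)
fractions = summarise_counts(example)
for status, fraction in fractions.items():
    print(f"{status}: {fraction:.0%}")
```

Running this on the real summary file for each sample should show at a glance whether the missing reads sit under `Unassigned_NoFeatures` or `Unassigned_MultiMapping`.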

DimmestP commented 3 years ago

I have checked that:

Even with all of this, the best I can do is 40% of reads counted. The rest do not overlap a single feature (ORF + 3'UTR + 5'UTR).

Tomorrow (30th Jun) I will select a handful of reads and find where they are being mapped.

DimmestP commented 3 years ago

I realised that read 1 of each pair is reverse-stranded in QuantSeq REV, which matters for featureCounts. I have changed the strandedness setting accordingly and now get up to 70% of reads counted. Most of the remaining reads are flagged as 'multimapped' and so aren't counted. I opened the supposedly multimapped genes in IGB and can see that the vast majority of them are in regions without annotations; I have no idea why they are flagged as multimapped rather than no feature. This has highlighted that some genes are missing from Edwards 3'UTR annotation, which I will quickly investigate.
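For reference, the strandedness change corresponds to featureCounts' `-s` option (0 = unstranded, 1 = stranded, 2 = reversely stranded). Below is a hedged sketch of how the call could be assembled; the file names and the helper function are hypothetical, and the exact flags used in this pipeline are not recorded in this thread. Depending on the Subread version, `--countReadPairs` may also be needed alongside `-p` to count fragments rather than individual reads.

```python
import shlex


def featurecounts_command(bam, annotation, out, strandedness=2):
    """Build a featureCounts argument list for paired-end reads.

    strandedness follows featureCounts' -s convention:
    0 = unstranded, 1 = stranded, 2 = reversely stranded.
    """
    if strandedness not in (0, 1, 2):
        raise ValueError("strandedness must be 0, 1 or 2")
    return [
        "featureCounts",
        "-p",                      # input is paired-end
        "-s", str(strandedness),   # strand-specific counting mode
        "-a", annotation,          # annotation file (GTF/GFF)
        "-o", out,                 # output counts table
        bam,
    ]


# Hypothetical file names, for illustration only.
cmd = featurecounts_command("sample.bam", "annotation.gff", "counts.txt")
print(shlex.join(cmd))
```

Keeping the strandedness as an explicit parameter makes it easy to rerun the same sample with 0, 1 and 2 and compare the assigned fractions.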

DimmestP commented 3 years ago

I have created a BAM file of the paired reads that were not assigned to a gene and visualised them in IGB. It appears the majority of these reads are mapped to transposable elements or to the ends of chromosomes. I am therefore happy with the current 70% of reads counted.
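One way to collect the unassigned read IDs is from featureCounts' per-read report (the `-R CORE` option in recent Subread versions). The sketch below assumes that report is tab-delimited with the read name in the first column and the assignment status in the second; the helper name and the example rows are made up, and this is not necessarily how the BAM file above was produced.

```python
from collections import defaultdict


def read_ids_by_status(core_text):
    """Group read names by assignment status.

    Assumes each non-empty line is tab-delimited with the read name
    first and the status (e.g. Assigned, Unassigned_NoFeatures) second.
    """
    ids = defaultdict(set)
    for line in core_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, status = line.split("\t")[:2]
        ids[status].add(name)
    return ids


# Made-up example rows, NOT real data from this experiment.
example = (
    "read_001\tAssigned\t1\tgeneA\n"
    "read_002\tUnassigned_MultiMapping\t-1\tNA\n"
    "read_003\tUnassigned_NoFeatures\t-1\tNA\n"
    "read_004\tUnassigned_MultiMapping\t-1\tNA\n"
)
by_status = read_ids_by_status(example)
unassigned = sorted(
    name
    for status, names in by_status.items()
    if status.startswith("Unassigned")
    for name in names
)
print("\n".join(unassigned))
```

The resulting list of IDs can be written to a file and used to subset the original BAM for viewing in IGB.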

DimmestP commented 3 years ago

Can close if @ewallace is happy

ewallace commented 3 years ago

70% of reads aligned and the rest multi-mapped to TEs and chromosome ends is great!

Bonus points for a summary figure showing mapping to TEs and chromosome ends. For this project this is only relevant as a quality control because our question is about 3' ends of a small set of target genes. But in principle it's nice to have a clear figure that communicates what went wrong with the rest. So, only for bonus points / low priority.
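A possible starting point for that figure is a simple tally of where the unassigned reads land. This is only a sketch: the window that defines a "chromosome end", the TE intervals, the chromosome length and the read positions below are all placeholder values, not numbers from this project.

```python
from collections import Counter

END_WINDOW = 10_000  # assumed distance from either end that counts as "chromosome end"


def classify(chrom, pos, chrom_lengths, te_intervals):
    """Label one alignment position as TE, chromosome end, or other."""
    for te_chrom, start, end in te_intervals:
        if chrom == te_chrom and start <= pos <= end:
            return "transposable element"
    if pos <= END_WINDOW or pos > chrom_lengths[chrom] - END_WINDOW:
        return "chromosome end"
    return "other"


# Placeholder inputs for illustration only.
chrom_lengths = {"chrA": 100_000}
te_intervals = [("chrA", 49_000, 51_000)]
positions = [("chrA", 500), ("chrA", 50_000), ("chrA", 70_000), ("chrA", 95_000)]

tally = Counter(
    classify(chrom, pos, chrom_lengths, te_intervals) for chrom, pos in positions
)
for label, n in tally.most_common():
    print(f"{label}: {n}")
```

The tally per sample could then be drawn as a stacked bar chart alongside the assigned fraction.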