Magdoll / cDNA_Cupcake

Miscellaneous collection of Python and R scripts for processing Iso-Seq data
BSD 3-Clause Clear License

Is there a hard limit on inter-exonic ranges? #194

Open markb729 opened 2 years ago

markb729 commented 2 years ago

Does collapse_isoforms_by_sam.py have an upper limit on the inter-exon distance it will collapse into a putative isoform? In my animal model, I have a wild-type allele that spans about 154 Kb (some 40 or so exons) and generates an 8.6 Kb transcript. I have also engineered a deletion that spans approximately 83 Kb in a heterozygote background, which yields a much smaller transcript composed of about 12 exons. This is easily seen with FLNC transcripts splice-mapped to the reference, shown below, including a majority of reads exhibiting the deletion (NB: the coverage imbalance is due to a stronger promoter on the deletion allele, and not all reads with the deletion are shown).

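[image: Screen Shot 2022-02-10 at 12 53 24 PM] https://user-images.githubusercontent.com/15279264/153478210-85029e1f-0463-442a-af94-75cafe7ce354.png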

Since it makes up about 70 percent of all FLNC reads, this deletion should have been easily detected. However, the output of collapse_isoforms_by_sam.py --dun-merge-5-shorter ... yields only three slightly different patterns of the wild-type allele; the KO allele is completely absent. No filtering or other 5' contingencies were applied. Do you have any explanation for why this may have occurred?

Thanks.

Magdoll commented 2 years ago

Hi, there is no inter-exon distance limit. My guess is that those KO isoforms were filtered out by collapse because either their alignment coverage or their identity fell below the cutoff (99% coverage, 95% identity).
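To make the two cutoffs concrete, here is a minimal sketch of how query coverage and identity can be estimated from a SAM record's CIGAR string and NM tag. This uses one common definition and is not necessarily how collapse_isoforms_by_sam.py computes them; the function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_coverage(cigar: str) -> float:
    """Fraction of the read's bases that fall inside the alignment.

    Assumption: coverage = aligned query bases / total query bases,
    where soft- and hard-clipped bases count as unaligned.
    """
    aligned = 0  # query bases consumed by M, I, = and X
    clipped = 0  # query bases clipped off the ends (S, H)
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MI=X":
            aligned += n
        elif op in "SH":
            clipped += n
        # D and N consume only the reference, so a long intron (N)
        # does not lower the coverage by itself.
    total = aligned + clipped
    return aligned / total if total else 0.0


def identity(cigar: str, nm: int) -> float:
    """Assumption: identity = 1 - NM / alignment columns (M, I, D, =, X)."""
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    return 1.0 - nm / columns if columns else 0.0
```

Under these definitions, a read aligned as `50S900M83000N50M` has a coverage of 0.95: the 83 Kb skipped region costs nothing, but the 50 clipped bases put it below a 0.99 cutoff.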

When you run collapse you get an “ignored-ids.txt” file, which tells you which FLNC reads were excluded and why. Since you already know the FLNC reads for the KO allele, can you see what happened to them?
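A small sketch of that lookup, assuming each line of the ignored-IDs file holds a read ID, a tab, and then the reason for exclusion. The layout is an assumption and should be checked against your own file; the function name is hypothetical.

```python
def lookup_ignored(ignored_lines, read_ids):
    """Return {read_id: reason} for the reads of interest that were ignored.

    ignored_lines: iterable of text lines, assumed to look like
                   "<read_id>\t<reason>" (layout is an assumption).
    read_ids:      iterable of FLNC read IDs known to carry the deletion.
    """
    wanted = set(read_ids)
    found = {}
    for line in ignored_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so the reason text stays intact.
        read_id, _, reason = line.partition("\t")
        if read_id in wanted:
            found[read_id] = reason
    return found
```

To use it, open the ignored-IDs file written next to your collapse output and pass the file handle as `ignored_lines`, together with the IDs of a few KO reads. Reads that do not appear in the result were not excluded at this step.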
