frattalab / PAPA

PAPA (Pipeline-Alternative Polyadenylation) - Snakemake pipeline for analysis of APA from short-read RNA-seq data
GNU General Public License v3.0

filter_tx_by_intron_chain.py - 'internal_intron_spliced' classification likely excludes events in the final reference intron #18

SamBryce-Smith closed this issue 1 year ago

SamBryce-Smith commented 2 years ago

In annotate_internal_spliced, putative 'internal intron spliced' last exon events are identified by checking whether they are fully contained within internal reference introns, i.e. introns that are neither the first nor the last intron of their transcript.

```python
# Find unclassified valid events completely contained within annotated introns
ref_internal_introns = get_internal_regions(ref_introns,
                                            feature_col="Feature",
                                            feature_key="intron",
                                            id_col="transcript_id",
                                            region_number_col="intron_number")

int_spliced = novel_last_exons_nc.overlap(ref_internal_introns,
                                          strandedness="same",
                                          how="containment")
```
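As a rough illustration of what the check above does, here is a minimal pure-Python sketch that avoids pyranges entirely. Intron, internal_introns and contained_in_any are hypothetical names, not part of the pipeline. The sketch assumes intron_number counts from 1 at the 5' end of each transcript and that intervals are half-open, and it only approximates get_internal_regions followed by overlap(..., strandedness="same", how="containment").

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    # Hypothetical stand-in for one row of the reference intron table
    chrom: str
    start: int  # 0-based start (inclusive)
    end: int  # end (exclusive)
    strand: str
    transcript_id: str
    intron_number: int  # assumed: 1 = most 5' intron, numbered along the transcript's strand


def internal_introns(introns):
    """Keep introns that are neither the first nor the last intron of their transcript."""
    last_number = defaultdict(int)
    for intron in introns:
        tx = intron.transcript_id
        last_number[tx] = max(last_number[tx], intron.intron_number)
    return [i for i in introns if 1 < i.intron_number < last_number[i.transcript_id]]


def contained_in_any(chrom, start, end, strand, regions):
    """True if [start, end) lies entirely within a same-strand region on the same chromosome."""
    return any(
        r.chrom == chrom and r.strand == strand and r.start <= start and end <= r.end
        for r in regions
    )


introns = [
    Intron("chr1", 100, 200, "+", "T1", 1),
    Intron("chr1", 300, 400, "+", "T1", 2),
    Intron("chr1", 500, 800, "+", "T1", 3),
]
internal = internal_introns(introns)
print([i.intron_number for i in internal])  # [2]
print(contained_in_any("chr1", 600, 650, "+", internal))  # False: event sits in the last intron
```

With three introns only intron 2 counts as internal, so a novel last exon inside intron 3 is never matched by this check.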

Excluding the first intron is fine, since there is no upstream intron chain to match and those events have separate filtering logic. However, a novel last exon could also fall within the final intron of a reference transcript, i.e. the intron immediately upstream of the annotated last exon (e.g. ONECUT1), much like the proximal isoform of a classical ALE event. Such events would be classified as 'other', even though the 'internal_intron_spliced' definition suits them.

A simple fix would be to also extract the reference last introns and concatenate the two grs (PyRanges objects of internal and last introns) before the containment check.
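A minimal pure-Python sketch of that fix, under the same assumptions as a plain reading of the issue: intron_number counts from 1 at the 5' end of each transcript and intervals are half-open. Intron, introns_for_spliced_check and contained_in_any are hypothetical names. The sketch also assumes that a transcript's only intron should still be treated as a first intron and left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    # Hypothetical stand-in for one row of the reference intron table
    chrom: str
    start: int  # 0-based start (inclusive)
    end: int  # end (exclusive)
    strand: str
    transcript_id: str
    intron_number: int  # assumed: 1 = most 5' intron, numbered along the transcript's strand


def introns_for_spliced_check(introns):
    """Internal introns concatenated with last introns; first introns stay excluded."""
    last_number = defaultdict(int)
    for intron in introns:
        tx = intron.transcript_id
        last_number[tx] = max(last_number[tx], intron.intron_number)
    internal = [i for i in introns if 1 < i.intron_number < last_number[i.transcript_id]]
    # Assumption: a single-intron transcript's intron counts as a first intron, so skip it
    last = [
        i for i in introns
        if i.intron_number == last_number[i.transcript_id] and i.intron_number > 1
    ]
    return internal + last


def contained_in_any(chrom, start, end, strand, regions):
    """True if [start, end) lies entirely within a same-strand region on the same chromosome."""
    return any(
        r.chrom == chrom and r.strand == strand and r.start <= start and end <= r.end
        for r in regions
    )


introns = [
    Intron("chr1", 100, 200, "+", "T1", 1),
    Intron("chr1", 300, 400, "+", "T1", 2),
    Intron("chr1", 500, 800, "+", "T1", 3),
]
regions = introns_for_spliced_check(introns)
print(contained_in_any("chr1", 600, 650, "+", regions))  # True: last-intron event now matched
print(contained_in_any("chr1", 120, 150, "+", regions))  # False: first intron still excluded
```

With contiguous intron numbering, the concatenated set is simply every intron except each transcript's first. In the pipeline itself the two PyRanges objects could presumably be combined with pyranges.concat before calling overlap.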

SamBryce-Smith commented 1 year ago

Closing, as I don't plan to use filter_tx_by_intron_chain.py any further.