Closed mariachiara-github closed 3 months ago
Hi,
Great question!
We classify a gene as potential trans splicing when it has a high number (>= 8 by default) of candidate partners and none of those candidate partners overlap with the gene itself.
We also mark these as low quality if the expected coverage on partner genes mismatches significantly, in addition to the standard pbfusion filtering criteria.
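As a rough illustration of the partner-count rule described above, here is a minimal Python sketch. It is not pbfusion's actual code: the `Gene` type, the `overlaps` helper, and the function name are hypothetical, and only the default threshold of 8 and the no-overlap condition come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, end-exclusive."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Gene, b: Gene) -> bool:
    """True if the two gene intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_potential_trans_splicing(gene: Gene, partners: list[Gene],
                                min_partners: int = 8) -> bool:
    """Apply the rule from the thread: many distinct candidate partners,
    none of which overlap the gene in question."""
    enough_partners = len(partners) >= min_partners
    no_overlap = not any(overlaps(gene, p) for p in partners)
    return enough_partners and no_overlap
```

The coverage-based low-quality check mentioned above is left out, since the thread does not say how the mismatch is measured.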
Happy to answer any further questions,
Daniel
Thank you so much for your answer! And what about the Fusion, SenseAntisense and Overlap classes? What are the criteria for assigning fusions found by pbfusion to one of these classes?
SenseAntisense means that a read aligned to the same locus on both strands. This often happens in eukaryotic transcription, where such reads are called sense-antisense chimeras. The kallikrein (KLK) genes are a particularly prominent example. There can be false positives for certain low-complexity regions as well.
Readthrough is assigned when a read aligns to two genes that are on the same chromosome, in the same orientation, and positioned such that polymerase can start reading on one gene and continue reading on the other. We use 100kb as a threshold, meaning that if the end of the last alignment to the first gene is less than 100kb upstream of the second, it is marked as a read-through event. These are also common. Some are even annotated in GenBank. This annotation lets users distinguish this kind of event from one created by a genomic rearrangement.
Overlap is assigned when the two genes to which a read aligned overlap with each other. This is also common; often a single exon from a gene that overlaps is shared, and some of this comes from errors in mapping/alignment. Again, this doesn't require genomic rearrangements.
We use Fusion for all other events. This means the fusion cannot be explained by a read-through event, the genes do not overlap, and the read is not aligned to both strands at the same locus. Essentially, these events are more likely to be due to a genomic rearrangement.
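The four classes above can be sketched as a small decision function. This is only an illustration under stated assumptions, not pbfusion's implementation: the `Hit` record, its field names, the order in which the checks run, and the strand-aware gap calculation are all hypothetical. Only the class definitions and the 100kb threshold are taken from the explanation above.

```python
from dataclasses import dataclass

READTHROUGH_MAX_GAP = 100_000  # 100kb threshold quoted in the thread


@dataclass(frozen=True)
class Hit:
    """Hypothetical record: one read segment aligned to one gene."""
    gene: str
    chrom: str
    strand: str      # strand of the alignment, "+" or "-"
    aln_start: int   # alignment interval, 0-based, end-exclusive
    aln_end: int
    gene_start: int  # annotated gene interval
    gene_end: int


def _intersect(s1: int, e1: int, s2: int, e2: int) -> bool:
    return s1 < e2 and s2 < e1


def classify(first: Hit, second: Hit,
             max_gap: int = READTHROUGH_MAX_GAP) -> str:
    """Return one of SenseAntisense, Overlap, Readthrough, Fusion.
    The check order is an assumption made for this sketch."""
    same_chrom = first.chrom == second.chrom
    # Same locus aligned on both strands.
    if (same_chrom and first.strand != second.strand
            and _intersect(first.aln_start, first.aln_end,
                           second.aln_start, second.aln_end)):
        return "SenseAntisense"
    # The two annotated genes overlap each other.
    if same_chrom and _intersect(first.gene_start, first.gene_end,
                                 second.gene_start, second.gene_end):
        return "Overlap"
    # Same orientation, second gene downstream within the threshold.
    if same_chrom and first.strand == second.strand:
        if first.strand == "+":
            gap = second.aln_start - first.aln_end
        else:
            gap = first.aln_start - second.aln_end
        if 0 <= gap < max_gap:
            return "Readthrough"
    # Everything else.
    return "Fusion"
```

For example, two same-strand hits on chr1 separated by 48kb would come out as Readthrough here, while the same pair separated by 200kb would fall through to Fusion.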
I'm closing this for now - feel free to re-open with more questions!
We plan to extend the documentation to include this with the next release.
Thank you so much for the exhaustive explanation about the classes! I actually do have another question, how do you discriminate between LOW and MEDIUM fusions? (the minimum fusion quality to emit). In particular what are the parameters that you look at to say that a fusion is of MEDIUM quality? Thank you again!
Hi Maria,
Great question.
LOW is assigned when a candidate fusion is readthrough, between overlapping genes, or fails other QC tests. We filter them by default, but --min-fusion-quality LOW causes all events to be emitted. This can be important for some fusions.
These tests are:
--min-min-mapq
These can all be tweaked via command-line options, but the defaults work pretty well.
HIGH quality is reserved for future work, but we do not assign any currently.
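To make the LOW/MEDIUM rule concrete, here is a minimal Python sketch. It assumes the class labels used earlier in this thread and collapses all QC tests into a single hypothetical `passes_qc` flag; the `Quality` enum, function names, and event tuples are illustrative and do not reflect pbfusion's internals.

```python
from enum import IntEnum


class Quality(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2  # reserved; never assigned in this sketch, as in the thread


def assign_quality(classification: str, passes_qc: bool) -> Quality:
    """LOW for readthrough events, overlapping genes, or any QC failure;
    MEDIUM otherwise."""
    if classification in {"Readthrough", "Overlap"} or not passes_qc:
        return Quality.LOW
    return Quality.MEDIUM


def emit(events: list[tuple[str, Quality]],
         min_quality: Quality = Quality.MEDIUM) -> list[tuple[str, Quality]]:
    """Keep events at or above the threshold. The MEDIUM default mirrors
    the statement that LOW events are filtered by default."""
    return [e for e in events if e[1] >= min_quality]
```

Passing `Quality.LOW` as the threshold keeps every event, which corresponds to what --min-fusion-quality LOW is described as doing.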
Let me know what further questions you may have.
Thanks,
Daniel
Mitelman
Hi Daniel! Thank you again for the very clear and fast answer!
Hi, I was wondering how the classification (CL) of the fusions detected by pbfusion is inferred. For example, based on what do you classify a fusion as PotentialTransSplicing, or as FUSION ? I hope my question is clear, let me know if not :)
Thank you for your help!