dieterich-lab / circtools

circtools: a modular, Python-based framework for circRNA-related tools that unifies several functionalities in a single, command-line-driven application.
http://circ.tools
GNU General Public License v3.0

Investigate what 'unknown breakpoint' events are #42

Open · tjakobi opened this issue 6 years ago

tjakobi commented 6 years ago

These events appear in the data files and are plotted. But what do they mean? The back-splice junction (BSJ) is covered neither by a single mate nor by both mates of a pair, so how exactly is it covered?

tjakobi commented 6 years ago

After looking into the FUCHS source code, I was able to reconstruct what has to happen for an undefined event to be produced.

First, I assessed the fraction of undefined events across all samples: 2.2%.
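The 2.2% figure is a ratio of undefined events to all events. Below is a minimal sketch of how such a ratio could be pooled across samples. It assumes that per-sample counts are available as `(undefined, total)` pairs; the function name and the input layout are hypothetical and are not part of FUCHS or circtools.

```python
from typing import Mapping, Tuple


def undefined_fraction(counts: Mapping[str, Tuple[int, int]]) -> float:
    """Pooled fraction of undefined events over all samples.

    Hypothetical input: counts maps a sample name to
    (undefined_events, total_events). This is not the FUCHS output format.
    """
    undefined = sum(u for u, _ in counts.values())
    total = sum(t for _, t in counts.values())
    # Avoid division by zero when no events were observed at all.
    if total == 0:
        return 0.0
    return undefined / total


print(undefined_fraction({"sample_a": (1, 50), "sample_b": (4, 150)}))
```

Pooling weights each sample by its number of events. Averaging the per-sample ratios would weight every sample equally instead. The issue does not say which of the two produced the 2.2%.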

Second, the check `if len(mates[mate][strand]['start']) == 1 and len(mates[mate][strand]['end']) == 1:` has to fail on both the forward and the reverse strand for a given mate. Here I assume that `mate` is actually a read name and not a mate "pair". In other words, on neither strand does the read have exactly one start and exactly one end position (for example, because the numbers of start and end positions differ). The non-matching entries are saved into another variable, `fragments`, but are not used afterwards:

```python
mates, fragments = self.get_reads_from_bamfile('%s/%s' % (self.bamfolder, f), circle_coordinates)
```
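The per-strand check can be illustrated on a toy data structure. This is a minimal sketch. It assumes that `mates` maps a read name to per-strand dictionaries holding lists of `start` and `end` positions, which is the layout the quoted condition suggests. `has_single_start_and_end` and `split_reads` are hypothetical helpers, not FUCHS functions, and the real code may handle failing reads differently.

```python
from typing import Dict, List, Tuple

# Hypothetical layout mirroring the quoted condition:
# mates[read_name][strand]['start' | 'end'] -> list of positions
Positions = Dict[str, List[int]]
MateTable = Dict[str, Dict[str, Positions]]


def has_single_start_and_end(entry: Positions) -> bool:
    """The quoted check: exactly one start and exactly one end position."""
    return len(entry.get('start', [])) == 1 and len(entry.get('end', [])) == 1


def split_reads(mates: MateTable) -> Tuple[List[str], List[str]]:
    """Split read names into those passing the check on at least one strand
    and those failing it on every strand (the 'undefined' candidates)."""
    passing: List[str] = []
    undefined: List[str] = []
    for read_name, strands in mates.items():
        if any(has_single_start_and_end(entry) for entry in strands.values()):
            passing.append(read_name)
        else:
            undefined.append(read_name)
    return passing, undefined


toy = {
    # One start and one end on the forward strand: passes.
    'r1': {'+': {'start': [100], 'end': [200]}, '-': {'start': [], 'end': []}},
    # Two starts on '+', no end on '-': fails on both strands.
    'r2': {'+': {'start': [100, 150], 'end': [200]}, '-': {'start': [120], 'end': []}},
}
print(split_reads(toy))
```

In this toy input, `r2` fails on both strands. It therefore ends up in the second list, which corresponds to the situation described above.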

The misclassification happens only if a read neither starts nor ends exactly at the circle coordinates encoded in the circle's BAM file name.
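The boundary condition can be written down explicitly. This is a minimal sketch under two stated assumptions. First, the BAM file name follows a hypothetical `<chrom>_<start>_<end>.bam` scheme; the actual FUCHS naming may differ. Second, read and circle coordinates use the same convention. It also interprets "on the circle coordinate" as the read start matching the circle start, or the read end matching the circle end. Both function names are hypothetical.

```python
import os
from typing import Tuple


def parse_circle_filename(path: str) -> Tuple[str, int, int]:
    """Parse a hypothetical '<chrom>_<start>_<end>.bam' file name.

    rsplit keeps chromosome names that themselves contain underscores intact.
    """
    stem, _ext = os.path.splitext(os.path.basename(path))
    chrom, start, end = stem.rsplit('_', 2)
    return chrom, int(start), int(end)


def touches_breakpoint(read_start: int, read_end: int,
                       circle_start: int, circle_end: int) -> bool:
    """True if the read starts at the circle start or ends at the circle end.

    Assumes both coordinate pairs use the same convention
    (e.g. both 0-based, end-exclusive); mixing conventions shifts by one.
    """
    return read_start == circle_start or read_end == circle_end


chrom, c_start, c_end = parse_circle_filename('/data/chr1_1000_2000.bam')
print(chrom, touches_breakpoint(1000, 1075, c_start, c_end))  # read starts at the circle start
print(touches_breakpoint(1010, 1085, c_start, c_end))         # read lies strictly inside the circle
```

Reads for which `touches_breakpoint` is false on every aligned segment are the ones that, according to the analysis above, can end up as undefined events.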

Right now it's hard to tell whether we can fix this (maybe it's just a mapping problem?). In any case, the impact does not seem to be too significant.