dieterich-lab / FUCHS

FUCHS - FUll circle CHaracterization from rna-Seq
GNU General Public License v3.0

Negative length exons & systematic coordinate errors #20

Open tgermade opened 4 years ago

tgermade commented 4 years ago

Dear Dieterich lab,

Your resources were a great help for our work! I would like to share 2 issues that we encountered when using FUCHS version 0.2.0:

  1. We found a few transcripts containing exons of negative length in our *.exon_counts.bed output. We quantified an RNase R-treated dataset of long paired-end reads. Do you have any idea why this happens? Here is an example:

    chr17   74557901    74566174    ENSMUST00000182108  2   +   74557901    74566174    0,255,0 4   1,81,-1,1   0,4119,7891,8272
  2. The exon coordinates we obtained in our *.exon_counts.bed files systematically disagreed with mouse genome references & CircBase annotations. We encountered the following:

    • a 1 nt shift for all exons (probably due to a mix-up of 0- and 1-based coordinate systems)
    • additional shifts of start coordinates on all exons except the 1st for each transcript
    • in most cases a 1 nt long exon added before or after the transcript sequence (this might be related to issue #13)
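The symptoms listed above can be checked mechanically. The following is only a minimal sketch, assuming the *.exon_counts.bed records follow the standard BED12 layout (blockCount, blockSizes and blockStarts in columns 10 to 12). The function names `parse_bed12` and `check_bed12` are hypothetical; this is neither FUCHS code nor the attached correction script.

```python
def parse_bed12(line):
    """Split a whitespace-separated BED12 record into the fields checked below."""
    fields = line.split()
    start, end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # BED12 lists may carry a trailing comma, so strip it before splitting.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    return start, end, block_count, sizes, starts


def check_bed12(line):
    """Return a list of consistency problems found in one BED12 record."""
    start, end, block_count, sizes, starts = parse_bed12(line)
    problems = []
    if block_count != len(sizes) or block_count != len(starts):
        problems.append("blockCount mismatch")
    if any(size <= 0 for size in sizes):
        problems.append("non-positive block size")
    if starts[0] != 0:
        problems.append("first blockStart is not 0")
    # In BED12 the last block has to end exactly at chromEnd.
    if start + starts[-1] + sizes[-1] != end:
        problems.append("last block does not end at chromEnd")
    return problems


# The record quoted in point 1 of this report.
example = ("chr17 74557901 74566174 ENSMUST00000182108 2 + "
           "74557901 74566174 0,255,0 4 1,81,-1,1 0,4119,7891,8272")
print(check_bed12(example))  # ['non-positive block size']
```

For the record from point 1, only the negative block size is flagged; the overall span of the record is internally consistent.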

One example of the problems (in IGV): fuchs_coordinate_issues

You can find the code we used to correct these points here: fuchs_adjust_coordinates.txt

Thanks for your time!

tjakobi commented 4 years ago

Dear @tgermade,

thank you so much for your detailed report.

I'll try to look into it as soon as I find some time for debugging.

Cheers, Tobias

Christina-hshi commented 3 years ago

I have the same problem: the ".exon_counts.bed" output is not consistent with the annotation. Exon regions are 1 bp shorter than annotated, and unexpected 1 bp blocks appear near the ends. Here are two records from ".exon_counts.bed":

chr15 43692242 43694048 ENST00000260383 1 f 43692242 43694048 0,255,0 2 177,1 0,1805
chr3 195101738 195112876 ENST00000326793 4 r 195101738 195112876 0,255,0 3 1,119,57 0,894,11081
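The unexpected 1 bp blocks at the ends of such records can be found with a short script. This is only a sketch, assuming the standard BED12 layout with blockSizes in column 11; the helper name `terminal_single_base_blocks` is hypothetical and not part of FUCHS.

```python
def terminal_single_base_blocks(line):
    """Report whether the first and/or last block of a BED12 record is 1 bp long."""
    fields = line.split()
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    flagged = []
    # A single-block record cannot have a spurious flanking block.
    if len(sizes) > 1:
        if sizes[0] == 1:
            flagged.append("first")
        if sizes[-1] == 1:
            flagged.append("last")
    return flagged


# The two records quoted in this comment.
records = [
    "chr15 43692242 43694048 ENST00000260383 1 f 43692242 43694048 "
    "0,255,0 2 177,1 0,1805",
    "chr3 195101738 195112876 ENST00000326793 4 r 195101738 195112876 "
    "0,255,0 3 1,119,57 0,894,11081",
]
for record in records:
    print(record.split()[3], terminal_single_base_blocks(record))
# ENST00000260383 ['last']
# ENST00000326793 ['first']
```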

The coordinate systems used in the input files are listed for your reference.