GangCaoLab / CoolBox

Jupyter notebook based genomic data visualization toolkit.
https://gangcaolab.github.io/CoolBox/index.html
GNU General Public License v3.0

Apply CoolBox to the transcript coordinate system #72

Open ifantasy opened 2 years ago

Hi,

Because some transcripts contain very long introns before splicing, plotting them in the genome coordinate system makes the visualization less informative when only the exon regions are of interest. See the MCM9 example below:

[image: MCM9 plotted in genome coordinates]

So, I would like to transform genome coordinates into transcript coordinates. Here is another example, using the MYC transcript (ENST00000621592.5).

The BED12 record for ENST00000621592.5 in the genome coordinate system:

chr8    127736068   127741434   ENST00000621592.5   0   +   127736593   127740958   0   3   555,772,1039,   0,2179,4327,

The command line for the following plot:

coolbox add XAxis - add BED test1.bed12 --gene_style=normal - add TrackHeight "1.0" - add Title "ENST00000621592.5" - goto "chr8:127736068-127741434" - plot test1.pdf

[image: test1.pdf, ENST00000621592.5 plotted in genome coordinates]

The BED12 record for ENST00000621592.5 in the transcript coordinate system:

ENST00000621592.5   0   2366    ENST00000621592.5   0   +   0   525 1890    1   2366,   0,

The command line for the following plot:

coolbox add XAxis - add BED test2.bed12 --gene_style=normal - add TrackHeight "1.0" - add Title "ENST00000621592.5" - goto "ENST00000621592.5:0-2366" - plot test2.pdf

[image: test2.pdf, ENST00000621592.5 plotted in transcript coordinates]
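
For reference, the genome-to-transcript conversion used for the second record can be sketched in Python. This is only a minimal sketch, not part of CoolBox: the function name `bed12_to_transcript_coords` is hypothetical, the input is assumed to be a single well-formed BED12 line, and reporting minus-strand transcripts as `+` after flipping is my assumption about what a transcript coordinate system should look like. Note that in the standard BED12 column order (thickStart, thickEnd, itemRgb, blockCount), the transcript-coordinate line quoted above would be read as thickStart = 0 and thickEnd = 525, whereas the sketch writes 525 and 1890 into columns 7 and 8. I have not verified whether this affects the plot.

```python
def bed12_to_transcript_coords(line: str) -> str:
    """Convert one genomic BED12 line to spliced (transcript) coordinates.

    Hypothetical helper for illustration; not a CoolBox function.
    """
    f = line.split()
    chrom_start = int(f[1])
    name, score, strand = f[3], f[4], f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    item_rgb = f[8]
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]

    # Exons as absolute, half-open genomic intervals, left to right.
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    total = sum(sizes)

    def to_spliced(pos: int) -> int:
        """Map a genomic position to an offset along the concatenated exons."""
        offset = 0
        for ex_start, ex_end in exons:
            if pos <= ex_end:
                # Positions inside an intron snap to the start of the next exon.
                return offset + max(pos, ex_start) - ex_start
            offset += ex_end - ex_start
        return offset

    t_start, t_end = to_spliced(thick_start), to_spliced(thick_end)
    if strand == "-":
        # Assumption: transcript coordinates run 5' to 3', so flip the interval.
        t_start, t_end = total - t_end, total - t_start

    fields = [name, "0", str(total), name, score, "+",
              str(t_start), str(t_end), item_rgb, "1", f"{total},", "0,"]
    return "\t".join(fields)


genomic = ("chr8 127736068 127741434 ENST00000621592.5 0 + "
           "127736593 127740958 0 3 555,772,1039, 0,2179,4327,")
converted = bed12_to_transcript_coords(genomic).split("\t")
print(converted[6], converted[7])  # 525 1890
```

The three exons (555 + 772 + 1039 bp) add up to the 2366 bp transcript length, and the thick end falls 563 bp into the third exon, which gives 555 + 772 + 563 = 1890.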

Before I can say that it works, two issues need to be solved:

1) The tick labels at the top of the plot are not correct (correct labels: 0, 0.5, 1, 1.5, 2kb ?). This is also related to #50.
2) The UTR and CDS regions are not plotted properly. This is probably related to how the thick start and thick end fields of the BED12 format are parsed. In the example above, the thick start is 525 and the thick end is 1890.
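
To make the expected behaviour concrete, here is a small sketch in plain Python. It is not CoolBox's implementation: `kb_tick_labels` and `split_thick_thin` are hypothetical helper names, the 500 bp tick step is an assumption chosen to reproduce the labels listed above, and the thin parts are simply called "UTR" without distinguishing the 5' and 3' ends.

```python
from typing import List, Tuple


def kb_tick_labels(start: int, end: int, step: int = 500) -> List[str]:
    """Label ticks at multiples of `step` within [start, end], in kb.

    The unit is appended to the last label only.
    """
    first = -(-start // step) * step  # round start up to a multiple of step
    labels = [f"{pos / 1000:g}" for pos in range(first, end + 1, step)]
    if labels:
        labels[-1] += "kb"
    return labels


def split_thick_thin(start: int, end: int,
                     thick_start: int, thick_end: int) -> List[Tuple[str, int, int]]:
    """Split one block into thin (UTR) and thick (CDS) half-open segments."""
    # Clamp the thick interval so that it lies inside the block.
    thick_start = max(start, min(thick_start, end))
    thick_end = max(thick_start, min(thick_end, end))
    segments = [
        ("UTR", start, thick_start),
        ("CDS", thick_start, thick_end),
        ("UTR", thick_end, end),
    ]
    # Drop empty segments (for example, a transcript without a 5' UTR).
    return [seg for seg in segments if seg[2] > seg[1]]


print(kb_tick_labels(0, 2366))
# ['0', '0.5', '1', '1.5', '2kb']
print(split_thick_thin(0, 2366, 525, 1890))
# [('UTR', 0, 525), ('CDS', 525, 1890), ('UTR', 1890, 2366)]
```

With a thick start of 525 and a thick end of 1890, the expected drawing is a thin box for 0-525, a thick box for 525-1890, and a thin box for 1890-2366.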

Once these two issues are solved, I can add more kinds of tracks based on the transcript coordinate system. I think this visualization would be useful for anyone doing transcript-based research.