deeptools / pyGenomeTracks

python module to plot beautiful and highly customizable genome browser tracks
GNU General Public License v3.0
752 stars 111 forks source link

add CNV #407

Open MarineBergot opened 2 years ago

MarineBergot commented 2 years ago

Hi ! thanks for your work, i would like to know if it's possible to add CNV/SNV on the plot? thanks :)

i'm using: python 3.8 pyGenomeTracks 3.7

lldelisle commented 2 years ago

Hi @MarineBergot , Do you mean from a database? Common CNV/SNV? All we plot in pyGenomeTracks rely on local files. I don't know in which format are encoded SNV/CNV.

MarineBergot commented 2 years ago

Thanks for your super quick answer! no it's local files, with our pipeline we have file with all CNV/SNV in vcf or gvcf format My goal would be to plot area with TAD, gene expression, CNV/SNV and genes

lldelisle commented 2 years ago

For the moment, we don't support vcf. Could you make a scheme on how you would represent them, so when we have time to develop new features, we know how it could be useful? Meanwhile I guess you can convert the vcf to bed (if you don't know how to do, just do a head of your vcf I will give you the command)...

MarineBergot commented 2 years ago

Hi again ! i'm trying to add CNV to my (not perfect) plot, i have issue with color. I tried to cheat and added color with bed12 but i don't have bed12. i converted vcf to bed with rgb color regarding the type of CNV but i don't know what to put in the 3 last columns then it's not working (i tried with 0 and 1 but no ) my file looks like that :

chr1 43593267 43594572 . 0 . 43593267 43594572 188,100,241 chr1 30195841 218197336 . 0 . 30195841 218197336 188,100,241 chr1 205209141 205210012 . 0 . 205209141 205210012 188,100,241 chr1 237402443 237403261 . 0 . 237402443 237403261 188,100,241 chr1 247146749 247149421 . 0 . 247146749 247149421 188,100,241 chr2 207644871 207645844 . 0 . 207644871 207645844 188,100,241 chr3 44699170 44701119 . 0 . 44699170 44701119 188,100,241 chr3 93470002 93471081 . 0 . 93470002 93471081 188,100,241 chr3 165092497 165093482 . 0 . 165092497 165093482 188,100,241 chr4 49104007 49658148 . 0 . 49104007 49658148 188,100,241

my .ini file

[x-axis] where = top

[hic matrix] file = dijhic008/hicMatrix/dijhic008_50000_merge_norm_correct.h5 title = 50kb dijhic008 chromothripsis

depth is the maximum distance plotted in bp. In Hi-C tracks

the height of the track is calculated based on the depth such

that the matrix does not look deformed

depth = 5000000 transform = log1p file_type = hic_matrix

[tads] file =dijhic008_TADs_50000_merge_domains.bed file_type = domains border_color = black color = none

the tads are overlay over the hic-matrix

the share-y options sets the y-axis to be shared

between the Hi-C matrix and the TADs.

overlay_previous = share-y

[spacer] height = 0.05

[hic matrix] file = dijhic001_50000_norm_correct.h5 title = 50kb dijhic001 CTRL

depth is the maximum distance plotted in bp. In Hi-C tracks

the height of the track is calculated based on the depth such

that the matrix does not look deformed

depth = 1000000 transform = log1p file_type = hic_matrix

[tads] file = dijhic001_TADs_50000_domains.bed file_type = domains border_color = black color = none

the tads are overlay over the hic-matrix

the share-y options sets the y-axis to be shared

between the Hi-C matrix and the TADs.

overlay_previous = share-y

[spacer] height = 0.05

[test arcs overlay] file =dijarn085.arc line_width = 3 height=3

[gtf exons] file = grch38.refseq.with_genes.gtf height = 10 title = gtf (exons and transcripts)
fontsize = 12 file_type = gtf prefered_name = gene_name

[CNV] file = dijen080_cnv_bed12.bed height = 7 title = genes (bed12) style = flybase; fontsize = 10; height_utr = 0.7 style = flybase fontsize = 10 height_utr = 0.7

[bigwig] file = ENCFF757LRW.bw color = blue height = 4 title = CTCF min_value = 0 max_value = 30

[vlines] file = genes.bed type = vlines

do you have any idea i could do to have on color regarding the type of CNV? (and how to put my 2 TAD plot on the same scale? ) and if you have something for RNAseq data? like to add some plot for each gene with the number of count to see if the gene is down or up ? sorry for all my asking and thanks again for your incredible help!

TAD_chromo

lldelisle commented 2 years ago

Hi,

  1. For the CNVs, you have currently a bed9 (9 fields) and you can use this in pyGenomeTracks. To see them, in color, you can use the option color = bed_rgb and I would not put label as they are '.' for all. So it would be:
    [CNV]
    file = dijen080_cnv_bed12.bed
    height = 3
    title = CNV
    color = bed_rgb
    labels = false
  2. For the genes, I don't know if you want to see all transcripts for a single gene. If you don't you can use the options merge_transcripts = true and merge_overlapping_exons = true.
  3. For the Hi-C matrices, I guess you did not specify the --binSize option in hicBuildMatrix. This is why you have 2 different resolutions. What I would do: a. Run hicInfo -m dijhic008/hicMatrix/dijhic008_50000_merge_norm_correct.h5 to get the resolution of the first matrix. b. Rerun hicBuildMatrix --samFiles .... --binSize ... for the second: dijhic001 with the binSize obtained in (a.) c. Rerun correction and TAD calling on corrected matrices
  4. For the RNA-seq, we don't have specific tracks. I think there are 3 options: a. You make a coverage from your RNA-seq with bedtools and you plot the 2 bedgraphs/bigwigs b. You compute FPKMs or other measure and you put this value in the fifth column of a bed and you plot the 2 beds with color = Reds or any other colormap (https://matplotlib.org/stable/tutorials/colors/colormaps.html). c. You extract from a DE analysis the log2 fold-change and you generate a bed with this value in the fifth field and you plot this bed with color = bwr or any other Diverging colormap (https://matplotlib.org/stable/tutorials/colors/colormaps.html#diverging).
lldelisle commented 2 years ago

PS: For the next exchanges, would you mind to put your ini file in a code block? You can generate a code block with triple ` followed by ini:

```ini
your ini file

Thanks
MarineBergot commented 2 years ago

thanks for your quick answer, i tried as you suggested for CNV but nothing is shown :/ this is my bed (i'm supposed to have only one CNV in this area):

chr8    26054488    26055284    None    0   .   26054488    26055284    188,100,241
chr8    43238145    43242668    None    0   .   43238145    43242668    188,100,241
chr8    85261666    85262425    CA13    0   .   85261666    85262425    188,100,241
chr8    72111027    72111742    None~None   0   .   72111027    72111742    242,243,244
chr8    84952812    84953527    None~None   0   .   84952812    84953527    242,243,244
chr8    99145226    99145941    VPS13B~VPS13B   0   .   99145226    99145941    242,243,244
chr8    130897333   130898048   ADCY8~ADCY8 0   .   130897333   130898048   242,243,244
chr8    51817225    51817940    PCMTD1~None 0   .   51817225    51817940    242,243,244
chr8    51818564    51819279    PCMTD1~None 0   .   51818564    51819279    242,243,244
chr8    14488547    14489262    SGCZ~GXYLT1 0   .   14488547    14489262    242,243,244
chr8    15431497    15432212    None~KLF12  0   .   15431497    15432212    242,243,244
chr8    30287528    30288243    None~None   0   .   30287528    30288243    242,243,244
chr8    28576084    28576799    None~None   0   .   28576084    28576799    242,243,244
chr8    133959623   133960338   None~None   0   .   133959623   133960338   242,243,244
chr8    93653916    93654631    LINC00535~None  0   .   93653916    93654631    242,243,244
chr8    643942  649756  ERICH1  0   .   643942  649756  221,20,35
chr8    6111142 6115456 None    0   .   6111142 6115456 221,20,35
chr8    15919242    15930056    None    0   .   15919242    15930056    221,20,35
chr8    25114442    25133956    None    0   .   25114442    25133956    221,20,35
`[x-axis]
where = top

[hic matrix]
file = hicMatrix/dijhic008_25000_merge_norm_correct.h5
title = 25kb dijhic008 chromothripsis
# depth is the maximum distance plotted in bp. In Hi-C tracks
# the height of the track is calculated based on the depth such
# that the matrix does not look deformed
depth = 3000000
transform = log1p
file_type = hic_matrix
max_value=50

[tads]
file = TADs/dijhic008_TADs_25000_merge_domains.bed
file_type = domains
border_color = black
color = none
# the tads are overlay over the hic-matrix
# the share-y options sets the y-axis to be shared
# between the Hi-C matrix and the TADs.
overlay_previous = share-y

[spacer]
height = 0.05

[hic matrix]
file = hicMatrix/dijhic001_25000_norm_correct.h5
title = 25kb dijhic001 CTRL
# depth is the maximum distance plotted in bp. In Hi-C tracks
# the height of the track is calculated based on the depth such
# that the matrix does not look deformed
depth = 3000000
transform = log1p
file_type = hic_matrix
max_value = 50

[tads]
file =TADs/dijhic001_TADs_25000_domains.bed
file_type = domains
border_color = black
color = none
# the tads are overlay over the hic-matrix
# the share-y options sets the y-axis to be shared
# between the Hi-C matrix and the TADs.
overlay_previous = share-y

[spacer]
height = 0.05
[test arcs overlay]
file = TADs/dijarn085.arc
line_width = 3
height=3

[gtf exons]
file =grch38.refseq.with_genes.gtf
height = 12
title = gtf grch38 (exons and transcripts)
fontsize = 12
file_type = gtf
prefered_name = gene_name

[CNV]
file = dijen080_cnv_bed9.bed
height = 3
title = CNV
color = bed_rgb
labels = false

[bigwig]
file = ENCFF757LRW.bw
color = blue
height = 4
title = CTCF
min_value = 0
max_value = 30

`

3) it was supposed to be same bin but after checking, yeah it wasn't, now it's working perfectly thanks 4) i'm experimenting your options i will let you know the results!

TAD_chromo_merge

MarineBergot commented 2 years ago

edit : CNV worked but scale is not right, apparently the black line is supposed to be my CNV i tried with other region, do you have any idea how to improve the figure? TAD_chromo_merge_test_cnv

lldelisle commented 2 years ago

What do you mean by 'scale is not right'. In your bed I see:

chr8    643942  649756  ERICH1  0   .   643942  649756  221,20,35

Which is the one you see in your region (chr8:6-11Mb). If the problem is that you don't see the color, add border_color = bed_rgb . If you know the CNV does not overlap, you can put:

[CNV]
file = dijen080_cnv_bed9.bed
height = 1
display = collapsed
title = CNV
color = bed_rgb
border_color = bed_rgb
labels = false
MarineBergot commented 2 years ago

ah it's perfect like that thanks ! TAD_chromo_merge

MarineBergot commented 2 years ago

H! it's me again :) i'm still working on my plot to add RNA-seq data. I added easily coverage but i would like to add zscore, do you have something like that? i tried with a bedgraph file but nothing is shown, i can't understand why

my file looks like that :


"chr8"  95133784    95156685    -2.51
"chr8"  9555911 9782346 0.44
"chr8"  96222946    96235545    -0.39
"chr8"  96239401    96261610    -0.97
"chr8"  96261901    96336995    -0.4
"chr8"  96493812    96611790    -1.97
"chr8"  96645241    97143501    0.95
"chr8"  97273487    97277928    0.01
"chr8"  97644183    97730260    -0.25
"chr8"  97775787    97853013    -1.05
"chr8"  97869063    98036724    -0.36
"chr8"  98041720    98045545    -0.02
"chr8"  98064567    98093609    1.24
"chr8"  98102343    98117171    -1.38
"chr8"  98117292    98159835    0.65
"chr8"  98189825    98294235    1.04
"chr8"  98454632    98825688    0.42
"chr8"  98944441    98952100    0.09
"chr8"  99013273    99877580    1.37
"chr8"  99877864    99893707    -1.45

my .ici file (beginning didn't change)

[SNV]
file = dijen080_snv.bed
title = SNV
height = 1
display = collapsed
color = bed_rgb
border_color = bed_rgb

[mutations CTCF]
file = mutations_CTCF.bed
title = mutations CTCF
height = 1
display = collapsed
color = bed_rgb
border_color = bed_rgb

[bedgraph RNA-Seq]
file = dijarn085.bedGraph
title = RNA-seq coverage
color = green
width = 0.2
# optional. Default is yes, set to no to turn off the visualization of data range
show data range = yes
# optional, otherwise guessed from file ending
file_type = bedgraph
height = 4

[bedgraph RNA-Seq zscore]
file = TADs/zscore.bg
title = RNA-seq Zscore
color = red
width = 0.2
# optional. Default is yes, set to no to turn off the visualization of data range
show data range = yes
# optional, otherwise guessed from file ending
file_type = bedgraph
height = 4

[bigwig Chipseq]
file = CTCF_data/ENCFF757LRW.bw
color = blue
height = 4
title = CTCF
min_value = 0
max_value = 30

TAD_chromo_merge_test_CTCF thanks for your help! Marine

lldelisle commented 2 years ago

It is because of the quotes around your chromosome name.

MarineBergot commented 2 years ago

me again, i have issue with my CNV. Maybe it's not possible but in case of for future. I have different bed file : -one with CNV -one with SNV -one with CNV/SNV which occured in CTCF sequences

this is really not so good when they are in differents lines, i tried to put everything in same bed but it wasn't good solution apparently. do you think there is a way to draw with differents colors the differns types of CNV/SNV/CTCF mutations directly on the genes ? thanks! TAD_chromo_merge_test_CTCF

and maybe improve arc ?

lldelisle commented 2 years ago

I need more info to help you:

  1. Can you give me your actual ini file? And explain how did you obtain the file plotted as genes grch38?
  2. What are the arcs?
  3. Can you draw and take a picture of what it would look like ideally? For example, you want to see your genes on multiple lines or a single line?

For the overlay you are asking. There is an option: overlay_previous = yes which allow to draw one track on top of the other one, for example, TAD on top of heatmap but it can also be CNV on top of genes... I don't know if this helps you.