PoisonAlien / trackplot

Generate IGV style locus tracks from bigWig files in R
125 stars 15 forks source link

Problem - Genomic Region Viewing Window #33

Open BolisLab opened 3 months ago

BolisLab commented 3 months ago

Hi, thanks in advance for the help. I tried plotting some bigwig files, successfully creating a track plot with a not very wide viewing window:

> loci_names
         SYMBOL                      LOCI
ANKRD11 ANKRD11   chr16:89007630-89650561`

**Indeed, the code works:**

> t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i])
Parsing loci..
    Queried region: chr16:89007630-89650561 [642931 bps]
Querying UCSC genome browser for gene model and cytoband..
Extracting gene models from UCSC:
    chromosome: chr16
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom ="chr16"'
Extracting cytobands from UCSC:
    chromosome: chr16
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, chromStart, chromEnd, name, gieStain from cytoBand WHERE chrom ="chr16"'
Generating windows [10 bp window size]
Extracting signals
    Processing iPSC_1_MNCdLS1 ..
    Processing iPSC_2_G12 ..
    Processing iPSC_3_MNFa1 ..
    Processing iPSC_4_MNMo1 ..
OK!

When I try to widen the viewing window:

> loci_names
         SYMBOL                      LOCI
ANKRD11 ANKRD11   chr16:88667630-89990561

 #get the following error:

t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i])
Parsing loci..
    Queried region: chr16:88667630-89990561 [1322931 bps]
Querying UCSC genome browser for gene model and cytoband..
Extracting gene models from UCSC:
    chromosome: chr16
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom ="chr16"'
Extracting cytobands from UCSC:
    chromosome: chr16
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, chromStart, chromEnd, name, gieStain from cytoBand WHERE chrom ="chr16"'
Generating windows [10 bp window size]
Extracting signals
    Processing iPSC_1_MNCdLS1 ..
invalid unsigned integer: "8.9e+07"
    Processing iPSC_2_G12 ..
invalid unsigned integer: "8.9e+07"
    Processing iPSC_3_MNFa1 ..
invalid unsigned integer: "8.9e+07"
    Processing iPSC_4_MNMo1 ..
invalid unsigned integer: "8.9e+07"
Error: Can't assign 1 names to a 0-column data.table

I have checked that the selected regions do not exceed the chromosome sizes, and there is no such issue. What could it be due to? Can it be resolved?

PoisonAlien commented 3 months ago

Hi,

Thanks for the issue. The problem here is that R tends to represent integers in scientific notation after a certain length. You can increase this threshold with options before running the trackplot commands.

options(scipen = 15L)
t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i])

I hope this helps.

BolisLab commented 3 months ago

Hi,

Thank you very much, now with the setting " options(scipen = 15L) " it works.

I wanted to ask you one more small thing. Although now with this setting the following step is always passed without errors:

" t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i]) "

for some genomic regions, there is another error when performing the " track_plot " (not for all, it works for some, but not for others). Below, I am sending the code related to the region that does not work:

Extract bigWig signal for a loci of interest > options(scipen = 15L) > t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i]) Parsing loci..

Queried region: chrX:70801691-74073101 [3271410 bps] Querying UCSC genome browser for gene model and cytoband.. Extracting gene models from UCSC: chromosome: chrX build: hg38 query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom ="chrX"' Extracting cytobands from UCSC: chromosome: chrX build: hg38 query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, chromStart, chromEnd, name, gieStain from cytoBand WHERE chrom ="chrX"' Generating windows [10 bp window size] Extracting signals Processing iPSC_1_MNCdLS1 .. Processing iPSC_2_G12 .. Processing iPSC_3_MNFa1 .. Processing iPSC_4_MNMo1 .. OK! > > # Estrapolazione Peaks Names > bw_sample_names <- sample_bigWigs_list[[j]]$bw_sample_names > peaksnames <- sub("^iPSC([0-9]+[^]+)_.$", "\1", bw_sample_names) > peaks_names <- paste0(peaks_names, "_peaks") > > # Estrapolazione Tracks Names > bw_sample_names <- sample_bigWigs_list[[j]]$bw_sample_names > tracknames <- sub("^iPSC([0-9]+[^]+)_.$", "\1", bw_sample_names) > track_names <- paste0(track_names, "_track") > > # Impostare un margine superiore più grande del plot > par(oma = c(2, 2, 2, 2), cex.main = 1) > > # TrackPlot Execution: > trackplot::track_plot(summary_list = t, col = track_cols, show_ideogram = TRUE, y_min = 0, y_max = 250, gene_fsize = 0.6, left_mar = 14, track_names = track_names, track_names_to_left = T, + peaks = sample_NarrowPeak_list[[j]], peaks_track_names = peaks_names, layout_ord = c("c", "g", "p", "b", "h"), gene_track_height = 8, peaks_track_height = 5) Collapsing transcripts.. Error in if (attr(txtbl, "strand") == "+") { : the condition has length > 1

In particular, the genomic region referred to is the following:

loci_names SYMBOL LOCI HDAC8 HDAC8 chrX:70801691-74073101

What could this be due to?

Thank you very much in advance.

Luca Guarrera

Da: "Anand Mayakonda" @.> A: "PoisonAlien" @.> Cc: "BolisLab" @.>, "Author" @.> Inviato: Lunedì, 8 aprile 2024 9:33:52 Oggetto: Re: [PoisonAlien/trackplot] Problem - Genomic Region Viewing Window (Issue #33)

Hi,

Thanks for the issue. The problem here is that R tends to represent integers in scientific notation after a certain length. You can increase this threshold with options before running the trackplot commands. options( scipen = 15L ) t = track_extract( colData = sample_bigWigs_list [[ j ]], loci = loci_names $ LOCI [ i ])

I hope this helps.

— Reply to this email directly, [ https://urlsand.esvalabs.com/?u=https%3A%2F%2Fgithub.com%2FPoisonAlien%2Ftrackplot%2Fissues%2F33%23issuecomment-2042054910&e=d1becced&h=18d8bc73&f=y&p=y | view it on GitHub ] , or [ https://urlsand.esvalabs.com/?u=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FBDWEELL5ECUAABA2ANMT2QDY4JB6BAVCNFSM6AAAAABFZJ4JJOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBSGA2TIOJRGA&e=d1becced&h=f8fe2497&f=y&p=y | unsubscribe ] . You are receiving this because you authored the thread. Message ID: @.***>


L'Istituto Mario Negri IRCCS con tutti i suoi ricercatori ringrazia chi ha generosamente sostenuto la sua attivita'. Continua a sostenerci per mantenere indipendente la nostra ricerca scientifica.

Clicca qui per scoprire come sostenerci:

https://www.marionegri.it/aiuta-la-ricerca

BolisLab commented 3 months ago

Hi,

Thank you very much, now with the setting options(scipen = 15L) it works.

I wanted to ask you one more small thing. Although now with this setting the following step is always passed without errors:

t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i])

for some genomic regions, there is another error when performing the "_trackplot" (not for all, it works for some, but not for others). Below, I am sending the code related to the region that does not work:

> # Extract bigWig signal for a loci of interest
> options(scipen = 15L)
> t = track_extract(colData = sample_bigWigs_list[[j]], loci = loci_names$LOCI[i])
Parsing loci..
    Queried region: chrX:70801691-74073101 [3271410 bps]
Querying UCSC genome browser for gene model and cytoband..
Extracting gene models from UCSC:
    chromosome: chrX
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom ="chrX"'
Extracting cytobands from UCSC:
    chromosome: chrX
    build: hg38
    query: mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD hg38 -e 'select chrom, chromStart, chromEnd, name, gieStain from cytoBand WHERE chrom ="chrX"'
Generating windows [10 bp window size]
Extracting signals
    Processing iPSC_1_MNCdLS1 ..
    Processing iPSC_2_G12 ..
    Processing iPSC_3_MNFa1 ..
    Processing iPSC_4_MNMo1 ..
OK!
> 
> # Estrapolazione Peaks Names
> bw_sample_names <- sample_bigWigs_list[[j]]$bw_sample_names
> peaks_names <- sub("^iPSC_([0-9]+_[^_]+)_.*$", "\\1", bw_sample_names)
> peaks_names <- paste0(peaks_names, "_peaks")
> 
> # Estrapolazione Tracks Names
> bw_sample_names <- sample_bigWigs_list[[j]]$bw_sample_names
> track_names <- sub("^iPSC_([0-9]+_[^_]+)_.*$", "\\1", bw_sample_names)
> track_names <- paste0(track_names, "_track")
> 
> # Impostare un margine superiore più grande del plot
> par(oma = c(2, 2, 2, 2), cex.main = 1) 
> 
> # TrackPlot Execution:
> trackplot::track_plot(summary_list = t, col = track_cols, show_ideogram = TRUE, y_min = 0, y_max = 250, gene_fsize = 0.6, left_mar = 14, track_names = track_names, track_names_to_left = T, peaks = sample_NarrowPeak_list[[j]], peaks_track_names = peaks_names, layout_ord = c("c", "g", "p", "b", "h"), gene_track_height = 8, peaks_track_height = 5)
Collapsing transcripts..
Error in if (attr(txtbl, "strand") == "+") { : 
  the condition has length > 1

In particular, the genomic region referred to is the following:

> loci_names
      SYMBOL                   LOCI
HDAC8  HDAC8 chrX:70801691-74073101

What could this be due to?

Thank you very much in advance.