LTLA / InteractionSet

Clone of the Bioconductor repository for the InteractionSet package, see https://bioconductor.org/packages/devel/bioc/html/InteractionSet.html for the official development version.

NAs produced by integer overflow in inflate GInteractions #9

Closed jdmontenegro closed 1 year ago

jdmontenegro commented 1 year ago

Hi, I am trying to build a sparse Hi-C ContactMatrix from a GInteractions object. In total there are 20.2 M interaction pairs and 50.5K regions (5K windows) in my genome.

### read a file with all the bins in the genome (chr1\tstart1\tend1)
> allRegionsDF<-read_tsv(binsFile, col_names=F)
> allRegionsGR<-makeGRangesFromDataFrame(
+   allRegionsDF,
+   seqnames.field="X1",
+   start.field="X2",
+   end.field="X3"
+)

### read file interaction matrix with the format ("bin1\tbin2\tcount")
# bin1 and bin2 would be the indices of the region in the allRegionsGR object
> hicDF <- read_tsv(hicFile, col_names=F)
> names(hicDF) <- c("xIdx", "yIdx", "counts")

### create interactions object and contact matrix
> hicGI<-GInteractions(hicDF$xIdx, hicDF$yIdx, allRegionsGR)
> hicCM<-inflate(hicGI, hicDF$xIdx, hicDF$yIdx,, hicDF$counts)

Error in out.mat[(ac2[relevantA] - 1L) * nR + ar1[relevantA]] <- fill[relevantA] : 
  NAs are not allowed in subscripted assignments
In addition: Warning message:
In (ac2[relevantA] - 1L) * nR : NAs produced by integer overflow
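
For context, here is a minimal sketch of the arithmetic behind this warning. It assumes about 50,500 regions on each axis, as stated above, and that the failing line computes a linear index of the form (column - 1L) * nR + row in integer arithmetic, as the error message suggests. The variable names are illustrative only and are not taken from the package.

```r
# Assumed from the report: about 50,500 bins on each axis of the matrix.
n_regions <- 50500L

# Largest value an R integer can hold (2^31 - 1 = 2147483647).
max_int <- .Machine$integer.max

# Linear index of the last cell, (column - 1L) * nrow + row, computed
# entirely in integer arithmetic. The product exceeds max_int, so R
# returns NA (with the "NAs produced by integer overflow" warning).
last_cell_int <- suppressWarnings((n_regions - 1L) * n_regions + n_regions)
is.na(last_cell_int)   # TRUE

# The same index in double precision is exact at this size.
last_cell_dbl <- (as.numeric(n_regions) - 1) * n_regions + n_regions
last_cell_dbl > max_int   # TRUE: 2550250000 cells do not fit in an integer
```

Any cell whose linear index exceeds 2147483647 becomes NA in this scheme, which would explain why the subsequent subscripted assignment fails.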

I went into the code and saw that the error occurs in the (!sparse) branch, so I modified the command:

> hicCM<-inflate(hicGI, hicDF$xIdx, hicDF$yIdx,, hicDF$counts, sparse=TRUE)
Error in if (swap) { : the condition has length > 1

but then I get an error saying that "swap" has more than one value, so I defined swap explicitly in the next command:

> hicCM<-inflate(hicGI, hicDF$xIdx, hicDF$yIdx,, hicDF$counts, sparse=TRUE, swap=TRUE)
Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In (ac1[relevantB] - 1L) * nR : NAs produced by integer overflow
2: In (ac1[relevantB] - 1L) * nR + ar2[relevantB] :
  NAs produced by integer overflow

So now I get the same error as before, but for the reciprocal values. My guess is that there should be a separate version of the swap step for "sparse=TRUE", because currently swap runs the same command as when "sparse=FALSE". Is this the correct interpretation? Or am I missing something in the construction of the ContactMatrix?

Thanks in advance.

Juan D. Montenegro

LTLA commented 1 year ago

Thanks - this should be fixed in the latest version once it builds on the BioC build machines (1.26.1), but you can also install the GitHub version directly if you want it sooner.
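
For readers who cannot update right away, one possible workaround is to build the sparse matrix directly from the (row, column, value) triplets with the Matrix package, which never forms a linear index. This is only a sketch under stated assumptions: it is not the fix applied in InteractionSet, it yields a plain sparse matrix rather than a ContactMatrix, and the `toy` data frame and `n_bins` are hypothetical stand-ins for `hicDF` and `length(allRegionsGR)`.

```r
library(Matrix)

# Hypothetical toy data standing in for hicDF (xIdx, yIdx, counts).
toy <- data.frame(
  xIdx   = c(1L, 2L, 4L),
  yIdx   = c(3L, 2L, 1L),
  counts = c(5, 7, 2)
)
n_bins <- 4L  # stands in for length(allRegionsGR)

# Mirror off-diagonal pairs so that (x, y) and (y, x) are both filled,
# which approximates filling after swapping the anchor indices.
off_diag <- toy$xIdx != toy$yIdx
i <- c(toy$xIdx, toy$yIdx[off_diag])
j <- c(toy$yIdx, toy$xIdx[off_diag])
x <- c(toy$counts, toy$counts[off_diag])

# sparseMatrix() takes row and column indices separately, so no
# (column - 1) * nrow + row product is ever computed.
mat <- sparseMatrix(i = i, j = j, x = x, dims = c(n_bins, n_bins))
```

Note that `sparseMatrix()` sums the values of repeated (i, j) pairs by default, so input that already lists both (x, y) and (y, x) would be double-counted by the mirroring step and should be deduplicated first.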