TaoYang-dev / hicrep

R package to evaluate the reproducibility of Hi-C data
25 stars 4 forks source link

Error in smt_R2[i + dist, i] : subscript out of bounds #70

Open LHXqwq opened 2 years ago

LHXqwq commented 2 years ago

mat1 <- hic2mat(a, chromosome1 = i, chromosome2 = i, resol = 100000, method = "NONE") mat2 <- hic2mat(b, chromosome1 = i, chromosome2 = i, resol = 100000, method = "NONE") scc.out = get.scc(mat1, mat2, 100000, 5, lbr = 0, ubr = 5000000) Error in smt_R2[i + dist, i] : subscript out of bounds

claudiarabu commented 1 year ago

I also encountered such error but at 10k resolution. I saw that one of my matrices had 1 row and 1 column less than the other (strangely enough, both matrices were generated with the same pipeline and mapping on the same genome. And this did not happened with the same matrices at higher resolutions). While I am trying to understand better the error, I did obtain an actual value and no error when inverting the matrices in the get.scc function. In your example: scc.out = get.scc(mat2, mat1, 100000, 5, lbr = 0, ubr = 5000000)

In principle the scc values of mat1/mat2 and mat2/mat1 should be identical, but I'll have to double check.

This is certainly not the final solution, so I guess we should wait for a feedback.

zzhoujialu commented 1 year ago

Hi, I met the same error and still have no idea why it happens. Have you worked it out?

Looking forward to hearing from you! Thank you.

kalo

claudiarabu commented 1 year ago

I did not find a solution yet. The problem seems to happen during the conversion with the bed2mat format in my case. Somehow, the resulting mat files has 1 col less and 1 row less than expected for some samples, but not for others.

Claudia

hisakatha commented 1 year ago

I met a similar error because of #72. If you use cool2matrix and specify a string chromosome name as the chr argument of the function, the chromosome name is converted into the index of the human genome. In my case, this conversion caused that a contact matrix was created for an unintended chromosome, which was shorter than ubr = 5000000 and caused an error: Error in smt_R1[i + dist, i] : subscript out of bounds.

DavidWarrenKatz commented 1 year ago

My matrices are the same size and I am receiving this error as well. Strangely, if I set the resolution in get.scc to 13000, it works, although my hic data has a resolution of 5000.

ccmeth commented 1 year ago

I also encountered such error but at 10k resolution. I saw that one of my matrices had 1 row and 1 column less than the other (strangely enough, both matrices were generated with the same pipeline and mapping on the same genome. And this did not happened with the same matrices at higher resolutions). While I am trying to understand better the error, I did obtain an actual value and no error when inverting the matrices in the get.scc function. In your example: scc.out = get.scc(mat2, mat1, 100000, 5, lbr = 0, ubr = 5000000)

In principle the scc values of mat1/mat2 and mat2/mat1 should be identical, but I'll have to double check.

This is certainly not the final solution, so I guess we should wait for a feedback.

Your suggestion worked well for me and the scc values are indeed identical! But such exchange is not proper for (for loop).........

mdozmorov commented 11 months ago

Same error. Following @claudiarabu workaround, here is the check in the for loop to avoid it.

library(hicrep)
library(readr)
# Sample IDs
array <- c("Sample1", "Sample2)
# Chromosomes
chromosomes <- c(paste0("chr", 1:22))
# Resolution
res <- 25000
# Path with data
dir_data <- "/path/to/data"
# File name to save the results
fileNameOut1 <- file.path(dir_data, paste0("run_HiCrep_", res, ".csv"))
# Empty matrix to store HiCrep results for each sample. Filled with 1s initially
mtx_corr <- matrix(data = 1, nrow = length(array), ncol = length(array))
# Process each pair of samples
for (i in 1:length(array)) {
    for (j in 1:length(array)) {
      # Need to process one index pair, the reverse comparison is identical
      if (i > j) {
        print(paste(i, j))
        fileNameIn1 <- file.path(dir_data, array[i], "data/aligned/inter_30.hic")
        fileNameIn2 <- file.path(dir_data, array[j], "data/aligned/inter_30.hic")
        # Empty vector to store chromosome-specific stratified correlation coefficients
        chromosomes_corr <- vector(mode = "numeric", length = length(chromosomes))
        for (k in 1:length(chromosomes)) {
          mat1.chr <- hic2mat(fileNameIn1, chromosome1 = chromosomes[k], chromosome2 = chromosomes[k], resol = res, method = "NONE") 
          mat2.chr <- hic2mat(fileNameIn2, chromosome1 = chromosomes[k], chromosome2 = chromosomes[k], resol = res, method = "NONE") 
          # Depending on matrix dimensions, compare them in different order. Workaround of https://github.com/TaoYang-dev/hicrep/issues/70
          if (nrow(mat1.chr) > nrow(mat2.chr)) {
            scc.out = get.scc(mat2.chr, mat1.chr, resol = res, h = 5, lbr = 0, ubr = 5000000)
          } else {
            scc.out = get.scc(mat1.chr, mat2.chr, resol = res, h = 5, lbr = 0, ubr = 5000000)
          }
          # Collect chromosome-specific SCC
          chromosomes_corr[k] <- scc.out$scc
        }
        # Average chromosome-specific SCCs for a sample. Store symmetrically
        mtx_corr[i, j] <- mtx_corr[j, i] <- mean(chromosomes_corr)
      }
    }
}
# Add sample names as rows and columns
colnames(mtx_corr) <- rownames(mtx_corr) <- array
# Save the data
write_csv(as.data.frame(mtx_corr), fileNameOut1)