HYsxe / PRINT

32 stars 3 forks source link

Issue with getPrecomputedBias() #8

Closed maxdudek closed 1 year ago

maxdudek commented 1 year ago

Hi,

I have a couple issues with the getPrecomputedBias() function.

First, it's a bit confusing that the path to the h5 bias file is hard-coded, I had to edit getBias.R to make it work for my file structure. I would prefer if the path to the bias file could be provided as an argument to the function.

Second, I get an error when I run the function. I see the progress bar, and it goes up normally until it reaches 100% when it spits out this:

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Item 639 has 54 columns, inconsistent with item 1 which has 100 columns. To fill missing columns use fill=TRUE.
7. h(simpleError(msg, call))
6. .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "Item 639 has 54 columns, inconsistent with item 1 which has 100 columns. To fill missing columns use fill=TRUE.", base::quote(data.table::rbindlist(refBias)))
5. data.table::rbindlist(refBias) at getBias.R#143
4. as.matrix(data.table::rbindlist(refBias)) at getBias.R#143
3. .local(project, nCores, chunkSize, ...)
2. getPrecomputedBias(project, nCores = 1) at getBias.R#88
1. getPrecomputedBias(project, nCores = 1)

For reference, here is my code:

source("/home/dudekmf/local/src/PRINT/code/utils.R")
source("/home/dudekmf/local/src/PRINT/code/getCounts.R")
source("/home/dudekmf/local/src/PRINT/code/getBias.R")
source("/home/dudekmf/local/src/PRINT/code/getFootprints.R")
source("/home/dudekmf/local/src/PRINT/code/getSubstructures.R")
source("/home/dudekmf/local/src/PRINT/code/visualization.R")
source("/home/dudekmf/local/src/PRINT/code/getGroupData.R")
source("/home/dudekmf/local/src/PRINT/code/footprintTracking.R")
source("/home/dudekmf/local/src/PRINT/code/getTFBS.R")

projectName <- "HepG2"
project <- footprintingProject(projectName = projectName,
                               refGenome = "hg38")
projectMainDir <- "./"
projectDataDir <- paste0(projectMainDir, "data/", projectName, "/")
dataDir(project) <- projectDataDir
mainDir(project) <- projectMainDir

regionsBed <- read.table(paste0(projectDataDir, "HepG2.optimal.bed"))
colnames(regionsBed) <- c("chr", "start", "end", "peak_id")
regions <- GRanges(seqnames = regionsBed$chr,
ranges = IRanges(start = regionsBed$start, end = regionsBed$end))
regionRanges(project) <- regions

if(file.exists(paste0(projectDataDir, "predBias.rds"))){
  regionBias(project) <- readRDS(paste0(projectDataDir, "predBias.rds"))
} else {
  project <- getPrecomputedBias(project, nCores = 1)
  saveRDS(regionBias(project), paste0(projectDataDir, "predBias.rds"))
}

Any insights into this error? Thanks! Max

HYsxe commented 1 year ago

Hi Max!

Good suggestion and I'll modify the function so the user can provide the path.

Regarding the error. Could you also send me your HepG2.optimal.bed file? (You can email me or via some secured way) My guess is that the function stores the bias in the form of a region-by-position bias. For example if you have 200k regions and each one is 1kb then it's stored as a 200k-by-1000 matrix. This means that we require all regions are resized to the same width. But I guess it would be easier to troubleshoot if I have the file and see whether I can reproduce the error.

Thank you!

Yan

maxdudek commented 1 year ago

Thanks for the quick reply!

That must be it, my regions are definitely not the same width. But now I'm confused about why that is a requirement. I thought that "regions" represented regions of open chromatin, which can have variable length? HepG2.optimal.bed is a file of ATAC-seq peaks called from my data, so how would you recommend I convert this to a list of regions with equal width? Should I just resize each peak to be 1 kb around the midpoint?

Thanks again for your help!

HYsxe commented 1 year ago

Yes! What we usually do is resizing the ATAC-seq peak ranges to a fixed size such as 1kb around the midpiont. This is mainly for convenience of storing the precomputed bias (so that we can store them as a peak-by-position matrix). I think theoretically we could have also stored things in a list but a matrix makes things easier when you want to slice by row and column indices.

maxdudek commented 1 year ago

Thank you, I did that and it works now!