dozmorovlab / HiCcompare

Joint normalization of two Hi-C matrices, visualization and detection of differential chromatin interactions. See multiHiCcompare for the analysis of multiple Hi-C matrices
https://dozmorovlab.github.io/HiCcompare/
Other
19 stars 3 forks source link

Saving results #8

Closed Phlya closed 6 years ago

Phlya commented 6 years ago

Hi, how can I actually save the results in the end? I never use R, so this might be a stupid question...

I am trying to apply HiCcompare genome-wide and therefore have a list of tables. I thought perhaps I can just rbindlist it all together and write.table, but this doesn't work: Item 1 of list input is not a data.frame, data.table or list

jstansfield0 commented 6 years ago

You can save an R object by using the save() function like this: save(hic.table, file = 'path/to/file.RDA') And then load the object with load(file = 'path/to/file.RDA). Or you should be able to write it as a text file using write.table(hic.table, file = "path/to/file.txt"). If write.table is not working can you print the top few rows of the object you are trying to save and paste them here so I can see?

mdozmorov commented 6 years ago

The following should work: write.csv(do.call(rbind, hic.table), "hic_table.csv")

Phlya commented 6 years ago

Hi, thanks for your comments! I feel like I am really not getting something... This is my script:

#!/usr/bin/env Rscript
library(HiCcompare)

args = commandArgs(trailingOnly=TRUE)

dat1 <- read.table(args[1], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))
dat2 <- read.table(args[2], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))

hic.list <- create.hic.table(dat1, dat2)
hic.list <- total_sum(hic.list)
hic.list <- hic_loess(hic.list, Plot=TRUE, Plot.smooth=FALSE, parallel=TRUE)
hic.list <- hic_compare(hic.list, A.min = NA, adjust.dist = TRUE, p.method = 'fdr', Plot = TRUE)
head(hic.list)
write.csv(do.call(rbind, hic.list), args[3])

I run it on a cluster like this:

#!/bin/sh
# Grid Engine options (lines prefixed with #$)
#$ -cwd
#$ -N HiCCompare
#$ -l h_rt=06:00:00
#$ -l h_vmem=32G
#$ -pe sharedmem 4
#$ -V

Rscript 03_HiCcompare.R $1 $2 $3

qsub 03_launch_HiCcompare.sh ../pixels/serum-KM.1000000.pixels.tsv ../pixels/2i.1000000.pixels.tsv ../pixels/serum_vs_2i_hiccompare_1Mb.tsv

This is the output I get from stderr:

Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning messages:
1: In sum(hic.table$IF2) : integer overflow - use sum(as.numeric(.))
2: In sum(hic.table$IF1) : integer overflow - use sum(as.numeric(.))
Error in data.table::rbindlist(hic.list) : 
  Item 1 of list input is not a data.frame, data.table or list
Calls: total_sum -> <Anonymous>
Execution halted

What am I doing wrong?..

jstansfield0 commented 6 years ago

Are the sparse matrices that you are reading in for a single chromosome or did you combine all the chromosomes into a single file? HiCcompare needs the input to be a separate matrix for each chromosome, i.e if you are trying to compare the entire human genome you shave have 23 separate objects for each group. I think from your code you are only comparing a single pair of matrices so you should not be using the total_sum function. Additionally there is no need to rbindlist if you're only using data for a single chromosome. If you do want to look at all the chromosomes your script should look more like this:

# read your data, assuming the paths to the files are stored in group1_filenames and group2_filenames
group1 <- list()
group2 <- list()
for (i in 1:23) {
  group1[[i]] <- read.table(group1_filenames[i], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))
  group2[[i]] <- read.table(group2_filenames[i], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))
}

# now create the hic.table objects
hic.list <- mapply(create.hic.table, group1, group2, SIMPLIFY = FALSE)

# set up parallelization
library(BiocParallel)
# Check how many cores you have
numCores <- parallel::detectCores()
numCores
# Set the number of cores at least one less than the total number
register(MulticoreParam(workers = numCores - 1), default = TRUE)

# normalize
hic.list <- total_sum(hic.list)
hic.list <- hic_loess(hic.list, parallel=TRUE)

# compare
hic.list <- hic_compare(hic.list, A.min = NA, adjust.dist = TRUE, p.method = 'fdr', Plot = TRUE, parallel = TRUE)

# save
write.csv(rbindlist(hic.list), file = "path/to/file.csv")

There are more examples in the vignette which you can access using browseVignettes("HiCcompare") .

Phlya commented 6 years ago

It is whole genome data. I thought I read somewhere that create.hic.table automatically creates a list of tables if input has multiple chromosomes?..

jstansfield0 commented 6 years ago

That is the problem then. Currently create.hic.table doesn't split it up into a list if you input multiple chromosomes. You can work around this by just splitting your data up by chromosome before inputting it into create.hic.table.

# split the data by chromosome
dat1 <- split(dat1, dat1$chr1)
dat2 <- split(dat2, dat2$chr1)

# now use mapply to create a list of hic.tables
hic.list <- mapply(create.hic.table, dat1, dat2, SIMPLIFY = FALSE)
Phlya commented 6 years ago

Brilliant, thanks! And is it OK if the data contain inter-chromosomal interactions?

jstansfield0 commented 6 years ago

No, HiCcompare only works on the intra-chromosomal interactions. You should remove any rows of your data where chr1 != chr2 before inputting it into HiCcompare.

Phlya commented 6 years ago

Thanks a lot, it's working now!

For future reference, this is my script:

library(HiCcompare)
library(BiocParallel)

args = commandArgs(trailingOnly=TRUE)

dat1 <- read.table(args[1], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))
dat2 <- read.table(args[2], header=FALSE, col.names=c("chr1", "start1", "end1", "chr2", "start2", "end2", "IF"))

dat1 <- dat1[dat1$chr1==dat1$chr2, ]
dat2 <- dat2[dat2$chr1==dat2$chr2, ]

dat1 <- split(dat1, dat1$chr1)
dat2 <- split(dat2, dat2$chr1)

hic.list <- mapply(create.hic.table, dat1, dat2, SIMPLIFY = FALSE, scale=FALSE)

hic.list <- total_sum(hic.list)

register(MulticoreParam(workers = 3), default = TRUE)

hic.list <- hic_loess(hic.list, Plot=TRUE, Plot.smooth=FALSE, parallel=TRUE)
hic.list <- hic_compare(hic.list, A.min = NA, adjust.dist = TRUE, p.method = 'fdr', Plot = TRUE, parallel=TRUE)

hic.list <- do.call(rbind, hic.list)

hic.list <- hic.list[hic.list$p.adj<0.05,]

write.table(hic.list, args[3])
mdozmorov commented 6 years ago

Thanks, @Phlya, glad it worked. @jstansfield0, please, add the script to the vignette as an example how to run HiCcompare on a cluster, acknowledge Ilya. Then, close the issue. Thanks!