JSB-UCLA / Clipper

A p-value-free method for controlling false discovery rates in high-throughput biological data with two conditions

matrix(s1, ncol = 1) : data is too long #4

Open zhangpicb opened 2 years ago

zhangpicb commented 2 years ago

Hi @xcggates

Thanks for your beautiful code!

When I use MACS2 and Clipper to call peaks, I follow the steps described in vignettes/Clipper.Rmd.

The input is mouse TF ChIP-seq data, and this step fails:

re <- Clipper(score.exp = matrix(s1, ncol = 1), 
              score.back = matrix(s2, ncol = 1), 
              analysis = "enrichment")
Error in matrix(s1, ncol = 1) : data is too long

I would really appreciate any help.

xcggates commented 2 years ago

Hello,

This error may occur when s1 and s2 are too long. Could you please check the length of these two vectors? Or you can try:

re <- Clipper(score.exp = s1, score.back = s2, analysis = "enrichment")

If that does not work, I will update the package and add a new function for this situation.
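As a rough way to check whether a vector is "too long" for this call: matrix(x, ncol = 1) needs one row per element, and R stores matrix dimensions as 32-bit integers, so a vector longer than .Machine$integer.max (2^31 - 1) cannot be reshaped into a single column. The sketch below only illustrates that limit. The helper name fits_in_one_column is hypothetical and is not part of Clipper.

```r
# Hypothetical helper (not part of Clipper): can a vector of length n
# be turned into a one-column matrix? Each matrix dimension in R must
# fit in a 32-bit integer, i.e. be at most .Machine$integer.max.
fits_in_one_column <- function(n) {
  n <= .Machine$integer.max
}

# A short vector is fine.
ok_small <- fits_in_one_column(length(1:10))

# A length of three billion exceeds the limit, so matrix(x, ncol = 1)
# would stop with "data is too long" for a vector of that size.
ok_large <- fits_in_one_column(3e9)

print(c(small = ok_small, large = ok_large))
```

Running fits_in_one_column(length(s1)) and fits_in_one_column(length(s2)) on the real score vectors would show whether they are over this limit.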

zhangpicb commented 2 years ago

Hi @xcggates

Thanks for your quick reply!

> length(s1)
[1] 2726191108
> length(s2)
[1] 2725269014
> re <- Clipper(score.exp = s1, score.back = s2, analysis = "enrichment")
Error in matrix(score.exp, ncol = 1) : data is too long

I think the length of s1 is the length of the mouse genome, and the length of s2 is also the length of the mouse genome. Why are they larger than the mouse genome length, and why are they different?

The MACS2 callpeak commands are below:

 macs2 callpeak -t exp.bam \
                -c back.bam \
                -f BAMPE \
                --keep-dup all \
                -g mm \
                -q 1 \
                -B \
                -n twosample 2>twosampleq1.log

 #exp
 macs2 callpeak -t exp.bam \
                -f BAMPE \
                --keep-dup all \
                -g mm \
                -q 1 \
                -B \
                -n exp 2>exp.log

 #back              
 macs2 callpeak -t back.bam \
                -f BAMPE \
                --keep-dup all \
                -g mm \
                -q 1 \
                -B \
                -n back 2>back.log

I would really appreciate any help.

xcggates commented 2 years ago

Hello,

The length of s1 is the length of the experimental genome, and the length of s2 is the length of the background/negative control genome. It happens that the two genomes have different lengths. In our vignette, we use s1[(length(s1)+1):length(s2)] <- 0 to make sure that s1 and s2 share the same length. The current error, "data is too long", happens when the length of the array is too large. We are currently looking into making Clipper a direct add-on of MACS3, to make it more convenient to use. We will let you know once we have some updates.
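One note on the padding step: the vignette line extends s1, so it assumes s1 is the shorter vector. In this thread s1 (2726191108) is longer than s2 (2725269014), so it would be s2 that needs the zeros. A minimal sketch that pads whichever vector is shorter is shown below on toy data. The function name pad_to_same_length is hypothetical and not part of Clipper, and padding does not get around the "data is too long" limit for vectors of this size.

```r
# Hypothetical helper (not part of Clipper): zero-pad the shorter of two
# score vectors so that both have the same length.
pad_to_same_length <- function(a, b) {
  n <- max(length(a), length(b))
  # numeric(k) is a vector of k zeros; k is 0 for the longer vector,
  # which is therefore returned unchanged.
  list(
    a = c(a, numeric(n - length(a))),
    b = c(b, numeric(n - length(b)))
  )
}

# Toy data standing in for the per-base scores: here the first vector
# is longer, as in this issue.
s1_toy <- c(1, 2, 3)
s2_toy <- c(4, 5)

padded <- pad_to_same_length(s1_toy, s2_toy)
print(padded)
```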