chrisamiller / copyCat

a parallel R package for detecting copy-number alterations from short sequencing reads

copyCat takes a very long time #1

Closed zhaoming159753 closed 7 years ago

zhaoming159753 commented 8 years ago

Hi Chrisamiller, I am using copyCat for somatic CNV analysis of a 30X human genome with 150 bp reads. My steps were:

1. Used readDepth to generate the mappability and GC-content information for every chromosome for 150 bp reads.
2. Ran bam-window on the normal and tumor BAM files to produce the window files (bam-window -l -r -w 1000).
3. Generated 10-column mpileup files with samtools-0.1.16 (newer versions only output 6-column mpileup files).
4. Ran copyCat following the usage instructions, with maxCores=8 on a single node.

However, it has been running for a very long time (4 days) and still has not finished. Is this normal for a 30X human genome? How can I speed up the analysis?

Here is the output log from copyCat:

R version 3.1.3 (2015-03-09) -- "Smooth Sidewalk"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(copyCat)
Loading required package: foreach
Loading required package: doMC
Loading required package: iterators
Loading required package: parallel
Loading required package: IRanges
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
    is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
    rownames, sapply, setdiff, sort, table, tapply, union, unique,
    unlist, unsplit

Loading required package: S4Vectors
Loading required package: stats4
Loading required package: DNAcopy
Loading required package: stringr
Using copyCat version 1.6.9
> #The most convenient way to run copyCat is through the functions in meta.R. 
> #For a paired tumor/normal sample, this looks something like this:
> runPairedSampleAnalysis(annotationDirectory="/lustre/home/medzm/annotation/copyCat_ann/hg19/copycat.anno.hg19",
+                         outputDirectory="FHM-RL_CNV_out",
+                         normal="WGC037211D_mem.grp.1000.windfile",
+                         tumor="WGC037811D_mem.grp.1000.windfile",
+                         inputType="bins",
+                         maxCores=8,
+                         binSize=0, #infer automatically from bam-window output
+                         perLibrary=1, #correct each library independently
+                         perReadLength=1, #correct each read-length independently
+                         verbose=TRUE,
+                         minWidth=3, #minimum number of consecutive windows needed to call CN
+                         minMapability=0.6, #a good default
+                         dumpBins=TRUE,
+                         doGcCorrection=TRUE,
+                         samtoolsFileFormat="10colPileup", #will infer automatically - mpileup 10col or VCF
+                         purity=1,
+                         normalSamtoolsFile="WGC037211D_mem.grp.10col.mpileup",
+                         tumorSamtoolsFile="WGC037811D_mem.grp.10col.mpileup")
[1] "inferred bin size:  1000"
[1] "calculating mapability content for read length 150 Sat Jul 16 02:52:09 2016"
correcting for GC bias Sat Jul 16 02:52:20 2016 
[1] "calculating GC content for read length 150 Sat Jul 16 02:52:20 2016"
Correcting read depth for GC-content bias:   Sat Jul 16 02:52:33 2016 
[1] "correcting library 1 (rd.WGC037211D.150)"

Is there any problem? Thanks very much.
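
A note on reading the log above: the timestamps suggest that the mapability and GC-content setup steps finished within seconds, so essentially all of the elapsed time is being spent in the "correcting library 1" step. A minimal sketch in base R that computes the gap between the first and last timestamps in the log; `parseLogTime` is a hypothetical helper, not a copyCat function, and UTC is an arbitrary choice since only the difference matters:

```r
# Hypothetical helper (not part of copyCat): parse a timestamp of the form
# "Sat Jul 16 02:52:09 2016", as printed in the log, without relying on the
# session locale for weekday or month names.
parseLogTime <- function(x) {
  parts <- strsplit(x, " +")[[1]]       # weekday, month, day, time, year
  month <- match(parts[2], month.abb)   # month.abb is always English
  iso <- sprintf("%s-%02d-%02d %s",
                 parts[5], month, as.integer(parts[3]), parts[4])
  as.POSIXct(iso, tz = "UTC")           # time zone is arbitrary here
}

# First and last timestamps taken from the log
started  <- parseLogTime("Sat Jul 16 02:52:09 2016")
lastStep <- parseLogTime("Sat Jul 16 02:52:33 2016")

# Seconds spent before the final logged step began
setupSeconds <- as.numeric(difftime(lastStep, started, units = "secs"))
setupSeconds  # 24
```

Since only 24 seconds passed before the last logged step started, the remaining days were spent inside that single step.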

chrisamiller commented 8 years ago

4 days is certainly unexpected. Using 8 cores and 1k windows, as you are, I'd expect it to be done in closer to 8 hours.

Perhaps something is wrong with the annotation files? Try downloading my version of the 150bp read annotations, which I just added here: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/copyCat/
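
Before swapping in the downloaded annotations, it may help to look at what the existing annotation directory actually contains. A minimal sketch in base R, assuming nothing about the file layout copyCat expects: it only lists the files present and flags empty ones. `summarizeAnnotationDir` is a hypothetical helper, not a copyCat function.

```r
# Hypothetical helper (not part of copyCat): list every file under an
# annotation directory with its size in bytes, and flag zero-byte files.
summarizeAnnotationDir <- function(annotationDirectory) {
  # file.info() returns NA for missing paths, so isTRUE() covers that case
  if (!isTRUE(file.info(annotationDirectory)$isdir)) {
    stop("annotation directory not found: ", annotationDirectory)
  }
  files <- list.files(annotationDirectory, recursive = TRUE, full.names = TRUE)
  sizes <- file.info(files)$size
  data.frame(
    file = files,
    bytes = sizes,
    empty = sizes == 0,
    stringsAsFactors = FALSE
  )
}
```

Calling `summarizeAnnotationDir()` on the `annotationDirectory` path passed to `runPairedSampleAnalysis` and filtering the result on the `empty` column would show any zero-byte files; comparing the listing against the downloaded 150bp annotations would show missing ones.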

zhaoming159753 commented 8 years ago

Thanks for your reply. I cannot find the annotation files for the 150bp read length. Maybe I should wait a moment.

Ming Zhao
Shanghai Institute of Hematology & State Key Laboratory of Medical Genomics, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine (SJTU-SM)

Tel: +8613825071852
Email: zhaoming159753@gmail.com
Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Building 11, Room 1109, 197 Ruijin Er Rd, Shanghai 200025, P.R. China

chrisamiller commented 8 years ago

Try doing a shift-refresh on the page - it may be cached for you.

zhaoming159753 commented 8 years ago

I got it and will try again, thanks very much.
