ShixiangWang / DoAbsolute

:package: Automate Absolute Copy Number Calling using 'ABSOLUTE' package
Other
36 stars 11 forks source link

Different results between absoluted and doabsoluted? #26

Closed xiasijian closed 1 year ago

xiasijian commented 1 year ago

The question is below

xiasijian commented 1 year ago

image

code is below ##################################################### DoAbsolute(Seg = cnvkit_res, Maf = maf_file, platform = "Illumina_WES", copy.num.type = "total", temp.dir="./temp/", results.dir = "output_Doabsolute", nThread = 1, sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 2, max.ploidy = 10, keepAllResult = FALSE, primary.disease = "Gastric Cancer", clean.temp = FALSE, max.as.seg.count = 3000, max.non.clonal = 0.05, max.neg.genome = 0.005, min.mut.af = 0.02, min.no.mut = 5, verbose = TRUE) ######################################################### RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"), maf.fn = paste0(outdir,patient,"_test_output.maf"), platform = "Illumina_WES", copy_num_type = "total", sigma.p=0, results.dir = "output_absolute", sample.name=patient, primary.disease= "Gastric Cancer", max.sigma.h=0.2, min.ploidy=2, max.ploidy=10, max.as.seg.count=3000, max.neg.genome=0.005, max.non.clonal=0.05, verbose = TRUE, min.mut.af=0.02)

xiasijian commented 1 year ago

image

ShixiangWang commented 1 year ago

Hi, @xiasijian

Thanks for your report about the comparison. As the DoAbsolute is just an open-source wrapper of ABSOLUTE, there are only two reasons for explaining the minor difference.

  1. The used versions of ABSOLUTE are different.
  2. Preprocessing did or default options used by the DoAbsolute.

I guess you are using the same version of ABSOLUTE, so the point 2 would be the only reason.

The code related to the reason has been given as the below.

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L18-L55

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L136-L141

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L203

xiasijian commented 1 year ago

Thanks for your quick reply, but I found a problem in your example data.

xiasijian commented 1 year ago

image

image

当我抽取一定数量取跑的时候,测试数据在两种软件的纯度结果就有差异。如上图

xiasijian commented 1 year ago

然而,如果只是用原的测试数据,两者的结果是一样的。

ShixiangWang commented 1 year ago

Thanks.

xiasijian commented 1 year ago

I do not know why this is, can you tell me? Thanks.

ShixiangWang commented 1 year ago

然而,如果只是用原的测试数据,两者的结果是一样的。

我不是很理解,你不是说结果一致了吗? 原的测试数据又是什么意思?

xiasijian commented 1 year ago

王老师,是这样的,我发现当MAF文件和Seg文件的数目比较少时(如上图,我做了一定抽样),两者结果是不一样的,而我说的原测试数据,就是不进行任何抽样。

---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 18:58 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)

然而,如果只是用原的测试数据,两者的结果是一样的。

我不是很理解,你不是说结果一致了吗? 原的测试数据又是什么意思?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

ShixiangWang commented 1 year ago

那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。

一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L285-L295

你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute),然后跑 DoAbsolute,等待跳到运行上面代码之前。

xiasijian commented 1 year ago

王老师,这个输入是一致的。

---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 19:27 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)

那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。

一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L285-L295

你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute),然后跑 DoAbsolute,等待跳到运行上面代码之前。

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

xiasijian commented 1 year ago

好的,谢谢,我试下您提到的方法。

---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 19:27 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)

那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。

一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。

https://github.com/ShixiangWang/DoAbsolute/blob/8748cfb39eca5059f539a57829c49c83a869a31a/R/DoAbsolute.R#L285-L295

你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute),然后跑 DoAbsolute,等待跳到运行上面代码之前。

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

xiasijian commented 1 year ago

01_Doabsolute_min.zip

王老师,这是我测试的脚本和输出,我载入了两者的最后保存的Rdata文件,确认输入的一样的,但是在model.res的结果却完全不一样,因此导致结果不一致。

xiasijian commented 1 year ago

王老师,还有一个问题,DoAbsolute最后的输出为NA,具体含义是?

ShixiangWang commented 1 year ago

01_Doabsolute_min.zip

王老师,这是我测试的脚本和输出,我载入了两者的最后保存的Rdata文件,确认输入的一样的,但是在model.res的结果却完全不一样,因此导致结果不一致。

那这就很奇怪了啊,你使用的应该是同一个absolute版本才对

ShixiangWang commented 1 year ago

那就是调用的默认参数不同,你再看看,不应该有其他解释,那太玄学了

xiasijian commented 1 year ago

王老师,确实很让人疑惑,我把您的脚本看了不下10遍,没发现很影响结果的代码,但当Seg文件和Maf文件的行数较少时,这两个软件就会出现偏差,这种偏差既体现在肿瘤纯度,也体现在最后得到的Rdata文件。

ShixiangWang commented 1 year ago

那就暂时不要纠结这个了,你直接用 ABSOLUTE 更好的话就直接用 for 循环调用即可,我这个包写的目的也就是为了方便一点。 下面这段代码就给后人参考吧,我会在 README 中说明一下。你后续如果发现真正的原因也可以后续再讨论。

rm(list=ls())
gc()

library(DoAbsolute)
library(dplyr)
example_path = system.file("extdata", package = "DoAbsolute", mustWork = T)
library(data.table)
library(ABSOLUTE)

##set workspace
outdir="./output/"
setwd(workdir)

# segmentation file
seg_normal =  file.path(example_path, "SNP6_blood_normal.seg.txt")
seg_solid  =  file.path(example_path, "SNP6_solid_tumor.seg.txt")
seg_metastatic  = file.path(example_path, "SNP6_metastatic_tumor.seg.txt")

# MAF file
maf_solid  = file.path(example_path, "solid_tumor.maf.txt")
maf_metastatic  = file.path(example_path, "metastatic_tumor.maf.txt")

# read data
seg_normal = fread(seg_normal)
seg_solid = fread(seg_solid)
seg_metastatic = fread(seg_metastatic)
maf_solid = fread(maf_solid)
maf_metastatic = fread(maf_metastatic)

# merge data
Seg = Reduce(rbind, list(seg_normal, seg_solid, seg_metastatic))
Maf = Reduce(rbind, list(maf_solid, maf_metastatic))

Seg$Sample = substr(Seg$Sample, 1, 15)
Maf$Tumor_Sample_Barcode = substr(Maf$Tumor_Sample_Barcode, 1, 15)

test_Seg=Seg %>% subset(Sample %in% c("TCGA-DK-A1A6-01"))
test_Seg=test_Seg[1:42,]
table(test_Seg$Chromosome)
test_Maf=Maf %>% subset(Tumor_Sample_Barcode %in% c("TCGA-DK-A1A6-01"))
table(test_Maf$Chromosome)
test_Maf$Chromosome=ifelse(test_Maf$Chromosome=="X",23,test_Maf$Chromosome)
test_Maf=test_Maf[1:28,]

############################ DoAbsolute #################################
# test function
DoAbsolute(Seg = test_Seg, Maf = test_Maf, 
           platform = "SNP_6.0", 
           copy.num.type = "total",
           results.dir = "output_doabsolute",
           nThread = 1, 
           sigma.p = 0,
           max.sigma.h = 0.2,
           min.ploidy = 0.5,
           max.ploidy = 10,
           keepAllResult = FALSE, 
           primary.disease = "Tumor",
           clean.temp = FALSE,
           max.as.seg.count = 1500,
           max.non.clonal = 0.05,
           max.neg.genome = 0.005,
           min.mut.af = 0.1,
           min.no.mut = 0,
           verbose = TRUE)

############################ Absolute #################################
for(patient in c("TCGA-DK-A1A6-01")){
  print(patient)
  test_seg=test_Seg%>%subset(Sample==patient)
  test_maf=test_Maf %>% subset(Tumor_Sample_Barcode==patient)
  write.table(test_seg,file = paste0(outdir,patient,"_test_output.seg"),sep="\t",row.names = F,quote = F)
  write.table(test_maf,file = paste0(outdir,patient,"_test_output.maf"),sep="\t",row.names = F,quote = F)
  RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"),
              maf.fn = paste0(outdir,patient,"_test_output.maf"), 
              platform = "SNP_6.0",
              copy_num_type = "total",
              sigma.p=0, 
              results.dir = "output_absolute",
              sample.name=patient,
              primary.disease="Tumor",
              max.sigma.h=0.5,
              min.ploidy=0.5,
              max.ploidy=10,
              max.as.seg.count=1600,
              max.neg.genome=0.005, 
              max.non.clonal=0.05,
              verbose = TRUE, 
              min.mut.af=0.1)

}

######################################
## merge absolute data
######################################
ab_outdir="./output_absolute/"
all_files=list.files(paste0(ab_outdir))
rdata_files=all_files[grepl(pattern = ".RData",all_files)]
absolute.files=paste0(ab_outdir,rdata_files)
results.dir <- file.path(ab_outdir, "abs_summary")
CreateReviewObject("DRAWS_summary", absolute.files, results.dir, "total", verbose=TRUE)

load(file = paste0(results.dir,"/","DRAWS_summary.PP-modes.data.Rdata"))
calls.path = file.path(paste0(results.dir, "/","DRAWS_summary.PP-calls_tab.txt"))
modes.path = file.path(paste0(results.dir, "/", "DRAWS_summary.PP-modes.data.RData"))
output.path = file.path(paste0(outdir, "abs_extract"))
ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "total")
xiasijian commented 1 year ago

好的,谢谢您,王老师🙏

---Original--- From: "Shixiang Wang @.> Date: Sun, Feb 26, 2023 09:50 AM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)

那就暂时不要纠结这个了,你直接用 ABSOLUTE 更好的话就直接用 for 循环调用即可,我这个包写的目的也就是为了方便一点。 下面这段代码就给后人参考吧,我会在 README 中说明一下。你后续如果发现真正的原因也可以后续再讨论。 rm(list=ls()) gc() library(DoAbsolute) library(dplyr) example_path = system.file("extdata", package = "DoAbsolute", mustWork = T) library(data.table) library(ABSOLUTE) ##set workspace outdir="./output/" setwd(workdir) # segmentation file seg_normal = file.path(example_path, "SNP6_blood_normal.seg.txt") seg_solid = file.path(example_path, "SNP6_solid_tumor.seg.txt") seg_metastatic = file.path(example_path, "SNP6_metastatic_tumor.seg.txt") # MAF file maf_solid = file.path(example_path, "solid_tumor.maf.txt") maf_metastatic = file.path(example_path, "metastatic_tumor.maf.txt") # read data seg_normal = fread(seg_normal) seg_solid = fread(seg_solid) seg_metastatic = fread(seg_metastatic) maf_solid = fread(maf_solid) maf_metastatic = fread(maf_metastatic) # merge data Seg = Reduce(rbind, list(seg_normal, seg_solid, seg_metastatic)) Maf = Reduce(rbind, list(maf_solid, maf_metastatic)) Seg$Sample = substr(Seg$Sample, 1, 15) Maf$Tumor_Sample_Barcode = substr(Maf$Tumor_Sample_Barcode, 1, 15) test_Seg=Seg %>% subset(Sample %in% c("TCGA-DK-A1A6-01")) test_Seg=test_Seg[1:42,] table(test_Seg$Chromosome) test_Maf=Maf %>% subset(Tumor_Sample_Barcode %in% c("TCGA-DK-A1A6-01")) table(test_Maf$Chromosome) test_Maf$Chromosome=ifelse(test_Maf$Chromosome=="X",23,test_Maf$Chromosome) test_Maf=test_Maf[1:28,] ############################ DoAbsolute ################################# # test function DoAbsolute(Seg = test_Seg, Maf = test_Maf, platform = "SNP_6.0", copy.num.type = "total", results.dir = "output_doabsolute", nThread = 1, sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 0.5, max.ploidy = 10, keepAllResult = FALSE, primary.disease = "Tumor", clean.temp = FALSE, max.as.seg.count = 1500, max.non.clonal = 0.05, max.neg.genome = 0.005, min.mut.af = 0.1, min.no.mut = 0, verbose = TRUE) ############################ Absolute ################################# for(patient in c("TCGA-DK-A1A6-01")){ print(patient) test_seg=test_Seg%>%subset(Sample==patient) test_maf=test_Maf %>% subset(Tumor_Sample_Barcode==patient) write.table(test_seg,file = paste0(outdir,patient,"_test_output.seg"),sep="\t",row.names = F,quote = F) write.table(test_maf,file = paste0(outdir,patient,"_test_output.maf"),sep="\t",row.names = F,quote = F) RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"), maf.fn = paste0(outdir,patient,"_test_output.maf"), platform = "SNP_6.0", copy_num_type = "total", sigma.p=0, results.dir = "output_absolute", sample.name=patient, primary.disease="Tumor", max.sigma.h=0.5, min.ploidy=0.5, max.ploidy=10, max.as.seg.count=1600, max.neg.genome=0.005, max.non.clonal=0.05, verbose = TRUE, min.mut.af=0.1) } ###################################### ## merge absolute data ###################################### ab_outdir="./output_absolute/" all_files=list.files(paste0(ab_outdir)) rdata_files=all_files[grepl(pattern = ".RData",all_files)] absolute.files=paste0(ab_outdir,rdata_files) results.dir <- file.path(ab_outdir, "abs_summary") CreateReviewObject("DRAWS_summary", absolute.files, results.dir, "total", verbose=TRUE) load(file = paste0(results.dir,"/","DRAWS_summary.PP-modes.data.Rdata")) calls.path = file.path(paste0(results.dir, "/","DRAWS_summary.PP-calls_tab.txt")) modes.path = file.path(paste0(results.dir, "/", "DRAWS_summary.PP-modes.data.RData")) output.path = file.path(paste0(outdir, "abs_extract")) ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "total")

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>