Closed xiasijian closed 1 year ago
code is below ##################################################### DoAbsolute(Seg = cnvkit_res, Maf = maf_file, platform = "Illumina_WES", copy.num.type = "total", temp.dir="./temp/", results.dir = "output_Doabsolute", nThread = 1, sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 2, max.ploidy = 10, keepAllResult = FALSE, primary.disease = "Gastric Cancer", clean.temp = FALSE, max.as.seg.count = 3000, max.non.clonal = 0.05, max.neg.genome = 0.005, min.mut.af = 0.02, min.no.mut = 5, verbose = TRUE) ######################################################### RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"), maf.fn = paste0(outdir,patient,"_test_output.maf"), platform = "Illumina_WES", copy_num_type = "total", sigma.p=0, results.dir = "output_absolute", sample.name=patient, primary.disease= "Gastric Cancer", max.sigma.h=0.2, min.ploidy=2, max.ploidy=10, max.as.seg.count=3000, max.neg.genome=0.005, max.non.clonal=0.05, verbose = TRUE, min.mut.af=0.02)
Hi, @xiasijian
Thanks for your report about the comparison. As the DoAbsolute is just an open-source wrapper of ABSOLUTE, there are only two reasons for explaining the minor difference.
I guess you are using the same version of ABSOLUTE, so the point 2 would be the only reason.
The code related to the reason has been given as the below.
Thanks for your quick reply, but I found a problem in your example data.
当我抽取一定数量取跑的时候,测试数据在两种软件的纯度结果就有差异。如上图
然而,如果只是用原的测试数据,两者的结果是一样的。
Thanks.
I do not know why this is, can you tell me? Thanks.
然而,如果只是用原的测试数据,两者的结果是一样的。
我不是很理解,你不是说结果一致了吗? 原的测试数据又是什么意思?
王老师,是这样的,我发现当MAF文件和Seg文件的数目比较少时(如上图,我做了一定抽样),两者结果是不一样的,而我说的原测试数据,就是不进行任何抽样。
---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 18:58 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)
然而,如果只是用原的测试数据,两者的结果是一样的。
我不是很理解,你不是说结果一致了吗? 原的测试数据又是什么意思?
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。
一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。
你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute)
,然后跑 DoAbsolute,等待跳到运行上面代码之前。
王老师,这个输入是一致的。
---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 19:27 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)
那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。
一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。
你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute),然后跑 DoAbsolute,等待跳到运行上面代码之前。
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
好的,谢谢,我试下您提到的方法。
---Original--- From: "Shixiang Wang @.> Date: Fri, Feb 24, 2023 19:27 PM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)
那我只能理解为2次抽样的数据不一样。另外 DoAbsolute 看你使用并行没有,如果是的话不一定 set.seed 能 work。其他的可能的原因我上面也已经说了。 我这边也不存在其他的可能解释。
一个最简单测试重复的办法是运行到调用 RunAbsolute 函数时,对比下这里的输入文件和你单独运行 ABSOLUTE 的输入文件是否一致。
你可以指定 cores = 1,然后运行 debug(ABSOLUTE::RunAbsolute),然后跑 DoAbsolute,等待跳到运行上面代码之前。
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
王老师,这是我测试的脚本和输出,我载入了两者的最后保存的Rdata文件,确认输入的一样的,但是在model.res的结果却完全不一样,因此导致结果不一致。
王老师,还有一个问题,DoAbsolute最后的输出为NA,具体含义是?
王老师,这是我测试的脚本和输出,我载入了两者的最后保存的Rdata文件,确认输入的一样的,但是在model.res的结果却完全不一样,因此导致结果不一致。
那这就很奇怪了啊,你使用的应该是同一个absolute版本才对
那就是调用的默认参数不同,你再看看,不应该有其他解释,那太玄学了
王老师,确实很让人疑惑,我把您的脚本看了不下10遍,没发现很影响结果的代码,但当Seg文件和Maf文件的行数较少时,这两个软件就会出现偏差,这种偏差既体现在肿瘤纯度,也体现在最后得到的Rdata文件。
那就暂时不要纠结这个了,你直接用 ABSOLUTE 更好的话就直接用 for 循环调用即可,我这个包写的目的也就是为了方便一点。 下面这段代码就给后人参考吧,我会在 README 中说明一下。你后续如果发现真正的原因也可以后续再讨论。
rm(list=ls())
gc()
library(DoAbsolute)
library(dplyr)
example_path = system.file("extdata", package = "DoAbsolute", mustWork = T)
library(data.table)
library(ABSOLUTE)
##set workspace
outdir="./output/"
setwd(workdir)
# segmentation file
seg_normal = file.path(example_path, "SNP6_blood_normal.seg.txt")
seg_solid = file.path(example_path, "SNP6_solid_tumor.seg.txt")
seg_metastatic = file.path(example_path, "SNP6_metastatic_tumor.seg.txt")
# MAF file
maf_solid = file.path(example_path, "solid_tumor.maf.txt")
maf_metastatic = file.path(example_path, "metastatic_tumor.maf.txt")
# read data
seg_normal = fread(seg_normal)
seg_solid = fread(seg_solid)
seg_metastatic = fread(seg_metastatic)
maf_solid = fread(maf_solid)
maf_metastatic = fread(maf_metastatic)
# merge data
Seg = Reduce(rbind, list(seg_normal, seg_solid, seg_metastatic))
Maf = Reduce(rbind, list(maf_solid, maf_metastatic))
Seg$Sample = substr(Seg$Sample, 1, 15)
Maf$Tumor_Sample_Barcode = substr(Maf$Tumor_Sample_Barcode, 1, 15)
test_Seg=Seg %>% subset(Sample %in% c("TCGA-DK-A1A6-01"))
test_Seg=test_Seg[1:42,]
table(test_Seg$Chromosome)
test_Maf=Maf %>% subset(Tumor_Sample_Barcode %in% c("TCGA-DK-A1A6-01"))
table(test_Maf$Chromosome)
test_Maf$Chromosome=ifelse(test_Maf$Chromosome=="X",23,test_Maf$Chromosome)
test_Maf=test_Maf[1:28,]
############################ DoAbsolute #################################
# test function
DoAbsolute(Seg = test_Seg, Maf = test_Maf,
platform = "SNP_6.0",
copy.num.type = "total",
results.dir = "output_doabsolute",
nThread = 1,
sigma.p = 0,
max.sigma.h = 0.2,
min.ploidy = 0.5,
max.ploidy = 10,
keepAllResult = FALSE,
primary.disease = "Tumor",
clean.temp = FALSE,
max.as.seg.count = 1500,
max.non.clonal = 0.05,
max.neg.genome = 0.005,
min.mut.af = 0.1,
min.no.mut = 0,
verbose = TRUE)
############################ Absolute #################################
for(patient in c("TCGA-DK-A1A6-01")){
print(patient)
test_seg=test_Seg%>%subset(Sample==patient)
test_maf=test_Maf %>% subset(Tumor_Sample_Barcode==patient)
write.table(test_seg,file = paste0(outdir,patient,"_test_output.seg"),sep="\t",row.names = F,quote = F)
write.table(test_maf,file = paste0(outdir,patient,"_test_output.maf"),sep="\t",row.names = F,quote = F)
RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"),
maf.fn = paste0(outdir,patient,"_test_output.maf"),
platform = "SNP_6.0",
copy_num_type = "total",
sigma.p=0,
results.dir = "output_absolute",
sample.name=patient,
primary.disease="Tumor",
max.sigma.h=0.5,
min.ploidy=0.5,
max.ploidy=10,
max.as.seg.count=1600,
max.neg.genome=0.005,
max.non.clonal=0.05,
verbose = TRUE,
min.mut.af=0.1)
}
######################################
## merge absolute data
######################################
ab_outdir="./output_absolute/"
all_files=list.files(paste0(ab_outdir))
rdata_files=all_files[grepl(pattern = ".RData",all_files)]
absolute.files=paste0(ab_outdir,rdata_files)
results.dir <- file.path(ab_outdir, "abs_summary")
CreateReviewObject("DRAWS_summary", absolute.files, results.dir, "total", verbose=TRUE)
load(file = paste0(results.dir,"/","DRAWS_summary.PP-modes.data.Rdata"))
calls.path = file.path(paste0(results.dir, "/","DRAWS_summary.PP-calls_tab.txt"))
modes.path = file.path(paste0(results.dir, "/", "DRAWS_summary.PP-modes.data.RData"))
output.path = file.path(paste0(outdir, "abs_extract"))
ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "total")
好的,谢谢您,王老师🙏
---Original--- From: "Shixiang Wang @.> Date: Sun, Feb 26, 2023 09:50 AM To: @.>; Cc: "Sijian @.**@.>; Subject: Re: [ShixiangWang/DoAbsolute] Different results between absoluted anddoabsoluted? (Issue #26)
那就暂时不要纠结这个了,你直接用 ABSOLUTE 更好的话就直接用 for 循环调用即可,我这个包写的目的也就是为了方便一点。 下面这段代码就给后人参考吧,我会在 README 中说明一下。你后续如果发现真正的原因也可以后续再讨论。 rm(list=ls()) gc() library(DoAbsolute) library(dplyr) example_path = system.file("extdata", package = "DoAbsolute", mustWork = T) library(data.table) library(ABSOLUTE) ##set workspace outdir="./output/" setwd(workdir) # segmentation file seg_normal = file.path(example_path, "SNP6_blood_normal.seg.txt") seg_solid = file.path(example_path, "SNP6_solid_tumor.seg.txt") seg_metastatic = file.path(example_path, "SNP6_metastatic_tumor.seg.txt") # MAF file maf_solid = file.path(example_path, "solid_tumor.maf.txt") maf_metastatic = file.path(example_path, "metastatic_tumor.maf.txt") # read data seg_normal = fread(seg_normal) seg_solid = fread(seg_solid) seg_metastatic = fread(seg_metastatic) maf_solid = fread(maf_solid) maf_metastatic = fread(maf_metastatic) # merge data Seg = Reduce(rbind, list(seg_normal, seg_solid, seg_metastatic)) Maf = Reduce(rbind, list(maf_solid, maf_metastatic)) Seg$Sample = substr(Seg$Sample, 1, 15) Maf$Tumor_Sample_Barcode = substr(Maf$Tumor_Sample_Barcode, 1, 15) test_Seg=Seg %>% subset(Sample %in% c("TCGA-DK-A1A6-01")) test_Seg=test_Seg[1:42,] table(test_Seg$Chromosome) test_Maf=Maf %>% subset(Tumor_Sample_Barcode %in% c("TCGA-DK-A1A6-01")) table(test_Maf$Chromosome) test_Maf$Chromosome=ifelse(test_Maf$Chromosome=="X",23,test_Maf$Chromosome) test_Maf=test_Maf[1:28,] ############################ DoAbsolute ################################# # test function DoAbsolute(Seg = test_Seg, Maf = test_Maf, platform = "SNP_6.0", copy.num.type = "total", results.dir = "output_doabsolute", nThread = 1, sigma.p = 0, max.sigma.h = 0.2, min.ploidy = 0.5, max.ploidy = 10, keepAllResult = FALSE, primary.disease = "Tumor", clean.temp = FALSE, max.as.seg.count = 1500, max.non.clonal = 0.05, max.neg.genome = 0.005, min.mut.af = 0.1, min.no.mut = 0, verbose = TRUE) ############################ Absolute ################################# for(patient in c("TCGA-DK-A1A6-01")){ print(patient) test_seg=test_Seg%>%subset(Sample==patient) test_maf=test_Maf %>% subset(Tumor_Sample_Barcode==patient) write.table(test_seg,file = paste0(outdir,patient,"_test_output.seg"),sep="\t",row.names = F,quote = F) write.table(test_maf,file = paste0(outdir,patient,"_test_output.maf"),sep="\t",row.names = F,quote = F) RunAbsolute(seg.dat.fn = paste0(outdir,patient,"_test_output.seg"), maf.fn = paste0(outdir,patient,"_test_output.maf"), platform = "SNP_6.0", copy_num_type = "total", sigma.p=0, results.dir = "output_absolute", sample.name=patient, primary.disease="Tumor", max.sigma.h=0.5, min.ploidy=0.5, max.ploidy=10, max.as.seg.count=1600, max.neg.genome=0.005, max.non.clonal=0.05, verbose = TRUE, min.mut.af=0.1) } ###################################### ## merge absolute data ###################################### ab_outdir="./output_absolute/" all_files=list.files(paste0(ab_outdir)) rdata_files=all_files[grepl(pattern = ".RData",all_files)] absolute.files=paste0(ab_outdir,rdata_files) results.dir <- file.path(ab_outdir, "abs_summary") CreateReviewObject("DRAWS_summary", absolute.files, results.dir, "total", verbose=TRUE) load(file = paste0(results.dir,"/","DRAWS_summary.PP-modes.data.Rdata")) calls.path = file.path(paste0(results.dir, "/","DRAWS_summary.PP-calls_tab.txt")) modes.path = file.path(paste0(results.dir, "/", "DRAWS_summary.PP-modes.data.RData")) output.path = file.path(paste0(outdir, "abs_extract")) ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "total")
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
The question is below