Closed garyzhubc closed 1 year ago
I reread the code of the "bcaTrunc" function. The disjunction of small mass subsets is what I effectively programmed, as you can see with the small example given in the help page of the function. Seems I had forgot this detail!
I see. If this is the case then conflict is what's causing the problem. Maybe I could look into whether there's a conflict-less way of constructing beliefs?
I printed a list of iteration number and conflict so as you can see conflict start to increase after 27-th belief that's combined.
[1] 2 [1] 0 [1] 3 [1] 0 [1] 4 [1] 0 [1] 5 [1] 0 [1] 6 [1] 0 [1] 7 [1] 0 [1] 8 [1] 0 [1] 9 [1] 0 [1] 10 [1] 0 [1] 11 [1] 0 [1] 12 [1] 0 [1] 13 [1] 0 [1] 14 [1] 0 [1] 15 [1] 0 [1] 16 [1] 0 [1] 17 [1] 0 [1] 18 [1] 0 [1] 19 [1] 0 [1] 20 [1] 0 [1] 21 [1] 0 [1] 22 [1] 0 [1] 23 [1] 0 [1] 24 [1] 0 [1] 25 [1] 0 [1] 26 [1] 0 [1] 27 [1] 0.7680963 [1] 28 [1] 0.9462207 [1] 29 [1] 0.9875284 [1] 30 [1] 0.9971652 [1] 31 [1] 0.9997388 [1] 32 [1] 0.9999759 [1] 33 [1] 0.9999978 [1] 34 [1] 0.9999998 [1] 35 [1] 1
Right now I'm trying "skip combination if there's a conflict" and I'll see if it works.
The result: i = : 1 con = : 0.97929 i = : 2 con = : 0.9979104 i = : 3 con = : 0.9997908 i = : 4 con = : 0.9999791 i = : 5 con = : 0.9999979 i = : 6 con = : 0.9999998 i = : 7 con = : 1 i = : 8 con = : 1 i = : 9 con = : 1 i = : 10 con = : 1
[1] "conflict 0" [1] "436-th skip" [1] "iteration 1564" [1] "conflict 0" [1] "437-th skip" [1] "iteration 1565" [1] "conflict 0" [1] "438-th skip" Error in ztgo[, 1:ncol(x$tt)] : incorrect number of dimensions Calls: bcaTrunc Execution halted
Still a lot of problems being applied to large datasets.
Bug presumably because there was only one subset with mass less than treshold value. I have added checks to handle this case.
Seems to be a good idea to skip the contradictory cases. Maybe combine them together, then combine the two final results?
Now the code is working fine by skipping. However I still cannot get the kind of result that I'm expecting to see. Due to known research, the middle row is expected to have either higher beliefs or plausibility.
Were these three subsets added with the addTobca function? If so, that means there is no subset of the last two in your resulting bca. Is this possible?
No. I would expect the first and third to be close to nothing and only the second shows up something.Sent from my iPhoneOn Jun 11, 2023, at 6:19 PM, Claude Boivin @.***> wrote: Were these three subsets added with the addTobca function? If so, that means there is no subset of the last two in your resulting bca. Is this possible?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Do you normalize at every step or only at the end?
Only at the end.Sent from my iPhoneOn Jun 12, 2023, at 6:45 AM, Claude Boivin @.***> wrote: Do you normalize at every step or only at the end?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Bugs corrections 1) In the nzdsr function: I deleted the test with the measure of conflict (lines 37-38 of nzdsr. This indice does not have to intervene in the combination by Dempster's rule. This is only a decision aid in the analysis conflicting evidence.
2) In the dsrwon function: I removed the measure of conflict as a warning of conflicting evidence. Not necessary.
I made changes to the computation of the measure of conflic. . The result is now the same whether we normalize only at the end of many calls to dsrwon or normalize after each call to dsrwon.
As it is now, the code is not clean, but with these changes you should be able to process your data file without skipping records.
Good luck.
The new version doesn't give any conflict. That's a little strange.
[1] "iteration 13072" [1] "conflict 0" [1] "iteration 13073" [1] "conflict 0" [1] "iteration 13074" [1] "conflict 0" [1] "iteration 13075" [1] "conflict 0" [1] "iteration 13076" [1] "conflict 0" [1] "iteration 13077" [1] "conflict 0" [1] "iteration 13078" [1] "conflict 0" [1] "iteration 13079" [1] "conflict 0" [1] "iteration 13080" [1] "conflict 0" [1] "iteration 13081"
That's very strange. Everything has zero conflict. Is it possible? Before it has a lot of conflict increasing gradually until it reaches one.
Indeed, it is strange. I did some tests with conflicting data on a small loop. I obtain a contradiction index at each step of the loop. I suggest you retain or print the sum of the first row of the "tt" matrix at each step, in addition to the "$con" value. If the value $con is zero, the sum of the first line must NOT be zero.
Also maybe it has to do with truncation, if you use it. I'll look into this tomorrow, see if I have taken into account the case where the empty set is present with a mass less than the threshold value.
Here's how I did this:
# load cache
library(bigsnpr)
pcsk9_updown_bigsnp_obj <- snp_attach("../data/ukb22828_c1_b0_v3_pcsk9_updown.rds")
ukb_data_ldl <- readRDS("../data/ukb_data_ldl.rds")
library(dst)
# initialize loop
N <- nrow(pcsk9_updown_bigsnp_obj$genotypes)
M <- ncol(pcsk9_updown_bigsnp_obj$genotypes)
ukb_data_ldl$ldl_highlow[1:30]
# check index
# if high, use <= 1
#a <- 1e-5
a <- 0.9
trunc_thres_upper <- 1
trunc_thres_lower <- 0.1
bpa <- bca(rbind(as.integer(pcsk9_updown_bigsnp_obj$genotypes[ukb_data_ldl[1, 4], ] <= 2-trunc_thres_upper), rep(1, M)), c(a, 1-a), pcsk9_updown_bigsnp_obj$map$rsid)
# rowwise Dempster's rule of combination
K <- 30
nskip <- 0
#K <- nrow(ukb_data_ldl)
for (i in 2:K) {
print(paste("iteration",i))
if (ukb_data_ldl$ldl_highlow[i] == FALSE) {
bpa_new <- bca(rbind(as.integer(pcsk9_updown_bigsnp_obj$genotypes[ukb_data_ldl[i, 4], ] >= trunc_thres_lower), rep(1, M)), c(a, 1-a), pcsk9_updown_bigsnp_obj$map$rsid)
bpa_new <- dsrwon(bpa, bpa_new, mcores = "yes")
} else if (ukb_data_ldl$ldl_highlow[i] == TRUE) {
bpa_new <- bca(rbind(as.integer(pcsk9_updown_bigsnp_obj$genotypes[ukb_data_ldl[i, 4], ] <= 2-trunc_thres_upper), rep(1, M)), c(a, 1-a), pcsk9_updown_bigsnp_obj$map$rsid)
bpa_new <- dsrwon(bpa, bpa_new, mcores = "yes")
}
print(paste("sum(bpa$tt)",sum(bpa$tt[1,])))
print(paste("conflict",bpa$con))
#if(bpa_new$con == 0) {
# bpa <- bpa_new
#} else {
# nskip <- nskip + 1
# print(paste0(nskip,"-th skip"))
#}
if(i %% 5 == 0) {
bpa <- bcaTrunc(bpa, 0.001)
}
saveRDS(bpa, "../data/bpa.rds")
}
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 635"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 636"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 637"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 638"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 639"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 640"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 641"
[1] "sum(bpa$tt) 172"
[1] "conflict 0"
[1] "iteration 642"
So not in the way you suggested.
And yes, I use truncation.
If I don't use truncation, the combination progresses interestingly much faster than before, without any conflict. I think something is wrong.
Could you do a test with mcores = "no" on a few cases, to see if we have the same problem?
I found some issues with my code, so no problem yet seen on the update. Now I'm able to run this successfully on large number of patients.
Glad to know things are starting to work better. I hope the results will meet your expectations.
Le jeu. 15 juin 2023 à 20:25, Peiyuan Zhu @.***> a écrit :
Closed #11 https://github.com/RAPLER/dst-1/issues/11 as completed.
— Reply to this email directly, view it on GitHub https://github.com/RAPLER/dst-1/issues/11#event-9545910957, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA5D5IP5AAHFXGZJE4EQMTDXLORXPANCNFSM6AAAAAAYY7A35Y . You are receiving this because you commented.Message ID: @.***>
Better than truncation: merge subsets with small bca into one bigger subset.
Reason: conflict occur quickly without normalizing after every iteration. Not sure if it's because of numerical instability or actual conflict. Need much more testing than it is already been done.