Closed FedericaBrando closed 10 months ago
In intOGen we use CBaSE dataset v1.1 and CBaSE code v1.0.
Roadmap:
Between CBaSE v1.0 and CBaSE v1.1, the main difference (leaving aside our own tuning of the code) is the following:
In the neg_ln_L function, a genes_by_sobs list of lists is added:
genes_by_sobs = [[ka, len(list(gr))] for ka, gr in it.groupby(sorted(genes, key=lambda arg: int(arg["obs"][2])), key=lambda arg: int(arg["obs"][2]))]
summe = 0.
if modC == 1:
- for gind in range(len(genes)):
+ for sval in genes_by_sobs:
s = sval[0]
[...]
# *************** lambda ~ Gamma:
def pofs(s, L):
- return (L * b) ** s * (1 + L * b) ** (-s - a) * math.gamma(s + a) / (math.gamma(s + 1) * math.gamma(a))
+ return np.exp(s * np.log(L * b) + (-s - a) * np.log(1 + L * b) + sp.gammaln(s + a) - sp.gammaln(s + 1) - sp.gammaln(a))
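The change to pofs moves the same formula into log space. A minimal sketch of why that matters, assuming a and b are passed as arguments (in the CBaSE code they come from the enclosing scope) and using the standard-library math.lgamma as a stand-in for sp.gammaln; the function names and parameter values here are illustrative only:

```python
import math


def pofs_direct(s, L, a, b):
    """v1.0-style evaluation: gamma functions are computed directly."""
    return (
        (L * b) ** s
        * (1 + L * b) ** (-s - a)
        * math.gamma(s + a)
        / (math.gamma(s + 1) * math.gamma(a))
    )


def pofs_log(s, L, a, b):
    """v1.1-style evaluation: sum the log terms, exponentiate once.

    math.lgamma stands in for scipy.special.gammaln (assumption made
    to keep this sketch dependency-free).
    """
    return math.exp(
        s * math.log(L * b)
        + (-s - a) * math.log(1 + L * b)
        + math.lgamma(s + a)
        - math.lgamma(s + 1)
        - math.lgamma(a)
    )


def direct_overflows(s, L, a, b):
    """Return True if the direct form cannot be evaluated in floating point."""
    try:
        pofs_direct(s, L, a, b)
    except OverflowError:
        return True
    return False
```

For small s both forms agree, but math.gamma raises OverflowError once its argument exceeds roughly 171, so the direct form cannot be evaluated for genes with a few hundred observed mutations, while the log-space form stays finite.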
Ask Ferran about these modifications, as well as about how they differ from our fine-tuning.
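For reference, the genes_by_sobs grouping from the diff can be tried on toy data. This is a sketch assuming that index 2 of each gene's "obs" entry holds the value looped over as s; the toy records and the per_value function are hypothetical. The elided part of the diff would need to confirm how the count in sval[1] is used, but presumably each distinct s is evaluated once and weighted by the number of genes sharing it:

```python
import itertools as it


def s_key(gene):
    """Grouping key used in the diff: int(gene["obs"][2])."""
    return int(gene["obs"][2])


def group_genes_by_sobs(genes):
    """Return [[s, number_of_genes_with_that_s], ...], sorted by s."""
    return [
        [ka, len(list(gr))]
        for ka, gr in it.groupby(sorted(genes, key=s_key), key=s_key)
    ]


def per_value(s):
    """Hypothetical stand-in for the per-gene term inside neg_ln_L."""
    return 0.5 * s + 1.0


# Toy records (made up for illustration, not real CBaSE input).
toy_genes = [
    {"gene": "G1", "obs": [0, 0, 5]},
    {"gene": "G2", "obs": [1, 0, 0]},
    {"gene": "G3", "obs": [2, 1, 5]},
    {"gene": "G4", "obs": [0, 0, 2]},
    {"gene": "G5", "obs": [3, 0, 0]},
]

genes_by_sobs = group_genes_by_sobs(toy_genes)

# Looping over genes and looping over (s, count) pairs give the same sum
# when each group's term is weighted by its count.
sum_per_gene = sum(per_value(s_key(g)) for g in toy_genes)
sum_per_group = sum(count * per_value(s) for s, count in genes_by_sobs)
```

With many genes sharing the same s, the grouped loop evaluates the per-value term far fewer times than the per-gene loop.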
Cohorts tested so far -->
For all of them, the results don't show any weird behaviour.
next steps:
waiting for permission (group bbg_beataml) to run 33k samples, for analysis
Reminder sent to IT... The first email is from 11th/Oct...
run completed
Completed at: 13-Nov-2023 04:23:39
Duration : 6d 13h 5m 13s
CPU hours : 25'009.1 (0% failed)
Succeeded : 8'005
Ignored : 6
Failed : 6
Stefano reported an error concerning CBaSE:
In certain cohorts it is detecting all genes as significant, all with the same very low q-value.
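A quick way to screen cohorts for this symptom could look like the sketch below. It assumes the q-values of a cohort are already loaded as a plain list of floats; the function name, the 0.05 threshold and the tolerance are hypothetical and not part of CBaSE or intOGen:

```python
def looks_degenerate(qvalues, alpha=0.05, tol=1e-12):
    """Flag a cohort where every gene is significant with one shared q-value.

    qvalues: iterable of per-gene q-values (floats).
    alpha:   significance threshold (hypothetical default).
    tol:     maximum spread for the q-values to count as identical.
    """
    qs = list(qvalues)
    if not qs:
        return False
    all_significant = all(q < alpha for q in qs)
    single_value = (max(qs) - min(qs)) <= tol
    return all_significant and single_value
```

Running this over every cohort's output would give the list of affected cohorts without inspecting each file by hand.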
On this topic, the CBaSE version that we use in IntOGen and the one currently on their website (http://genetics.bwh.harvard.edu/cbase/downloads.html) differ. We use v1.0, while their website offers v1.1. I am still looking for some sort of release notes to understand what they have changed.
According to Ferran: Let's see what is in v1.1, because the modifications that we implemented were a direct piece of advice from Donate Weghorn, the method's author, herself, and chances are that they have incorporated this type of heuristic.
list of cohorts that report the problem: