Open FedericaBrando opened 8 months ago
Running TCGA + HARTWIG returns incomplete Combination results for two cohorts:
- HARTWIG_WGS_MCC_2020: expected, given the v2023 results
- HARTWIG_WGS_THYM_2020: unexpected, given the v2023 results
There are two cohorts in intOGen 2023 for this tumor type (ttype):
- HARTWIG_WGS_THYM_2020: 11 samples
- TCGA_WXS_THYM: 122 samples

For the failing cohort, v2023 detects 3 genes as drivers: SMARCA4, KIT and FAM135B. Only SMARCA4 is also detected in the TCGA cohort; KIT and FAM135B are unique to the HARTWIG one.
The main difference between the two runs is the following:
dict_keys(['oncodrivefml', 'oncodriveclustl', 'dndscv', 'smregions', 'cbase', 'hotmaps'])
[QC] {'smregions', 'oncodriveclustl', 'mutpanning'} discarded
Running on ['oncodrivefml', 'dndscv', 'cbase', 'hotmaps']
Optimization terminated successfully (Exit mode 0)
Current function value: -2.6676666909499587
Iterations: 3
Function evaluations: 28
Gradient evaluations: 3
Iteration limit reached (Exit mode 9)
Current function value: -2.6676666909499587
Iterations: 25
Function evaluations: 440
Gradient evaluations: 25
{'oncodrivefml': 0.25, 'oncodriveclustl': 0.0, 'dndscv': 0.25, 'smregions': 0.0, 'cbase': 0.25, 'hotmaps': 0.25}
[QC] {'seismic', 'oncodriveclustl', 'mutpanning', 'smregions', 'oncodrive3d'} discarded
Running on ['oncodrivefml', 'dndscv', 'cbase']
{'oncodrivefml': None, 'oncodriveclustl': None, 'dndscv': None, 'smregions': None, 'cbase': None, 'oncodrive3d': None}
Singular matrix E in LSQ subproblem (Exit mode 5)
Current function value: -2.1844151605103335
Iterations: 1
Function evaluations: 19
Gradient evaluations: 2
Singular matrix E in LSQ subproblem (Exit mode 5)
Current function value: -2.1844151605103335
Iterations: 1
Function evaluations: 19
Gradient evaluations: 2
{'oncodrivefml': None, 'oncodriveclustl': None, 'dndscv': None, 'smregions': None, 'cbase': None, 'oncodrive3d': None}
[QC] {'oncodriveclustl', 'seismic', 'mutpanning', 'smregions', 'oncodrive3d'} discarded
Running on ['dndscv', 'oncodrivefml', 'cbase']
Singular matrix E in LSQ subproblem (Exit mode 5)
Current function value: -1.6014802704758488
Iterations: 1
Function evaluations: 19
Gradient evaluations: 2
Singular matrix E in LSQ subproblem (Exit mode 5)
Current function value: -1.6014802704758488
Iterations: 1
Function evaluations: 19
Gradient evaluations: 2
{'dndscv': nan, 'oncodrivefml': nan, 'oncodriveclustl': nan, 'smregions': nan, 'cbase': nan, 'oncodrive3d': nan}
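The exit modes in these logs (0, 5 and 9) match the messages printed by SciPy's SLSQP optimizer, so the None/nan weights look like a failed fit being passed along. Below is a minimal sketch of how such a failure could be surfaced instead. It assumes the weights are fitted with `scipy.optimize.minimize(method="SLSQP")` under a sum-to-one constraint, which this issue does not confirm. `fit_weights` and the quadratic objective are hypothetical stand-ins for the real Combination objective.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(objective, methods, discarded):
    """Fit combination weights on the methods that pass QC (hypothetical helper).

    `objective` maps a weight vector (one entry per kept method) to the value
    to minimise. Discarded methods get weight 0.0, like the weight dicts in
    the logs above.
    """
    kept = [m for m in methods if m not in discarded]
    n = len(kept)
    result = minimize(
        objective,
        np.full(n, 1.0 / n),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    # For SLSQP, result.status is the exit mode (0 = success,
    # 5 = singular matrix E in LSQ subproblem, 9 = iteration limit reached).
    if not result.success:
        raise RuntimeError(f"SLSQP failed (exit mode {result.status}): {result.message}")
    weights = {m: 0.0 for m in methods}
    weights.update({m: float(w) for m, w in zip(kept, result.x)})
    return weights


# Toy usage: a quadratic objective whose minimum is a known weight vector.
methods = ["oncodrivefml", "oncodriveclustl", "dndscv", "smregions", "cbase", "hotmaps"]
target = np.array([0.4, 0.3, 0.2, 0.1])
weights = fit_weights(lambda w: np.sum((w - target) ** 2), methods, {"oncodriveclustl", "smregions"})
print(weights)
```

For SLSQP, `result.success` is true only for exit mode 0. A check like this would therefore stop at the "Singular matrix E" case rather than emitting None/nan weights for the cohort.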
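The CBaSE fix could play an important role here. In the v2023 run CBaSE was bugged and returned every gene as a hit: even genes with no observed mutations got p_pos = q_pos = 1.428265e-02. In the new run this no longer happens, and those genes get p_pos = q_pos = 1: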
❯ zcat ../v2023/20230224_release2023/run/work/7c/6cc8e3ada748749d1661ff1f717310/HARTWIG_WGS_THYM_2020.cbase.tsv.gz| head
gene p_pos q_pos mis_obs non_obs syn_obs
A1BG 1.428265e-02 1.428265e-02 0 0 0
A1CF 1.428265e-02 1.428265e-02 0 0 0
A2M 1.428265e-02 1.428265e-02 0 0 0
A2ML1 1.428265e-02 1.428265e-02 0 0 0
A3GALT2 1.428265e-02 1.428265e-02 0 0 0
A4GALT 1.428265e-02 1.428265e-02 0 0 0
A4GNT 1.428265e-02 1.428265e-02 0 0 0
AAAS 1.428265e-02 1.428265e-02 0 0 0
AACS 1.428265e-02 1.428265e-02 0 0 0
❯ zcat work/cb/2a93aa0134864c3b2fd129e5f62077/HARTWIG_WGS_THYM_2020.cbase.tsv.gz| head
gene p_pos q_pos mis_obs non_obs syn_obs
A1BG 1.000000e+00 1.000000e+00 0 0 0
A1CF 1.000000e+00 1.000000e+00 0 0 0
A2M 1.000000e+00 1.000000e+00 0 0 0
A2ML1 1.000000e+00 1.000000e+00 0 0 0
A3GALT2 1.000000e+00 1.000000e+00 0 0 0
A4GALT 1.000000e+00 1.000000e+00 0 0 0
A4GNT 1.000000e+00 1.000000e+00 0 0 0
AAAS 1.000000e+00 1.000000e+00 0 0 0
AACS 1.000000e+00 1.000000e+00 0 0 0
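The difference between the two outputs can be turned into a quick sanity check. The sketch below counts genes that have no observed mutations but still get p_pos below 1. It assumes the file is tab-separated with the header shown above. `count_suspicious_genes` and `check_cbase_file` are illustrative names, not part of intOGen.

```python
import csv
import gzip


def count_suspicious_genes(lines):
    """Count genes with zero observed mutations but p_pos < 1.

    `lines` is any iterable of text lines (header first), assumed to be
    tab-separated with columns gene, p_pos, q_pos, mis_obs, non_obs, syn_obs.
    """
    suspicious = 0
    for row in csv.DictReader(lines, delimiter="\t"):
        observed = float(row["mis_obs"]) + float(row["non_obs"]) + float(row["syn_obs"])
        if observed == 0 and float(row["p_pos"]) < 1.0:
            suspicious += 1
    return suspicious


def check_cbase_file(path):
    """Apply the check to a gzipped CBaSE output such as *.cbase.tsv.gz."""
    with gzip.open(path, "rt", newline="") as handle:
        return count_suspicious_genes(handle)
```

If the rest of each file follows the rows shown, `check_cbase_file` would be expected to return a large count on the v2023 file and zero on the new one.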
Testing HARTWIG and TCGA with Oncodrive3D release 0.1 and the optimized version of the combination (with LHS), using the scratch option for Oncodrive3D.
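If LHS here stands for Latin hypercube sampling of starting points, the optimized combination could look roughly like a multi-start SLSQP fit. The issue does not spell this out, so treat it as an assumption. Below is a minimal sketch using a hypothetical `fit_weights_multistart` helper and a hand-rolled Latin hypercube sampler. Each start is drawn so that every dimension has one point per stratum and is then rescaled to sum to one. The best successful run is kept.

```python
import numpy as np
from scipy.optimize import minimize


def latin_hypercube(n, d, rng):
    """Draw n points in [0, 1)^d with exactly one point per stratum per dimension."""
    strata = np.array([rng.permutation(n) for _ in range(d)]).T  # shape (n, d)
    return (strata + rng.random((n, d))) / n


def fit_weights_multistart(objective, n_methods, n_starts=10, seed=0):
    """Hypothetical multi-start SLSQP fit of weights that sum to one."""
    rng = np.random.default_rng(seed)
    bounds = [(0.0, 1.0)] * n_methods
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    best, failed_modes = None, []
    for start in latin_hypercube(n_starts, n_methods, rng):
        x0 = start / start.sum()  # rescale the sample so the weights sum to one
        res = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=constraints)
        if not res.success:
            failed_modes.append(int(res.status))  # e.g. 5 for "Singular matrix E"
            continue
        if best is None or res.fun < best.fun:
            best = res
    if best is None:
        raise RuntimeError(f"all {n_starts} SLSQP starts failed, exit modes: {failed_modes}")
    return best.x, failed_modes


target = np.array([0.4, 0.3, 0.2, 0.1])
best_weights, failed_modes = fit_weights_multistart(lambda w: np.sum((w - target) ** 2), n_methods=4)
print(best_weights, failed_modes)
```

Several starts could make a single "Singular matrix E" failure less likely to leave a cohort without weights. The cost is more objective evaluations per cohort.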