bbglab / intogen-plus

a framework for automatic and comprehensive knowledge extraction based on mutational data from sequenced tumor samples from patients.
https://www.intogen.org/search

IntOGen Plus | Oncodrive3D v0.1 + LHS combination #12

Open FedericaBrando opened 8 months ago

FedericaBrando commented 8 months ago

Testing HARTWIG and TCGA with Oncodrive3D release 0.1 and the optimized version of the combination (with LHS), using the scratch option for Oncodrive3D.

FedericaBrando commented 8 months ago

Running TCGA + HARTWIG returns an incomplete Combination for two cohorts:

THYM ttype

There are two cohorts in intogen 2023 for this ttype:

For the failing cohort, three genes are detected as drivers in v2023: SMARCA4, KIT and FAM135B.

Only SMARCA4 is detected in the TCGA cohort; KIT and FAM135B are unique to the HARTWIG one.

The main difference between the two runs is the following:

dict_keys(['oncodrivefml', 'oncodriveclustl', 'dndscv', 'smregions', 'cbase', 'hotmaps'])
[QC] {'smregions', 'oncodriveclustl', 'mutpanning'} discarded
Running on ['oncodrivefml', 'dndscv', 'cbase', 'hotmaps']
Optimization terminated successfully    (Exit mode 0)
            Current function value: -2.6676666909499587
            Iterations: 3
            Function evaluations: 28
            Gradient evaluations: 3
Iteration limit reached    (Exit mode 9)
            Current function value: -2.6676666909499587
            Iterations: 25
            Function evaluations: 440
            Gradient evaluations: 25
{'oncodrivefml': 0.25, 'oncodriveclustl': 0.0, 'dndscv': 0.25, 'smregions': 0.0, 'cbase': 0.25, 'hotmaps': 0.25}
[QC] {'seismic', 'oncodriveclustl', 'mutpanning', 'smregions', 'oncodrive3d'} discarded
Running on ['oncodrivefml', 'dndscv', 'cbase']
{'oncodrivefml': None, 'oncodriveclustl': None, 'dndscv': None, 'smregions': None, 'cbase': None, 'oncodrive3d': None}
Singular matrix E in LSQ subproblem    (Exit mode 5)
            Current function value: -2.1844151605103335
            Iterations: 1
            Function evaluations: 19
            Gradient evaluations: 2
Singular matrix E in LSQ subproblem    (Exit mode 5)
            Current function value: -2.1844151605103335
            Iterations: 1
            Function evaluations: 19
            Gradient evaluations: 2
{'oncodrivefml': None, 'oncodriveclustl': None, 'dndscv': None, 'smregions': None, 'cbase': None, 'oncodrive3d': None}
[QC] {'oncodriveclustl', 'seismic', 'mutpanning', 'smregions', 'oncodrive3d'} discarded
Running on ['dndscv', 'oncodrivefml', 'cbase']
Singular matrix E in LSQ subproblem    (Exit mode 5)
            Current function value: -1.6014802704758488
            Iterations: 1
            Function evaluations: 19
            Gradient evaluations: 2
Singular matrix E in LSQ subproblem    (Exit mode 5)
            Current function value: -1.6014802704758488
            Iterations: 1
            Function evaluations: 19
            Gradient evaluations: 2
{'dndscv': nan, 'oncodrivefml': nan, 'oncodriveclustl': nan, 'smregions': nan, 'cbase': nan, 'oncodrive3d': nan}
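
To make failures like the ones above easier to spot, here is a minimal sketch that pulls the exit modes out of a log and checks whether a weights dictionary is usable. The "(Exit mode N)" messages look like those printed by SciPy's SLSQP solver, but that is an assumption, as is the rule that valid weights are finite and sum to 1. The helper names `exit_modes` and `weights_are_usable` are hypothetical and not part of the pipeline.

```python
import math
import re

# Matches the "(Exit mode N)" suffix seen in the optimizer messages above.
EXIT_MODE_RE = re.compile(r"\(Exit mode (\d+)\)")


def exit_modes(log_text):
    """Return the exit modes found in a log, in order of appearance."""
    return [int(m) for m in EXIT_MODE_RE.findall(log_text)]


def weights_are_usable(weights, tol=1e-6):
    """True if every weight is a finite number and the weights sum to 1.

    Assumption: a completed combination yields weights that sum to 1,
    as in the first dictionary of the log (4 x 0.25).
    """
    values = list(weights.values())
    if any(v is None or not math.isfinite(v) for v in values):
        return False
    return abs(sum(values) - 1.0) <= tol


if __name__ == "__main__":
    log = (
        "Singular matrix E in LSQ subproblem    (Exit mode 5)\n"
        "Iteration limit reached    (Exit mode 9)\n"
    )
    print(exit_modes(log))
    print(weights_are_usable({"dndscv": float("nan"), "cbase": float("nan")}))
```

With the three dictionaries from the log, only the first one (four methods at 0.25) passes the check; the `None` and `nan` ones are flagged as unusable.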

Important NOTE:

The CBaSE fix could play an important role in this case: in the v2023 run CBaSE was bugged and every gene was returned as a hit. In the new run this does not happen:

❯ zcat ../v2023/20230224_release2023/run/work/7c/6cc8e3ada748749d1661ff1f717310/HARTWIG_WGS_THYM_2020.cbase.tsv.gz| head
gene    p_pos   q_pos   mis_obs non_obs syn_obs
A1BG    1.428265e-02    1.428265e-02    0   0   0
A1CF    1.428265e-02    1.428265e-02    0   0   0
A2M 1.428265e-02    1.428265e-02    0   0   0
A2ML1   1.428265e-02    1.428265e-02    0   0   0
A3GALT2 1.428265e-02    1.428265e-02    0   0   0
A4GALT  1.428265e-02    1.428265e-02    0   0   0
A4GNT   1.428265e-02    1.428265e-02    0   0   0
AAAS    1.428265e-02    1.428265e-02    0   0   0
AACS    1.428265e-02    1.428265e-02    0   0   0
❯ zcat work/cb/2a93aa0134864c3b2fd129e5f62077/HARTWIG_WGS_THYM_2020.cbase.tsv.gz| head
gene    p_pos   q_pos   mis_obs non_obs syn_obs
A1BG    1.000000e+00    1.000000e+00    0   0   0
A1CF    1.000000e+00    1.000000e+00    0   0   0
A2M 1.000000e+00    1.000000e+00    0   0   0
A2ML1   1.000000e+00    1.000000e+00    0   0   0
A3GALT2 1.000000e+00    1.000000e+00    0   0   0
A4GALT  1.000000e+00    1.000000e+00    0   0   0
A4GNT   1.000000e+00    1.000000e+00    0   0   0
AAAS    1.000000e+00    1.000000e+00    0   0   0
AACS    1.000000e+00    1.000000e+00    0   0   0
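
The comparison above can be turned into a quick sanity check: the fraction of genes whose `q_pos` falls below a significance threshold. This is only a sketch. The function name `fraction_significant` and the 0.05 cutoff are assumptions for illustration, not what the pipeline uses, and the sample rows are copied from the two outputs above.

```python
import csv
import io


def fraction_significant(tsv_text, column="q_pos", alpha=0.05):
    """Fraction of rows whose value in `column` is below `alpha`."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return 0.0
    hits = sum(float(row[column]) < alpha for row in rows)
    return hits / len(rows)


HEADER = "gene\tp_pos\tq_pos\tmis_obs\tnon_obs\tsyn_obs\n"

# First rows of the v2023 (bugged) output shown above.
OLD_RUN = HEADER + (
    "A1BG\t1.428265e-02\t1.428265e-02\t0\t0\t0\n"
    "A1CF\t1.428265e-02\t1.428265e-02\t0\t0\t0\n"
    "A2M\t1.428265e-02\t1.428265e-02\t0\t0\t0\n"
)

# First rows of the new output shown above.
NEW_RUN = HEADER + (
    "A1BG\t1.000000e+00\t1.000000e+00\t0\t0\t0\n"
    "A1CF\t1.000000e+00\t1.000000e+00\t0\t0\t0\n"
    "A2M\t1.000000e+00\t1.000000e+00\t0\t0\t0\n"
)

if __name__ == "__main__":
    print(fraction_significant(OLD_RUN))
    print(fraction_significant(NEW_RUN))
```

For the real files, the text could be read with `gzip.open(path, "rt").read()` before passing it in. A fraction close to 1 across all genes, including genes with zero observed mutations, would point to the bugged behaviour.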