AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

V22 epn subtyping (5/N) #1437

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Run EPN subtyping

What was your approach?

What GitHub issue does your pull request address?

1207

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

NA

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

jaclyn-taroni commented 2 years ago

Similar question to other PRs: are the changes we're seeing in the following files expected? Are they due to changes in the base file?

analyses/molecular-subtyping-EPN/results/EPN_all_data.tsv
analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv

jharenza commented 2 years ago

No updates here upon rerun with new base, so column order does not matter.
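One way to confirm that column order alone produces no differences is to compare the tables by column name rather than by position. The following is a minimal sketch on toy data, not the check that was actually run: it assumes `sample_id` is the row key, and the helpers `load_rows` and `diff_tables` are hypothetical names.

```python
import csv
import io


def load_rows(tsv_text, key="sample_id"):
    """Parse TSV text into {key value: {column: value}}; dicts ignore column order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[key]: row for row in reader}


def diff_tables(old_text, new_text, key="sample_id"):
    """Return (key, column, old value, new value) for every cell that differs."""
    old, new = load_rows(old_text, key), load_rows(new_text, key)
    changes = []
    for sample in sorted(set(old) | set(new)):
        old_row, new_row = old.get(sample, {}), new.get(sample, {})
        for column in sorted(set(old_row) | set(new_row)):
            if old_row.get(column) != new_row.get(column):
                changes.append(
                    (sample, column, old_row.get(column), new_row.get(column))
                )
    return changes


# Toy tables: the columns are reordered, and one cell changes from 0 to NA.
old_tsv = "sample_id\tsubgroup\t1q_gain\nS1\tEPN, PF A\t1\nS2\tEPN, PF A\t0\n"
new_tsv = "sample_id\t1q_gain\tsubgroup\nS1\t1\tEPN, PF A\nS2\tNA\tEPN, PF A\n"

for change in diff_tables(old_tsv, new_tsv):
    print(change)
```

Only the changed cell is reported; the reordered columns on their own produce no output.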

jharenza commented 2 years ago

Similar question to other PRs: are the changes we're seeing in the following files expected? Are they due to changes in the base file?

analyses/molecular-subtyping-EPN/results/EPN_all_data.tsv
analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv

OK, so most of the changes come from the new focal CN file, which has the NAs. 7316-384 went from EPN, PFA to EPN, YAP1, which we also saw in OpenPedCan. However, this should not be the case: we shouldn't have assigned a fusion-positive subtype without the fusion. The same should hold for RELA fusions. I submitted an issue to update this: #1438
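A check along these lines could guard against a fusion-positive subtype being assigned without the fusion. This is only a sketch on toy rows: the subgroup labels are the ones used in this module, but the fusion column names `RELA_fusion` and `YAP1_fusion` and the helper `subtypes_without_fusion` are hypothetical and would need to be mapped to the real columns in the results table.

```python
# Hypothetical mapping: fusion-defined subgroup -> column flagging that fusion.
FUSION_COLUMNS = {
    "EPN, ST RELA": "RELA_fusion",
    "EPN, ST YAP1": "YAP1_fusion",
}


def subtypes_without_fusion(rows, fusion_columns=FUSION_COLUMNS):
    """Return sample IDs assigned a fusion-defined subgroup without the fusion."""
    flagged = []
    for row in rows:
        column = fusion_columns.get(row["subgroup"])
        # Subgroups that are not fusion-defined are skipped.
        if column is None:
            continue
        # A missing or zero fusion flag means the assignment is unsupported.
        if row.get(column, 0) <= 0:
            flagged.append(row["sample_id"])
    return flagged


# Toy rows (assumed layout: one dict per sample, fusion flags as 0/1).
rows = [
    {"sample_id": "S1", "subgroup": "EPN, ST YAP1", "YAP1_fusion": 1, "RELA_fusion": 0},
    {"sample_id": "S2", "subgroup": "EPN, ST YAP1", "YAP1_fusion": 0, "RELA_fusion": 0},
    {"sample_id": "S3", "subgroup": "EPN, PF A", "YAP1_fusion": 0, "RELA_fusion": 0},
    {"sample_id": "S4", "subgroup": "EPN, ST RELA", "YAP1_fusion": 0, "RELA_fusion": 1},
]

print(subtypes_without_fusion(rows))
```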

jharenza commented 2 years ago

This is now ready for review with the changes in notebook 03 made by @ewafula.

jharenza commented 2 years ago

I can not figure out what the relevant change is to the 03 notebook because Jupyter notebooks sure do not play nicely with git, so I'll have to take your word for it (and you don't need me to explicitly approve this since it's not going into master)

Don't we want these changes to go into master, on the off chance we would ever have to rerun any of this?

jaclyn-taroni commented 2 years ago

You’re merging into v22-cranio here; you don’t need my approval to do that, that branch isn’t protected. You will need my approval once we get to the “bottom” of the stack and everything goes into master

jharenza commented 2 years ago

That's what I meant: that all of these PRs would make it to master. I was worried that we would lose the EPN update, but thanks for clarifying!

jaclyn-taroni commented 2 years ago

To that end, if you can post the line number(s) or a permalink to the line(s) where the substantive change was made, that'd be super helpful!
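Since raw notebook diffs are hard to read, one option is to diff only the code cells. This is a rough sketch, assuming the notebook is stored in nbformat 4 (a top-level `cells` list, with each cell's `source` as a string or a list of strings); the helper names `code_cells` and `notebook_code_diff` and the toy notebooks are hypothetical.

```python
import difflib
import json


def code_cells(notebook_json):
    """Return the source lines of all code cells in a notebook given as JSON text."""
    notebook = json.loads(notebook_json)
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # The source field may be a single string or a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        lines.extend(text.splitlines())
    return lines


def notebook_code_diff(old_json, new_json):
    """Unified diff of code-cell sources only, ignoring outputs and metadata."""
    return list(
        difflib.unified_diff(
            code_cells(old_json),
            code_cells(new_json),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Toy notebooks: one code line changes; outputs and markdown also differ.
old_nb = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Notes\n"]},
    {"cell_type": "code", "source": ["a = 1\n", "b = 2\n"], "outputs": ["x"]},
]})
new_nb = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Notes, edited\n"]},
    {"cell_type": "code", "source": ["a = 1\n", "b = 3\n"], "outputs": ["y"]},
]})

print("\n".join(notebook_code_diff(old_nb, new_nb)))
```

In practice the two JSON strings would be read from the notebook file at the two commits being compared.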

jharenza commented 2 years ago

@ewafula can you take care of this please?

ewafula commented 2 years ago

@jaclyn-taroni, the substantive change to the notebook was mainly the exclusion of code that considers other molecular markers and calls a tumor YAP1-positive without the YAP1 fusion, as described in ticket #1438 by @jharenza and also discussed here. To that end:

1). I excluded the following function, and the calls to it, which assigns samples with overexpression of CXorf67 and TKTL1 along with 1q gain to the PT_EPN_A subgroup, and samples with overexpression of GPBP1 and IFT46 along with 6p and 6q loss to the PT_EPN_B subgroup:

def prioritizing_PT_EPN(row, sample_list):
    if( row["CXorf67_expr_zscore"]>3 or
        (row["CXorf67_expr_zscore"]>3 and row["1q_gain"]>0) or
        (row["TKTL1_expr_zscore"]>3  and row["1q_gain"]>0)):
        sample_list.append(row["sample_id"])
        return("EPN, PF A")
    elif((row["GPBP1_expr_zscore"]>3 and row["6q_loss"]>0) or
          (row["GPBP1_expr_zscore"]>3 and row["6p_loss"]>0) or 
          (row["IFT46_expr_zscore"]>3 and row["6q_loss"]>0) or
          (row["IFT46_expr_zscore"]>3 and row["6p_loss"]>0)):
        sample_list.append(row["sample_id"])
        return("EPN, PF B")

    else:
        return(row["subgroup"])

2). I also excluded the following code, which uses a combination of markers other than the RELA and YAP1 fusions, each compared against the threshold set in its tuple, to assign the EPN, ST RELA and EPN, ST YAP1 subgroups:

For assigning EPN, ST RELA subgroup

st_epn_rela_tests = [("PTEN--TAS2R1", 0),
                     ("9p_loss", 0),
                     ("9q_loss", 0),
                     ("RELA_expr_zscore", 3),
                     ("L1CAM_expr_zscore", 3)]
# Calling function subgroup_func to set the values for last column "subgroup"
EPN_final["subgroup"] = EPN_final.apply(subgroup_func,
                                        axis=1,
                                        subgroupname="EPN, ST RELA",
                                        column_values=st_epn_rela_tests,
                                        sample_list=samples_assigned)

For assigning EPN, ST YAP1 subgroup

st_epn_yap1_tests = [("C11orf95--MAML2", 0),
                     ("11q_loss", 0),
                     ("11q_gain", 0),
                     ("ARL4D_expr_zscore", 3),
                     ("CLDN1_expr_zscore", 3)]

EPN_final["subgroup"] = EPN_final.apply(subgroup_func,
                                        axis=1,
                                        subgroupname="EPN, ST YAP1",
                                        column_values=st_epn_yap1_tests,
                                        sample_list=samples_assigned)

The final results table, including subgroups, aligns with the subtyping table produced by @komalsrathi for the same module in the OpenPedCan repo: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/mol-subtype-update/analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv