Purpose/implementation Section
What scientific question is your analysis addressing?
Rerun HGG subtyping.
Is there anything that you want to discuss further?
HGG_defining_lesions.tsv
The following samples were removed because path_dx == SEGA (this is reflected in the v21 histologies file, where cancer group == SEGA). Interestingly, these samples were not in HGG_molecular_subtype.tsv, so they do not show up as removed there:
7316-2578
7316-2171
7316-3019
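Spotting removals like these comes down to comparing the sets of IDs in two versions of a results file. Below is a minimal Python sketch, assuming both versions are TSVs with a header row. The ID column name and the helper names are hypothetical. It also shows why samples that were never in HGG_molecular_subtype.tsv cannot appear as removed from it: a set difference only reports IDs that were present in the old version.

```python
import csv


def read_ids(lines, id_column):
    """Collect the values of `id_column` from TSV lines that start with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_column] for row in reader}


def diff_ids(old_lines, new_lines, id_column):
    """Return (removed, added): IDs only in the old version, IDs only in the new one.

    An ID absent from the old version can never be reported as removed.
    """
    old_ids = read_ids(old_lines, id_column)
    new_ids = read_ids(new_lines, id_column)
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


if __name__ == "__main__":
    # Toy in-memory data; "sample_id" is a hypothetical column name.
    old = ["sample_id\tsubtype", "A\tx", "B\ty"]
    new = ["sample_id\tsubtype", "A\tx", "C\tz"]
    print(diff_ids(old, new, "sample_id"))  # (['B'], ['C'])
```

For real files, pass open file handles (opened with `newline=""`) instead of lists of strings. `csv.DictReader` accepts any iterable of lines.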
HGG_molecular_subtype.tsv
PT_8P368R5B (7316-4998) is being removed, but it is a DIPG and has a subtype in v21.
Its diagnosis, "Brainstem glioma- Diffuse intrinsic pontine glioma", is in the exact dx strings, but it is an RNA-only sample.
PT_8P368R5B is in the list of patients in molecular-subtyping-pathology, but it had no report, so no subtype was created there.
BS_HE0WJRW6 of 7316-1455 was removed, while BS_HWGWYCY7 was retained.
Both are RNA-Seq, and both are in the v21 subtypes as "to be classified".
Those samples were recently removed from the TP53 results file.
I found it pretty hard to spot the diffs here, so I propose adding an arrange(Kids_First_Biospecimen_ID_DNA) at the end of the code, but I will do that after you take a look.
It appears these samples were removed with this PR; again, this was really hard to spot without sorting the rows before the file is written.
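The proposed fix is a dplyr arrange(Kids_First_Biospecimen_ID_DNA) just before the output is written. The sketch below illustrates the same idea in Python. It is not the module's R code, and the helper name is hypothetical. It sorts rows by one or more ID columns before writing a TSV, so two runs produce line-aligned files that diff cleanly.

```python
import csv
import io


def write_sorted_tsv(rows, handle, fieldnames, sort_columns):
    """Write dict rows as TSV, sorted by `sort_columns`, so reruns diff cleanly.

    Python's sort is stable, so rows that tie on every sort column keep
    their input order; add more columns to make the order fully deterministic.
    """
    ordered = sorted(rows, key=lambda row: tuple(row[c] for c in sort_columns))
    writer = csv.DictWriter(handle, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)


if __name__ == "__main__":
    # Made-up IDs and scores, for illustration only.
    rows = [
        {"Kids_First_Biospecimen_ID_DNA": "BS_B", "score": "0.2"},
        {"Kids_First_Biospecimen_ID_DNA": "BS_A", "score": "0.9"},
    ]
    buffer = io.StringIO()
    write_sorted_tsv(rows, buffer, ["Kids_First_Biospecimen_ID_DNA", "score"],
                     ["Kids_First_Biospecimen_ID_DNA"])
    print(buffer.getvalue(), end="")
```

If the sort key is not unique, ties are left in input order. Adding a secondary sort column avoids spurious diffs between runs.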
But the plot thickens... these samples are all in pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds, which is the input for the classifier. However, they are not in pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv, which holds the classifier output.
It turns out that OpenPBTA-analysis/analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds is missing exactly these 7 samples. There is an ifelse that should use the file in the /data folder when the module is not run for subtyping. The logic reads as correct to me, yet the module appears to have been using the results file from collapse-rnaseq.
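The input selection described above can be sketched as follows. This is a hypothetical Python illustration of the branch, not the module's code. The flag name RUN_FOR_SUBTYPING and the helper names are assumptions. The flag is parsed explicitly because, in Python, bool("0") is True, so a raw string flag would always select the subtyping branch. The resolved path is printed so it is obvious which file was actually read.

```python
import os
from pathlib import Path

# File name taken from the discussion; directory names are relative to the repo root.
EXPRESSION_FILE = "pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"


def parse_flag(value):
    """Interpret an environment-style flag; None, "", "0", "false" and "no" mean False."""
    if value is None:
        return False
    return value.strip().lower() not in {"", "0", "false", "no"}


def choose_expression_path(run_for_subtyping, data_dir, collapse_results_dir):
    """Pick the classifier input: collapse-rnaseq results when subtyping, data/ otherwise."""
    base = Path(collapse_results_dir) if run_for_subtyping else Path(data_dir)
    return base / EXPRESSION_FILE


if __name__ == "__main__":
    # RUN_FOR_SUBTYPING is a hypothetical variable name used for illustration.
    subtyping = parse_flag(os.environ.get("RUN_FOR_SUBTYPING"))
    path = choose_expression_path(subtyping, "data", "analyses/collapse-rnaseq/results")
    # Logging the resolved path makes it obvious which file was actually read.
    print(f"run_for_subtyping={subtyping}; reading {path}")
```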
My understanding of #1389 is that we should close this PR, get in all the code changes we know are required, and then take another pass at rerunning it. So, I am going to close this PR.