AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

fix DNA-RNA mapping using parent_aliquot_id #1096

Closed komalsrathi closed 3 years ago

komalsrathi commented 3 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Multiple medulloblastoma (MB) subtypes are assigned to the DNA sample BS_5BMNK8SY.

What was your approach?

Use parent_aliquot_id to map DNA to RNA. This will map the DNA sample BS_5BMNK8SY to just one RNA sample, BS_QEYPFYHD. The other RNA sample, BS_Z7PKVY9J, will be mapped to NA.
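A minimal sketch of this kind of aliquot-based pairing, for illustration only. The biospecimen IDs, aliquot values, the `strategy` field, and the helper `map_rna_to_dna` are hypothetical and are not taken from the OpenPBTA data or code; only the `parent_aliquot_id` column name comes from this PR.

```python
# Hypothetical manifest rows; IDs and aliquot values are made up for illustration.
manifest = [
    {"biospecimen_id": "DNA_1", "strategy": "WGS", "parent_aliquot_id": "A1"},
    {"biospecimen_id": "RNA_1", "strategy": "RNA-Seq", "parent_aliquot_id": "A1"},
    {"biospecimen_id": "RNA_2", "strategy": "RNA-Seq", "parent_aliquot_id": "A2"},
]


def map_rna_to_dna(rows):
    """Pair each RNA sample with the DNA sample sharing its parent_aliquot_id.

    RNA samples without a matching DNA aliquot map to None (NA in the TSV).
    """
    # Index DNA samples by their parent aliquot.
    dna_by_aliquot = {
        row["parent_aliquot_id"]: row["biospecimen_id"]
        for row in rows
        if row["strategy"] != "RNA-Seq"
    }
    # Look up each RNA sample's aliquot in the DNA index.
    return {
        row["biospecimen_id"]: dna_by_aliquot.get(row["parent_aliquot_id"])
        for row in rows
        if row["strategy"] == "RNA-Seq"
    }


mapping = map_rna_to_dna(manifest)
```

In this toy input, the second RNA sample comes from a different aliquot, so it is left unmapped. That mirrors the behavior described above, where one RNA sample maps to the DNA sample and the other maps to NA.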

What GitHub issue does your pull request address?

https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1085

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Output tsv files

Is there anything that you want to discuss further?

N/A

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

komalsrathi commented 3 years ago

@kgaonkar6 can you check if this makes sense now? This should not affect any results, only the mapping of DNA samples to RNA samples in the output TSV file.

komalsrathi commented 3 years ago

To obtain the consensus subtype, we can apply the 3/4 cutoff to all duplicated samples (I'm not sure whether there are others), or do you want me to hardcode it for this sample only? Also, I think I will have to revert to what we had before, i.e., use only the KF Participant ID and sample_id to map DNA to RNA, because if I use parent_aliquot_id, the RNA sample BS_Z7PKVY9J won't get mapped to the DNA sample BS_5BMNK8SY, and then we cannot apply the 3/4 rule.

Is that all or am I missing anything?

jharenza commented 3 years ago

> For obtaining the consensus subtype, we can have the 3/4 cutoff for all duplicated samples (not sure if there are others too) or do you want me to hardcode for this sample only? Also, I think I will have to revert back to what we had before i.e. only use KF Participant ID and sample_id to map DNA and RNA because if I use the parental aliquot ID, BS_Z7PKVY9J (RNA) sample won't get mapped to BS_5BMNK8SY (DNA) and then we cannot apply the 3/4 rule.
>
> Is that all or am I missing anything?

That's right. The 3/4 rule would apply only where samples from the same patient have the same sample_id and thus the same tumor_descriptor. In addition, it sounds like parent_aliquot_id will be changing in future releases, so relying on it would be unstable.
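A rough sketch of how such a cutoff could be applied. It assumes "3/4" means that at least three of the four subtype calls in a group agree. The thread does not spell that out, so this is an interpretation. Calls are grouped by participant ID and sample_id, as discussed above. The records, the field names other than `sample_id`, and the helper `consensus_subtypes` are hypothetical and do not come from the repository.

```python
from collections import Counter, defaultdict

# Hypothetical subtype calls; participants, samples, and calls are made up.
calls = [
    {"participant_id": "PT_1", "sample_id": "S1", "subtype": "Group4"},
    {"participant_id": "PT_1", "sample_id": "S1", "subtype": "Group4"},
    {"participant_id": "PT_1", "sample_id": "S1", "subtype": "Group4"},
    {"participant_id": "PT_1", "sample_id": "S1", "subtype": "Group3"},
    {"participant_id": "PT_2", "sample_id": "S2", "subtype": "SHH"},
    {"participant_id": "PT_2", "sample_id": "S2", "subtype": "SHH"},
    {"participant_id": "PT_2", "sample_id": "S2", "subtype": "WNT"},
    {"participant_id": "PT_2", "sample_id": "S2", "subtype": "Group3"},
]


def consensus_subtypes(rows, cutoff=0.75):
    """Return {(participant_id, sample_id): consensus subtype or None}.

    A subtype is the consensus only if its share of the calls in the group
    meets the cutoff (assumed here to be 3 of 4, i.e. 0.75).
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["participant_id"], row["sample_id"])].append(row["subtype"])

    result = {}
    for key, subtypes in grouped.items():
        top, count = Counter(subtypes).most_common(1)[0]
        result[key] = top if count / len(subtypes) >= cutoff else None
    return result


consensus = consensus_subtypes(calls)
```

Grouping on participant ID and sample_id, rather than on the aliquot, keeps both RNA samples in the same group as the DNA sample. The thread identifies this as the precondition for applying the rule at all.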

komalsrathi commented 3 years ago

@kgaonkar6 okay, I think this should fix the issue. There was a small bug that was overriding the propagation of subtypes which has been discussed before.

kgaonkar6 commented 3 years ago

Perfect! This is ready to merge

jaclyn-taroni commented 3 years ago

We need a re-review from @jharenza because there are changes requested (or for me to dismiss the previous review) before this can be merged.