d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Master ticket for PedCBio v10 load #286

Closed runjin326 closed 2 years ago

runjin326 commented 2 years ago

What data file(s) does this issue pertain to?

PedCBio v10 dataload

What release are you using?

v10

Put your question or report your issue here.

v10 load related issues:

OpenPedCan update after v10 is loaded to pedcbio:

Additional issues:

Note that the other tickets that show up when searching for PedCBio are not directly related to the pedcbio load; they are either documentation updates or module enhancements.

jharenza commented 2 years ago

@migbro let's also update the description as below:

Current:

[Pediatric Open Targets (V10)](https://kf-strides-cbioportal-qa.kidsfirstdrc.org/study?id=ped_opentargets_2021)
Pediatric Open Targets is a collaborative project between the National Cancer Institute and the Children's Hospital of Philadelphia. Through this project and as part of the NCI's Childhood Cancer Data Initiative, we are utilizing the harmonization work of the KidsFirst Data Resource Center and analytics work of OpenPBTA to build a pediatric preclinical pediatric platform to assist in development and query of the FDA's Relevant Molecular Targets List to identify new therapeutics for children with cancer. For updates, please see here: [Release Notes](https://tinyurl.com/55cxz9am)

New:

[Open Pediatric Cancer (OpenPedCan) Project](https://kf-strides-cbioportal-qa.kidsfirstdrc.org/study?id=ped_opentargets_2021)
[OpenPedCan](https://github.com/PediatricOpenTargets/OpenPedCan-analysis) is a collaborative project between the National Cancer Institute and the Children's Hospital of Philadelphia as part of the NCI's Childhood Cancer Data Initiative. Here, we harmonize pan-cancer data using [KidsFirst Data Resource Center](https://kidsfirstdrc.org/) workflows and harness [OpenPBTA](https://github.com/AlexsLemonade/OpenPBTA-analysis) analytics workflows to scale and add modules across pediatric cancer datasets. This data has been integrated into the pediatric open targets platform to assist in development and query of the FDA's Relevant Pediatric Molecular Targets List (PMTL) to identify new therapeutics for children with cancer. For study release details, please see [Release Notes](https://tinyurl.com/55cxz9am).

Let's also change the study name from ped_opentargets_2021 to open_ped_can

Within your notion page, we should capture all details of the repo release, data release, etc. Just another FYI as well - @taylordm said we can only add the portal link once it is live, so we will have to do another update later.

jharenza commented 2 years ago

Spotted by @adamcresnick - the two genes of a fusion are not both displaying at the patient level, and as a result, the oncoKB designation is missing.

OpenPedCan: https://kf-strides-cbioportal-qa.kidsfirstdrc.org/patient?studyId=ped_opentargets_2021&caseId=PT_GQZ84ACS

This is also happening across the board with our studies. OpenPBTA: https://pedcbioportal.kidsfirstdrc.org/patient?studyId=openpbta&caseId=PT_1J2DT6MM

This is not happening in cbio: https://www.cbioportal.org/patient?sampleId=P-0001453-T01-IM3&studyId=blca_nmibc_2017

or with PPTC: https://pedcbioportal.kidsfirstdrc.org/patient?studyId=pptc&caseId=P0163

Seems like the `--` separator is an issue if pedcbio is parsing the fusion name using the hyphen...?
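To illustrate the suspicion above, here is a minimal Python sketch of how a parser that splits fusion names on a single hyphen would mishandle a name joined with `--`. The function `split_fusion` and the name `GENE1--GENE2` are hypothetical; this is not pedcbio's actual parsing code, which was not inspected for this example.

```python
def split_fusion(name: str, sep: str = "-") -> list[str]:
    """Naively split a fusion name into partner genes on `sep` (hypothetical parser)."""
    return name.split(sep)


# With a double hyphen, splitting on "-" leaves an empty string between the genes.
print(split_fusion("GENE1--GENE2"))  # ['GENE1', '', 'GENE2']

# With a single hyphen, the two partner genes come out cleanly.
print(split_fusion("GENE1-GENE2"))   # ['GENE1', 'GENE2']
```

If the portal does something along these lines, the empty middle element could explain why both genes fail to show up at the patient level.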

migbro commented 2 years ago

Ok, so I tested out using only a single hyphen on a smaller KF project.

Before (using prod): https://pedcbioportal.kidsfirstdrc.org/patient?sampleId=PAUMTZ-09A-01&studyId=aml_sd_pet7q6f2_2018

After (on QA): https://kf-strides-cbioportal-qa.kidsfirstdrc.org/patient?sampleId=PAUMTZ-09A-01&studyId=aml_sd_pet7q6f2_2018

Seems to be an improvement, but I wonder if the repeat entries hack to get both genes to be searchable has made things weird. I'll try removing that...

migbro commented 2 years ago

Actually, another issue is that if a gene symbol is not in their database, there is a chance the fusion might be skipped, as with AC022145.2-MLLT10. In the file it is:

AC022145.2      Fred Hutchinson Cancer Research Center  PAUMTZ-09A-01   AC022145.2-MLLT10   no  yes ARRIBA  other
MLLT10      Fred Hutchinson Cancer Research Center  PAUMTZ-09A-01   AC022145.2-MLLT10   no  yes ARRIBA  other

but when you look at how it loaded on QA, only the MLLT10 line was used, and the gene name AC022145.2 was ignored. So, perhaps the repeat lines are ok, but depending on how much time I have, this might be the best I can do.
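As a rough sketch of the behavior described above, the Python below writes one row per partner gene (the "repeat entries" hack) and then simulates a loader that drops rows whose gene symbol it does not recognize. The dictionary keys, the `KNOWN_GENES` set, and both function names are assumptions made for illustration; the real loader logic was not checked.

```python
# Hypothetical stand-in for the gene symbols the portal database recognizes.
KNOWN_GENES = {"MLLT10"}


def repeat_rows(fusion: str, sample: str, center: str, sep: str = "-") -> list[dict]:
    """Emit one row per partner gene so that each gene is searchable."""
    return [
        {"gene": gene, "center": center, "sample": sample, "fusion": fusion}
        for gene in fusion.split(sep)
    ]


def simulate_load(rows: list[dict], known: set[str]) -> list[dict]:
    """Keep only rows whose gene symbol is recognized (assumed loader behavior)."""
    return [row for row in rows if row["gene"] in known]


rows = repeat_rows(
    "AC022145.2-MLLT10",
    sample="PAUMTZ-09A-01",
    center="Fred Hutchinson Cancer Research Center",
)
loaded = simulate_load(rows, KNOWN_GENES)
print([row["gene"] for row in loaded])  # ['MLLT10']
```

Under these assumptions the fusion still loads through its MLLT10 row even though the AC022145.2 row is dropped, which matches what was seen on QA.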

migbro commented 2 years ago

Just a comment as an update. We will adopt the single `-` separator to fix the hyphen display issue. Aside from that, the hack mentioned above seems useful for keeping fusions that involve a gene and an intergenic region. It's an "ancient" problem that we can try to tackle better in the near future, but this is at least an improvement.
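For reference, switching to a single-hyphen separator could be sketched in Python as below. The helper `normalize_fusion_name` and the name `GENE1--GENE2` are hypothetical and not taken from the actual load scripts.

```python
import re


def normalize_fusion_name(name: str) -> str:
    """Collapse any run of two or more hyphens into a single hyphen."""
    return re.sub(r"-{2,}", "-", name)


print(normalize_fusion_name("GENE1--GENE2"))       # GENE1-GENE2
print(normalize_fusion_name("AC022145.2-MLLT10"))  # AC022145.2-MLLT10
```

Names that already use a single hyphen pass through unchanged. Note that a single hyphen cannot, by itself, be told apart from a hyphen inside a gene symbol; that case is outside this sketch.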

jharenza commented 2 years ago

https://kf-strides-cbioportal-qa.kidsfirstdrc.org/study/summary?id=ped_opentargets_2021 - looks good, sending to prod!

jharenza commented 2 years ago

closing since these specific tasks have been completed