d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Merge kfnbl+pbta files #24

Closed: jharenza closed this 3 years ago

jharenza commented 3 years ago

What data file(s) does this issue pertain to?

all files

What release are you using?

v3

Put your question or report your issue here.

The consensus will be the d3b 2/4 + hotspots method, and I think it still needs to be run on KF NBL.
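The sketch below is a simplified illustration of a "2/4 + hotspots" rule. It assumes the rule keeps a variant when at least 2 of the 4 callers report it, or when the variant is in a hotspot set and at least one caller reports it. The real d3b consensus workflow does more than this (for example, merging MAF fields across callers). `consensus_filter`, `variant_id`, and `caller` are hypothetical names used only for illustration.

```r
# Hypothetical helper: simplified "2 of 4 callers + hotspots" filter.
# calls: data.frame with one row per (variant_id, caller) call.
# hotspots: character vector of variant_ids to keep if any caller reports them.
consensus_filter <- function(calls, hotspots = character(0), min_callers = 2) {
  # number of distinct callers that reported each variant
  counts <- tapply(calls$caller, calls$variant_id,
                   function(x) length(unique(x)))
  # keep variants with enough caller support, or any called hotspot variant
  keep <- names(counts)[counts >= min_callers | names(counts) %in% hotspots]
  sort(keep)
}
```

Counting distinct callers, not rows, means a caller that emits the same variant twice is not counted as two votes.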

Merge and add to v4 bucket: s3://kf-openaccess-us-east-1-prd-pbta/open-targets/v4/

| Merge these files | New filename |
| --- | --- |
| kfnbl-gene-counts-rsem-expected_count.stranded.rds, pbta-gene-counts-rsem-expected_count.polya.rds, pbta-gene-counts-rsem-expected_count.stranded.rds, target-counts-rsem-expected_count.rds, tcga-counts-rsem-expected_count.rds, gtex-counts-rsem-expected_count.rds | gene-counts-rsem-expected_count.rds |
| kfnbl-gene-expression-rsem-tpm.stranded.rds, pbta-gene-expression-rsem-tpm.polya.rds, pbta-gene-expression-rsem-tpm.stranded.rds | gene-expression-rsem-tpm.rds |
| kfnbl-fusion-arriba.tsv.gz, pbta-fusion-arriba.tsv.gz | fusion-arriba.tsv.gz |
| kfnbl-fusion-starfusion.tsv.gz, pbta-fusion-starfusion.tsv.gz | fusion-starfusion.tsv.gz |
| kfnbl-snv-consensus-mutation.maf.tsv.gz, pbta-snv-consensus-mutation.maf.tsv.gz | snv-consensus-plus-hotspots.maf.tsv.gz |
| pbta-histologies.tsv, target-histologies.tsv, tcga-histologies.tsv, kfnbl-histologies.tsv, gtex-histologies.tsv | histologies.tsv |
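The histologies row of the table amounts to stacking five TSVs into one. Below is a minimal base-R sketch. It assumes every file is tab-delimited with a header row, and that a column missing from one cohort's file should be filled with NA. `merge_histologies` is a hypothetical helper, not the procedure @ewafula actually used.

```r
# Hypothetical helper: stack several histologies TSVs into one data frame.
# Assumption: columns absent from a given file are filled with NA.
merge_histologies <- function(paths) {
  # read every file without converting strings to factors
  tables <- lapply(paths, function(p) {
    read.delim(p, stringsAsFactors = FALSE, check.names = FALSE,
               na.strings = c("", "NA"))
  })
  # union of all column names, in first-seen order
  all_cols <- unique(unlist(lapply(tables, names)))
  # add missing columns as NA so every table has the same layout
  aligned <- lapply(tables, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, all_cols, drop = FALSE]
  })
  # stack the aligned tables row-wise
  do.call(rbind, aligned)
}
```

Usage: pass the five filenames from the table to `merge_histologies()`, then write the result with `write.table(hist, "histologies.tsv", sep = "\t", quote = FALSE, row.names = FALSE)`.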

Who will complete this?

@zhangb1 for the data files; @jharenza will upload the histologies files from @ewafula.

jharenza commented 3 years ago

cc @yuankunzhu and @migbro

jharenza commented 3 years ago

For the v19 pbta-histologies.tsv file, I made a new column called cancer_group using the following code:

library(dplyr)
library(stringr)

# v19: the v19 pbta-histologies.tsv, already read into a data frame
v19_2 <- v19 %>%
  # keep the text before the first comma, then the text before the first "("
  dplyr::mutate(cancer_group = str_extract(harmonized_diagnosis, "[^,]*")) %>%
  dplyr::mutate(cancer_group = str_extract(cancer_group, "[^(]*")) %>%
  # strip the trailing space left before "(" and shorten the DIPG label
  dplyr::mutate(cancer_group = case_when(
    cancer_group == "High-grade glioma/astrocytoma " ~ "High-grade glioma/astrocytoma",
    cancer_group == "Low-grade glioma/astrocytoma " ~ "Low-grade glioma/astrocytoma",
    cancer_group == "Dysembryoplastic neuroepithelial tumor " ~ "Dysembryoplastic neuroepithelial tumor",
    cancer_group == "Brainstem glioma- Diffuse intrinsic pontine glioma" ~ "Diffuse intrinsic pontine glioma",
    cancer_group == "Atypical Teratoid Rhabdoid Tumor " ~ "Atypical Teratoid Rhabdoid Tumor",
    TRUE ~ as.character(cancer_group))) %>%
  # keep RNA-Seq samples only (dplyr::filter, not stats::filter)
  dplyr::filter(experimental_strategy == "RNA-Seq")

This file was given to @ewafula for merging with the others.


migbro commented 3 years ago

gene-counts-rsem-expected_count.rds and gene-expression-rsem-tpm.rds are up. The GTEx and TCGA expected-count files were not there, but the TPM files were. However, since the table entry for TPM does not list TCGA and GTEx yet, I have left them out.

migbro commented 3 years ago

fusion-arriba.tsv.gz and fusion-starfusion.tsv.gz are also now up, with an updated md5sum file. The consensus MAF is on hold until consensus is run for the KF NBL set.

jharenza commented 3 years ago

thanks @migbro !

jharenza commented 3 years ago

@migbro @zhangb1 for the remaining MAF files above: have we run KF NBL through consensus yet? If not, what is the timeline? For PBTA (CBTN+PNOC), do we have a consensus MAF to which we can append the KF NBL consensus for release? You can use the v4 histologies file for the BS IDs to include: select cohort == CBTN | PNOC | GMKF.

zhangb1 commented 3 years ago

I can manage that, since we already have the results from all 4 callers. I can use the latest consensus app to do it; it should be done by today or tomorrow.

jharenza commented 3 years ago

Thank you @zhangb1 !

zhangb1 commented 3 years ago

@jharenza I finished the KF NBL consensus MAFs and was able to merge them into kfnbl-snv-consensus-mutation.maf.tsv.gz.

Do you want to merge it into the merged PBTA consensus file? Both use the CHOP method, right? Just to confirm.

jharenza commented 3 years ago

Yes, please. For PBTA, to get the BS IDs, you can use the v4 histologies.tsv and select samples with experimental_strategy WGS, WXS, or Panel and cohort == CBTN or PNOC.
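A minimal base-R sketch of that selection, assuming the histologies table is already loaded as a data frame. The `cohort` and `experimental_strategy` columns come from the thread; the ID column name `Kids_First_Biospecimen_ID` and the exact string used for panel samples (written as "Panel" here) are assumptions to check against the v4 file. `select_bs_ids` is a hypothetical helper.

```r
# Hypothetical helper: BS IDs for the given cohorts and DNA strategies.
# Assumptions: ID column name and the "Panel" label must be checked
# against the actual v4 histologies.tsv.
select_bs_ids <- function(histologies,
                          cohorts = c("CBTN", "PNOC"),
                          strategies = c("WGS", "WXS", "Panel"),
                          id_col = "Kids_First_Biospecimen_ID") {
  # %in% returns FALSE for NA values, so rows with missing fields are dropped
  keep <- histologies$cohort %in% cohorts &
    histologies$experimental_strategy %in% strategies
  unique(histologies[[id_col]][keep])
}
```

For the combined PBTA + KF NBL selection mentioned earlier in the thread, pass `cohorts = c("CBTN", "PNOC", "GMKF")`.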

kgaonkar6 commented 3 years ago

@zhangb1 just wanted to circle back about the consensus file snv-consensus-plus-hotspots.maf.tsv.gz: is it on S3 already?

zhangb1 commented 3 years ago

Not yet; I switched to the Open Targets work, but I can manage that and will upload it in 2 days. Sorry about that.

kgaonkar6 commented 3 years ago

Thanks for the update 👍

zhangb1 commented 3 years ago

@kgaonkar6 @jharenza

snv-consensus-plus-hotspots.maf.tsv.gz
kfnbl-snv-consensus-mutation.maf.tsv.gz

are updated in the s3://kf-openaccess-us-east-1-prd-pbta/open-targets/v5/ folder

jharenza commented 3 years ago

Done with #35; will open a new issue for v6.