Closed jharenza closed 2 years ago
@migbro will work on this once the data are in place
Also note: please use the same columns (except remove cohort column) and nomenclature as with the OpenTargets project, having been derived from the histologies file v21, not the data service or D3b WH, so we are synced with our GitHub release. Thank you!
@migbro, what will your timeline for this be? thank you!
I'll start work on this this week
Hmmm, working on loading this project. I am trying to use the same sample ids I used for the pbta_all load for this one. For sample 7316-466 there were two RNA samples. For pbta_all I loaded BS_SHZZ99DT, which is totalRNAseq, ribo-depleted, but openPBTA used the polyA one. @jharenza I am guessing you will stick with that?
Also, the merged maf file is missing the typical #version 2.4 header line. I can easily work around that with a simple flag, but not sure if that is intentional.
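A minimal sketch of restoring the header after a merge, assuming the merged MAF is held as text and that the only missing piece is the `#version 2.4` first line. The function name `ensure_maf_version_header` is hypothetical and not part of any loader:

```python
MAF_VERSION_LINE = "#version 2.4"


def ensure_maf_version_header(maf_text: str) -> str:
    """Return maf_text with a '#version' first line, adding '#version 2.4' if absent."""
    first_line = maf_text.split("\n", 1)[0].strip()
    if first_line.startswith("#version"):
        # Header already present, leave the file untouched.
        return maf_text
    return MAF_VERSION_LINE + "\n" + maf_text


# Toy merged MAF with only a column header and one row (illustrative data).
merged = "Hugo_Symbol\tChromosome\nTP53\tchr17\n"
fixed = ensure_maf_version_header(merged)
print(fixed.splitlines()[0])
```

Running it a second time on `fixed` leaves the text unchanged, so it is safe to apply to files that already carry the header.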
@migbro, oh yes, I forgot to add the header line after merging the file - I can fix that.
For the RNA sample that you mentioned, this is weird, since when I use the histology file it looks like this sample only has one RNA-Seq, which is poly-A and BS_0VXZCRJS. And BS_SHZZ99DT is not in the histology file or the gene expression RDS. Any idea how you got the sample loaded?
So, it's not that I had loaded the RDS. What I had done was use the pbta_all data_sample_sheet to harmonize sample naming (it's a long story) in cBio so that DNA and RNA data can be tied together 1-to-1. That polyA version was not in the pbta_all cBio study, just the stranded ribo-depleted run of that same sample, as currently we are not loading technical replicates. So for now, I will just ignore that and load what you are actually using when there is a conflict.
@migbro, gotcha! Thanks! I have now uploaded the fixed SNV file to the box.
Ok, loading on to QA now 🎉, will update when completed. Also:
Needed to get rid of \ in pathology free text for BS_8ZS9F31R
This issue still remains! Luckily I have a note about that from when I did the openTARGET load, so I was able to avoid this.
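A small sketch of that cleanup, assuming the clinical data is a tab-separated file and that the free-text column is called `pathology_free_text_diagnosis` (an assumption, as are the sample ID and text in the toy row below):

```python
def strip_backslashes(tsv_text: str, column: str) -> str:
    """Remove every backslash from one column of a tab-separated table."""
    lines = tsv_text.rstrip("\n").split("\n")
    header = lines[0].split("\t")
    idx = header.index(column)  # raises ValueError if the column is missing
    cleaned = [lines[0]]
    for line in lines[1:]:
        fields = line.split("\t")
        fields[idx] = fields[idx].replace("\\", "")
        cleaned.append("\t".join(fields))
    return "\n".join(cleaned) + "\n"


# Hypothetical row: neither the ID nor the text comes from the real file.
toy = "sample_id\tpathology_free_text_diagnosis\nBS_EXAMPLE1\tlow grade\\ glioma\n"
print(strip_backslashes(toy, "pathology_free_text_diagnosis"))
```

Only the named column is touched, so backslashes that are legitimate elsewhere in the file are left alone.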
@runjin326 it's up! https://kf-strides-cbioportal-qa.kidsfirstdrc.org/study/summary?id=openpbta Please take a look and let me know if you see anything odd
@migbro, thanks so much! Looks cool to me but @jharenza can better assess if there is anything odd ;)
I do notice fewer variant calls...that's because of the consensus method of using 3/3 + hotspots, right?
I would assume so - since I did confirm the number of rows being correct after merging the two MAF files.
Thanks! I'll review tonight 😀
Although, I forgot the seg file load I just realized. I can do that tomorrow.
hey @migbro - I had to do some digging, but this sample is actually one of those "polyA + stranded" libraries that BGI sequenced in error, as discussed in this ticket: https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/373. They were added in release v12 but subsequently taken out in v13. So, 1) if they are still visible in the DS (should be ok), we definitely need to annotate them as poly A prep AND stranded prep, and 2) you probably should just use the polyA bs_id for the pbta_all study.
Hi @migbro - I reviewed this. Can you add the column experimental_strategy to the front page, which would denote whether the DNA is WGS or WXS? Our oncoprints mostly match for:
but pretty off for diffuse astrocytic tumors - I think this is because I cannot select WGS only samples, which I need to do here to mimic OpenPBTA (we'll see).
Minor edit for the description: add a space between Pacific Pediatric Neuro-oncology Consortium and through.
Thank you!
So what you are saying is that the one you currently have in the histologies file (BS_0VXZCRJS) is, you think, the best representative, and the one I have in pbta_all ought to be replaced with that one, instead of the one I used, BS_SHZZ99DT? There were actually 39 that I saw in openPBTA, but not pbta_all, 14 of which were because the library was listed in data service as polyA and we preferred ribo-depleted, stranded. Attached I have that information:
in_openpbta_not_pbta_all.csv
pbta_all_used.csv
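For anyone wanting to reproduce this kind of comparison, here is a rough sketch using a set difference. It assumes each sheet is a CSV with an ID column named `sample_id`; that column name and the toy IDs are hypothetical and not taken from the attached files:

```python
import csv
import io


def read_ids(csv_text: str, id_column: str = "sample_id") -> set:
    """Collect the values of one ID column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader}


def only_in_first(first_csv: str, second_csv: str, id_column: str = "sample_id") -> list:
    """Return the sorted IDs present in the first sheet but absent from the second."""
    return sorted(read_ids(first_csv, id_column) - read_ids(second_csv, id_column))


# Toy sheets with made-up IDs.
openpbta_csv = "sample_id\nSAMPLE_A\nSAMPLE_B\nSAMPLE_C\n"
pbta_all_csv = "sample_id\nSAMPLE_B\n"
print(only_in_first(openpbta_csv, pbta_all_csv))  # ['SAMPLE_A', 'SAMPLE_C']
```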
@runjin326 when you have a chance, can you give me a brief description of how the consensus maf is made? For instance, for D3b we have:
"Consensus calls from strelka2, mutect2, lancet, and VarDict Java. Two or more callers required to pass, < 0.001 frequency in gnomAD, and min read depth 8 in normal sample"
Although, I realize I forgot to mention hotspots in my description
Nm, I found this: https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/snv-callers and remembered all that hot spot work, so I came up with:
Consensus calls from strelka2, mutect2, lancet. All three callers must agree unless the variant falls in a TERT promoter or hotspot region (see https://www.cancerhotspots.org)
If that's not correct, just let me know!
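To make the rule in that description concrete, here is a simplified sketch. It assumes that a variant in a hotspot or TERT promoter region is kept when at least one of the three callers reports it, which the description above does not actually specify. The function name and arguments are hypothetical, and the real OpenPBTA modules linked above have more nuance:

```python
CALLERS = frozenset({"strelka2", "mutect2", "lancet"})


def keep_variant(called_by, in_hotspot=False, in_tert_promoter=False):
    """Apply the 3/3-callers rule, relaxed for hotspot and TERT promoter sites."""
    supporting = CALLERS & set(called_by)
    if in_hotspot or in_tert_promoter:
        # Assumption: a single supporting caller is enough in these regions.
        return len(supporting) >= 1
    # Everywhere else, all three callers must agree.
    return supporting == CALLERS


print(keep_variant({"strelka2", "mutect2", "lancet"}))    # True
print(keep_variant({"strelka2", "mutect2"}))              # False
print(keep_variant({"lancet"}, in_hotspot=True))          # True
```

A stricter 3/3 requirement like this would also be consistent with seeing fewer variant calls than a "two or more callers" consensus.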
Hi @migbro - is this for release notes? Can you just link to the consensus and hotspot modules within OpenPBTA? You will also have to do this with CNVs for consensus and Fusions for putative oncogenic. There are a few nuances and the very brief description may not suffice.
It's more for the study meta files. For instance:
cancer_study_identifier: openpbta
stable_id: mutations
profile_name: Mutations
profile_description: Consensus calls from strelka2, mutect2, lancet. All three callers must agree unless the variant falls in a TERT promoter or hotspot region (see https://www.cancerhotspots.org)
genetic_alteration_type: MUTATION_EXTENDED
datatype: MAF
show_profile_in_analysis_tab: true
data_filename: data_mutations_extended.txt
Each data type has this corresponding file. I can't remember where one sees these descriptions...
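As an illustration of the `key: value` layout shown above, a meta file can be written and read back with a few lines of Python. This is only a sketch: `format_meta` and `parse_meta` are hypothetical helpers, not part of the loader, and the description is shortened here:

```python
def format_meta(fields: dict) -> str:
    """Serialize fields as one 'key: value' line each, in insertion order."""
    return "".join(f"{key}: {value}\n" for key, value in fields.items())


def parse_meta(text: str) -> dict:
    """Parse 'key: value' lines, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


mutations_meta = {
    "cancer_study_identifier": "openpbta",
    "stable_id": "mutations",
    "profile_name": "Mutations",
    "profile_description": "Consensus calls (see https://www.cancerhotspots.org)",
    "genetic_alteration_type": "MUTATION_EXTENDED",
    "datatype": "MAF",
    "show_profile_in_analysis_tab": "true",
    "data_filename": "data_mutations_extended.txt",
}
text = format_meta(mutations_meta)
print(text)
```

Splitting on the first colon only matters because descriptions often contain URLs, which have colons of their own.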
Ok, I have updated the study with experimental strategy and fixed the description typos.
@jharenza ok, I have updated the study. If there is nothing else, I shall push it to prod
@migbro, I am in the process of checking the CNV file loaded and there might be an update on the file being used. I will ping you once I complete that. Maybe we can wait to push to prod? Thanks!
Description
This template is used to start a request to load or update a study onto the Kids First PedcBioPortal
Common - any new study REQUIRED
If this is the first time being loaded, please fill out the cbio meta_study info, for instance:
type_of_cancer can be found here: https://github.com/kids-first/kf-cbioportal-etl/blob/master/REFS/type_of_cancer and are based on OncoTree, or add a new one with this mechanism
cancer_study_identifier should be in the form of <type_of_cancer>_<study_id>_year
name should be like a study title with a couple key words
description should be like a short abstract with links, but mind the 1024 character limit!
Provide an example of a sample ID that can be used to tie together DNA and RNA (if applicable), aka a "somatic event ID":
Load/access control:
Kids First/PBTA
BS_ID
:Publication/Collaboration
Publication is obvious; a Collaboration study would be something like OpenPBTA, OpenTargets, or other custom request
Please provide the following:
Data will be added here: s3://kf-openaccess-us-east-1-prd-pbta/data/pedcbio/ by @runjin326
Patient metadata that is available: s3://kf-openaccess-us-east-1-prd-pbta/data/pedcbio/pbta-histologies.tsv (v21)
Sample metadata that is available
QA Review
Revisit this section once the project is loaded onto QA as a minimum push-to-prod and/or close-ticket checklist
GROUP
where allowed. Places to look are typically patient ID columns, sample ID columns, generally clinical data sections