AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Add v21 combined data matrices to s3 for pedcbio load #1186

Closed jharenza closed 3 years ago

jharenza commented 3 years ago

Merge the following files from V21 and add them to s3://kf-openaccess-us-east-1-prd-pbta/data/pedcbio/ for @migbro to load into pedcbio:

Somatic variant calls: Merge pbta-snv-consensus-mutation.maf.tsv.gz and pbta-snv-scavenged-hotspots.maf.tsv.gz, then deduplicate (some records in pbta-snv-scavenged-hotspots.maf.tsv.gz are already contained in pbta-snv-consensus-mutation.maf.tsv.gz).
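For reference, the merge-and-deduplicate step could be sketched roughly as below. This is only an illustration using the Python standard library, not the script that was actually run. The choice of key columns is an assumption (they are standard MAF column names, but the issue does not say which columns define a duplicate), and `read_maf` and `merge_dedup` are hypothetical helper names.

```python
import csv
import gzip

# Assumption: these standard MAF columns together identify one variant call
# in one sample. The issue does not specify the deduplication key.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode", "Chromosome", "Start_Position",
    "End_Position", "Reference_Allele", "Tumor_Seq_Allele2",
)


def read_maf(path):
    """Yield rows of a gzipped, tab-separated MAF as dicts, skipping '#' comment lines."""
    with gzip.open(path, "rt", newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        yield from csv.DictReader(lines, delimiter="\t")


def merge_dedup(*row_sources, key_columns=KEY_COLUMNS):
    """Concatenate rows from all sources, keeping the first row seen for each key."""
    seen = set()
    merged = []
    for rows in row_sources:
        for row in rows:
            key = tuple(row[col] for col in key_columns)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

Passing the consensus file first, e.g. `merge_dedup(read_maf("pbta-snv-consensus-mutation.maf.tsv.gz"), read_maf("pbta-snv-scavenged-hotspots.maf.tsv.gz"))`, would keep the consensus record whenever a hotspot record has the same key.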

Copy number calls: Merge consensus_seg_annotated_cn_autosomes.tsv.gz and consensus_seg_annotated_cn_x_and_y.tsv.gz. pbta-cnv-consensus.seg.gz is already in the bucket.
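A minimal sketch of the copy number merge, assuming both files are gzipped TSVs with identical column headers (not confirmed in this issue), so that merging amounts to stacking rows. The helper names are hypothetical, and the code raises an error instead of guessing if the headers differ.

```python
import csv
import gzip


def read_tsv_gz(path):
    """Return (header, rows) for a gzipped, tab-separated file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return reader.fieldnames, list(reader)


def concat_tables(first, second):
    """Stack two (header, rows) tables; assumes both share the same columns."""
    header_a, rows_a = first
    header_b, rows_b = second
    if list(header_a) != list(header_b):
        raise ValueError("column headers differ; cannot stack rows directly")
    return list(header_a), rows_a + rows_b


def write_tsv_gz(path, header, rows):
    """Write rows (dicts) to a gzipped, tab-separated file with a header line."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=header, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

The autosome and X/Y tables would each be loaded with `read_tsv_gz`, combined with `concat_tables`, and written back out with `write_tsv_gz` before upload.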

RNAseq expression: Merge and collapse:

pbta-gene-expression-rsem-tpm.polya.rds
pbta-gene-expression-rsem-tpm.stranded.rds
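
One possible reading of "merge and collapse" is sketched below, under stated assumptions: duplicate gene symbols are collapsed by keeping the row with the highest mean TPM, and the two matrices are then joined on the gene symbols they share. The issue does not define either rule, so both are assumptions, as are the function names. Loading the .rds files is not shown; the sketch starts from rows already in memory.

```python
from statistics import mean


def collapse_by_symbol(rows):
    """Collapse duplicate gene symbols.

    rows: iterable of (gene_symbol, {sample_id: tpm}).
    Assumption: for each symbol, keep the row with the highest mean TPM.
    """
    best = {}
    for symbol, values in rows:
        if symbol not in best or mean(values.values()) > mean(best[symbol].values()):
            best[symbol] = values
    return best


def merge_matrices(left, right):
    """Join two {symbol: {sample_id: tpm}} matrices on shared gene symbols.

    Assumption: genes missing from either matrix are dropped (inner join).
    """
    shared = left.keys() & right.keys()
    return {symbol: {**left[symbol], **right[symbol]} for symbol in sorted(shared)}
```

With the poly-A and stranded matrices each collapsed first, `merge_matrices(collapse_by_symbol(polya_rows), collapse_by_symbol(stranded_rows))` would give one row per gene symbol with samples from both libraries as columns.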

RNAseq fusion: pbta-fusion-putative-oncogenic.tsv is already in the bucket.

@runjin326, will you work on this please?

runjin326 commented 3 years ago

@jharenza, @migbro, the 3 combined files are now uploaded to the bucket. I also wrote down the steps in this script for reference.