AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

v23 run generate analysis files (1/n) #1631

Closed jharenza closed 1 year ago

jharenza commented 1 year ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Prep for v23 release

What was your approach?

Run generate-analysis-files.sh for V23

What GitHub issue does your pull request address?

NA

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

I will look into the changed files to see whether anything is strikingly wrong or whether a change was missed previously. Update: these look OK; the changes are mainly due to updates in the independent specimens files and the notebook HTML files. The cnv_consensus.tsv file has changed, but the final CNV files used in the release have not:

fa0adcd26f408d840f25339d2a1ea09f  consensus_seg_annotated_cn_autosomes.tsv.gz
9589cd18d0e6c1f7c2d939126c63ec6c  consensus_seg_annotated_cn_x_and_y.tsv.gz
b8b97483b4d65e65c8ae34ff89b3ef94  consensus_seg_with_status.tsv
b9284650be04df3538e6c6dba29b8eb0  pbta-cnv-consensus.seg.gz

I am going to investigate the SNV consensus MAF changes; my hunch is that row ordering is the culprit, because I would not expect those files to change either.
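One way to test the row-ordering hunch is to hash the rows after sorting them, so that two files with the same rows in a different order produce the same checksum. The sketch below is only an illustration: `sorted_md5` is a hypothetical helper, not part of the repository, and it assumes GNU `md5sum` and `gzip` are available. The header row is sorted along with the data rows, which is harmless for an equality check.

```shell
# Hypothetical helper (not part of OpenPBTA-analysis): print the md5 of a
# gzipped file's rows, ignoring the order in which the rows appear.
sorted_md5() {
  # Decompress to stdout, sort rows bytewise for a stable order, hash the
  # result, and keep only the checksum field.
  gzip -dc "$1" | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Example usage (placeholder paths):
#   sorted_md5 old-release/pbta-snv-consensus-mutation.maf.tsv.gz
#   sorted_md5 new-release/pbta-snv-consensus-mutation.maf.tsv.gz
```

If the two sorted checksums match while the plain checksums of the decompressed files differ, row order alone explains the change.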

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Files being updated in release:

fusion_summary_ependymoma_foi.tsv (expected with #1619)
ae559544f5e8baf0f0f21ab7c00f0041  independent-specimens.rnaseq.primary-plus-stranded.tsv
44b1e1d483465798221799cf63b93fb8  independent-specimens.wgs.primary-plus.tsv
43bf4bd1f0d073607276ce6eef989951  independent-specimens.wgs.primary.tsv
372db726c453efded2340da8ad536e81  independent-specimens.wgswxs.primary-plus.tsv
94283581188cc87427b3b58b1fe75860  independent-specimens.wgswxs.primary.tsv
d4251fcd7f7bea0f64a9a247a40a21e0  pbta-cnv-consensus-gistic.zip
757159a9d864d78ef65c8b68453e2f86  pbta-fusion-recurrently-fused-genes-byhistology.tsv
95d6b0c3401f8c6c10c4c013cb78e275  pbta-fusion-recurrently-fused-genes-bysample.tsv
2be7929f8fc130fc2048cf8f8d0b1c55  pbta-snv-consensus-mutation.maf.tsv.gz
21126513a05c43427af774884aaeeb46  tcga-snv-consensus-snv.maf.tsv.gz
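A checksum list in this `<md5>  <filename>` format can be checked mechanically with `md5sum -c`. The following is a minimal sketch, assuming GNU coreutils; `verify_manifest` and the manifest file name are hypothetical and are not taken from the release scripts.

```shell
# Hypothetical helper: verify the files in a directory against a manifest
# whose lines look like "<md5>  <filename>" (the format md5sum prints).
# Arguments: $1 = directory holding the files, $2 = path to the manifest.
verify_manifest() {
  # The manifest is opened by the redirection before the subshell changes
  # directory, so a relative manifest path still resolves correctly.
  # md5sum -c reads the checksum list from stdin and exits non-zero on any
  # mismatch or missing file.
  ( cd "$1" && md5sum -c ) < "$2"
}

# Example usage (placeholder paths):
#   verify_manifest release-v23-20230115 md5sum.txt
```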

Reproducibility Checklist

Documentation Checklist

jaclyn-taroni commented 1 year ago

@jharenza the CI file-related updates should go into a different PR (analyses/create-subset-files/create_subset_files.sh and analyses/create-subset-files/biospecimen_ids_for_subset.RDS)

jharenza commented 1 year ago

> @jharenza the CI file-related updates should go into a different PR (analyses/create-subset-files/create_subset_files.sh and analyses/create-subset-files/biospecimen_ids_for_subset.RDS)

Oh, let me remove those. I think they were left over from testing that script in a different branch.

jharenza commented 1 year ago

The unzipped MAFs are identical between v22 and v23:

harenzaj@38f9d38f36c9 data % md5sum release-v22-20220505/*snv-consensus*maf.tsv   
337cc86a1c62eb2cef3cc9d8669c2eda  release-v22-20220505/pbta-snv-consensus-mutation.maf.tsv
f979aad447c9f8bcbac70b1fd2270a73  release-v22-20220505/tcga-snv-consensus-snv.maf.tsv
harenzaj@38f9d38f36c9 data % md5sum release-v23-20230115/*snv-consensus*maf.tsv
337cc86a1c62eb2cef3cc9d8669c2eda  release-v23-20230115/pbta-snv-consensus-mutation.maf.tsv
f979aad447c9f8bcbac70b1fd2270a73  release-v23-20230115/tcga-snv-consensus-snv.maf.tsv
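The same comparison can be made without unzipping the files to disk, by hashing the decompressed stream. One possible explanation for the differing `.gz` checksums, not confirmed in this thread, is that gzip stores a modification time and optionally the original file name in its header, so recompressing identical content can yield a different compressed file. The helpers below are hypothetical names for illustration and assume GNU `md5sum` and `gzip`.

```shell
# Hypothetical helper: print the md5 of a .gz file's decompressed content,
# which is unaffected by metadata in the gzip header.
content_md5() {
  gzip -dc "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical helper: succeed (exit 0) if two .gz files decompress to
# identical bytes, even when the compressed files themselves differ.
same_content() {
  [ "$(content_md5 "$1")" = "$(content_md5 "$2")" ]
}

# Example usage (placeholder paths):
#   same_content release-v22/pbta-snv-consensus-mutation.maf.tsv.gz \
#                release-v23/pbta-snv-consensus-mutation.maf.tsv.gz \
#     && echo "content identical"
```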