AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Update generate figures #1286

sjspielman closed this pull request 2 years ago.

sjspielman commented 2 years ago

This PR addresses Issue #1261 and reorganizes figures/generate_figures.sh so that figures are generated in the order in which they will appear in the manuscript.

Deprecated items were removed from the generation script, and figures missing from the script were added in.

sjspielman commented 2 years ago

@jaclyn-taroni this is ready for a look, if not a full review.

    modified:   chromothripsis/results/shatterseek_results_per_chromosome.txt
    modified:   analyses/tp53_nf1_score/input/consensus_seg_with_status.tsv
    modified:   analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.polya_classifier_scores.tsv
    modified:   analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv
    modified:   analyses/tp53_nf1_score/results/polya_TP53_roc_threshold_results.tsv
    modified:   analyses/tp53_nf1_score/results/polya_TP53_roc_threshold_results_shuffled.tsv
    modified:   analyses/tp53_nf1_score/results/stranded_TP53_roc_threshold_results.tsv
    modified:   analyses/tp53_nf1_score/results/stranded_TP53_roc_threshold_results_shuffled.tsv
    modified:   analyses/tp53_nf1_score/results/tp53_altered_status.tsv
    modified:   analyses/tp53_nf1_score/results/tp53_scores_vs_molecular_subtype_Diffuse_astrocytic_and_oligodendroglial_tumor.tsv
    modified:   analyses/tp53_nf1_score/results/tp53_scores_vs_molecular_subtype_Embryonal_tumor.tsv
    modified:   analyses/tp53_nf1_score/results/tp53_scores_vs_molecular_subtype_Ependymal_tumor.tsv
    modified:   analyses/tp53_nf1_score/results/tp53_scores_vs_molecular_subtype_Low-grade_astrocytic_tumor.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_none_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_none_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_none_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_log_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_log_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_log_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_log_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_log_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_log_umap_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_none_pca_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_none_tsne_scores_aligned.tsv
    modified:   analyses/transcriptomic-dimension-reduction/results/rsem_stranded_none_umap_scores_aligned.tsv

Most of the tp53_nf1_score diffs are within numerical tolerance (floating-point representation differences), e.g., from analyses/tp53_nf1_score/input/consensus_seg_with_status.tsv:

# Old line
BS_6GV08HTE    chr7    138855546       140791770       NA      0.396711        3       2       gain
# New line
BS_6GV08HTE     chr7    138855546       140791770       NA      0.39671100000000004     3       2       gain

But some are not, e.g., from analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.polya_classifier_scores.tsv:

# Old line
BS_0VXZCRJS    0.7193414470702985      0.2390159763881984      0.7694147776801358      0.7145130845110855      0.5351872429322704      0.41300529552370535

# New line
BS_0VXZCRJS    0.7193414470702985      0.2390159763881984      0.7694147776801357      0.4702415413857321      0.6709645060773562      0.43131321436036846
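To separate these two kinds of diff mechanically, one could parse each field and compare numbers with a tolerance instead of as text. Below is a minimal Python sketch, assuming whitespace-delimited rows like the ones quoted above; the function name compare_rows and the default tolerances are illustrative choices, not anything that exists in the repository.

```python
import math


def compare_rows(old_line, new_line, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two whitespace-delimited rows field by field.

    Returns a list of (field_index, old_value, new_value) for every field
    that differs beyond floating-point tolerance. Fields that parse as
    numbers are compared with math.isclose; all other fields (sample IDs,
    chromosome names, "NA", "gain", etc.) must match exactly.
    """
    old_fields = old_line.split()  # split() accepts tabs or spaces
    new_fields = new_line.split()
    if len(old_fields) != len(new_fields):
        raise ValueError("rows have different numbers of fields")

    mismatches = []
    for i, (a, b) in enumerate(zip(old_fields, new_fields)):
        try:
            x, y = float(a), float(b)
        except ValueError:
            # At least one field is non-numeric: require an exact text match.
            if a != b:
                mismatches.append((i, a, b))
            continue
        if math.isnan(x) and math.isnan(y):
            continue  # NaN in both versions counts as unchanged
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, a, b))
    return mismatches


if __name__ == "__main__":
    # Rows quoted in this thread.
    seg_old = "BS_6GV08HTE chr7 138855546 140791770 NA 0.396711 3 2 gain"
    seg_new = "BS_6GV08HTE chr7 138855546 140791770 NA 0.39671100000000004 3 2 gain"
    print(compare_rows(seg_old, seg_new))  # [] : the text differs, the values do not

    score_old = ("BS_0VXZCRJS 0.7193414470702985 0.2390159763881984 0.7694147776801358 "
                 "0.7145130845110855 0.5351872429322704 0.41300529552370535")
    score_new = ("BS_0VXZCRJS 0.7193414470702985 0.2390159763881984 0.7694147776801357 "
                 "0.4702415413857321 0.6709645060773562 0.43131321436036846")
    for i, a, b in compare_rows(score_old, score_new):
        print(f"field {i}: {a} -> {b}")  # reports fields 4, 5 and 6
```

With these tolerances the last-digit change in field 3 of the classifier-score row is treated as noise, while fields 4 to 6 are flagged as real changes.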

For the transcriptomic-dimension-reduction module, the diffs are inconsistent from file to file and suggest that the input data have changed. For example, in analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_pca_scores_aligned.tsv, several new columns that weren't there before have been added to the TSV, but the PCs look the same. Overall, this suggests to me that this module needs to be re-run.
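For results like these, a useful first check is whether the changes are confined to added or dropped columns or also touch the shared score columns. The following is a minimal Python sketch, assuming each TSV has a header row and one sample-identifier column to join on; compare_tsv and the column names in the usage example (sample_id, PC1, PC2, extra_col) are hypothetical and not taken from the actual results files.

```python
import csv
import io
import math


def _same(a, b, rel_tol, abs_tol):
    """True if two cell values match: numerically within tolerance, or as text."""
    try:
        x, y = float(a), float(b)
    except (TypeError, ValueError):
        # Non-numeric (or missing, i.e. None) cells must match exactly.
        return a == b
    if math.isnan(x) and math.isnan(y):
        return True
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)


def _read_tsv(text, key):
    """Parse TSV text into (column names, {key value: row dict})."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = {row[key]: row for row in reader}
    return list(reader.fieldnames or []), rows


def compare_tsv(old_text, new_text, key, rel_tol=1e-9, abs_tol=1e-12):
    """Summarize how a results TSV changed between two runs.

    Returns (added_columns, removed_columns, changed), where `changed` maps
    each column present in both versions to the keys (e.g. sample IDs) whose
    values differ beyond tolerance. Only rows present in both files are compared.
    """
    old_cols, old_rows = _read_tsv(old_text, key)
    new_cols, new_rows = _read_tsv(new_text, key)
    added = [c for c in new_cols if c not in old_cols]
    removed = [c for c in old_cols if c not in new_cols]
    shared_keys = sorted(old_rows.keys() & new_rows.keys())

    changed = {}
    for col in old_cols:
        if col == key or col in removed:
            continue
        diffs = [k for k in shared_keys
                 if not _same(old_rows[k][col], new_rows[k][col], rel_tol, abs_tol)]
        if diffs:
            changed[col] = diffs
    return added, removed, changed


if __name__ == "__main__":
    # Hypothetical toy tables: the new version gains a column, the PCs are unchanged.
    old = "sample_id\tPC1\tPC2\nA\t1.5\t-0.2\nB\t0.3\t0.9\n"
    new = "sample_id\tPC1\tPC2\textra_col\nA\t1.5\t-0.2\tx\nB\t0.30000000000000004\t0.9\ty\n"
    print(compare_tsv(old, new, key="sample_id"))  # (['extra_col'], [], {})
```

An empty `changed` alongside a non-empty `added` list would match the pattern described here (new columns, same PCs); any entries in `changed` would point at specific samples whose scores moved.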

Notably, while this script covers most of the MS modules, many modules relevant to the paper (e.g., molecular subtyping!) aren't part of figure generation. I wonder if we might want two separate scripts for a Big Run: one that first runs all analysis modules in the paper, and one that then just generates the figures. This would move all module runs out of generate-figures.sh into a new script (analyses/run-analyses.sh? figures/prepare-analyses.sh?)

sjspielman commented 2 years ago

Going more carefully through some of these diffs now, I am concerned about what I'm seeing with the tp53_nf1_score module. The ROC curves themselves have changed, as have some of the TP53 expression results. Affected plots:
https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/stranded_TP53_roc.png
https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/tp53_expression_by_altered_status_stranded.png
https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/tp53_scores_by_altered_status.png

jharenza commented 2 years ago

> Going more carefully through some of these diffs now, I am concerned about what I'm seeing with the tp53_nf1_score module. The ROC curves themselves have changed, as well as some tp53 expression - https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/stranded_TP53_roc.png https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/tp53_expression_by_altered_status_stranded.png https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e635974914a0de892cc537b9bd5be79e0b464191/analyses/tp53_nf1_score/plots/tp53_scores_by_altered_status.png

@sjspielman I found an issue with the TP53 module here, so this may be related.

jaclyn-taroni commented 2 years ago

I'm breaking out rerunning transcriptomic-dimension-reduction into its own pull request in the interest of examining that, but I'll think on this point:

> I wonder if we might want to have two separate scripts for a Big Run - one to first run all analysis modules in the paper, and then one to just generate the figures. This would move out all module runs from generate-figures.sh into a new script (analyses/run-analyses.sh? figures/prepare-analyses.sh?)

sjspielman commented 2 years ago

Closing in favor of #1454