d3b-center / hope-cohort-analysis

Analysis for HOPE cohort
3 stars 1 forks source link

Update tp53_nf1_score module (3/11) #106

Open komalsrathi opened 3 months ago

komalsrathi commented 3 months ago

Created this branch off of update-merge branch.

Data Preparation

The tp53_nf1_score module uses input files from data folder so I manually copied files that I generated by merge-files module to data/v3 and then soft-linked those files under data/ . So, these files point to v3:

Hope-cnv-controlfreec-tumor-only.rds -> v3/Hope-cnv-controlfreec-tumor-only.rds
Hope-cnv-controlfreec.rds -> v3/Hope-cnv-controlfreec.rds
Hope-fusion-putative-oncogenic.rds -> v3/Hope-fusion-putative-oncogenic.rds
Hope-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-gene-expression-rsem-tpm-collapsed.rds
Hope-gene-expression-rsem-tpm.rds -> v3/Hope-gene-expression-rsem-tpm.rds
Hope-tumor-only-snv-mutect2.maf.tsv.gz -> v3/Hope-tumor-only-snv-mutect2.maf.tsv.gz
Hope-gene-counts-rsem-expected_count-collapsed.rds -> v3/Hope-gene-counts-rsem-expected_count-collapsed.rds
Hope-gene-counts-rsem-expected_count.rds -> v3/Hope-gene-counts-rsem-expected_count.rds
Hope-snv-consensus-plus-hotspots.maf.tsv.gz -> v3/Hope-snv-consensus-plus-hotspots.maf.tsv.gz 
Hope-GBM-histologies-base.tsv -> v3/Hope-GBM-histologies-base.tsv

And, these point to v2:

release-notes.md -> v2/release-notes.md
Hope-sv-manta.tsv.gz -> v2/Hope-sv-manta.tsv.gz
Hope-methyl-m-values.rds -> v2/Hope-methyl-m-values.rds
Hope-methyl-beta-values.rds -> v2/Hope-methyl-beta-values.rds
Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz -> v2/Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz
Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds -> v2/Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds
Hope-GBM-histologies.tsv -> v2/Hope-GBM-histologies.tsv

Debugging scripts

When I was re-running the module using docker, I got a ValueError for the roc_auc_score function within the 06-evaluate-classifier.py script as follows:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

So I added a try-catch block and assigned 0 where this error is encountered. This fixed the error but could you check the outputs if it makes sense?

Additionally, there is no file results/gene-expression-rsem-tpm-collapsed-poly-A_classifier_scores.tsv so I had to remove the command that calls 06-evaluate-classifier.py on this file in the run_classifier.sh script. I replaced it with the command to run on results/gene-expression-rsem-tpm-collapsed-poly-A-stranded_classifier_scores.tsv because that file is present.

Output files

There were a lot of files not updated when running the bash script, and I am unsure why:

# updated files
-rw-r--r--  1 rathik  1768498755  11920 Jun 20 17:05 stranded_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755  11911 Jun 20 17:05 stranded_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755    219 Jun 20 17:05 exome_capture_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755    225 Jun 20 17:05 exome_capture_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755    683 Jun 20 17:05 polya_stranded_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755    698 Jun 20 17:05 polya_stranded_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755  16848 Jun 20 17:05 tp53_altered_status.tsv
-rw-r--r--  1 rathik  1768498755     99 Jun 20 17:04 sv_overlap_tp53.tsv
-rw-r--r--  1 rathik  1768498755    503 Jun 20 17:04 fusion_bk_tp53_loss.tsv
-rw-r--r--  1 rathik  1768498755    112 Jun 20 17:00 loss_overlap_domains_tp53.tsv
-rw-r--r--  1 rathik  1768498755  10939 Jun 20 16:58 combined_scores.tsv
-rw-r--r--  1 rathik  1768498755   2109 Jun 20 16:58 gene-expression-rsem-tpm-collapsed-poly-A-stranded_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755    590 Jun 20 16:58 gene-expression-rsem-tpm-collapsed-exome_capture_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755  20711 Jun 20 16:57 gene-expression-rsem-tpm-collapsed-stranded_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755   9338 Jun 20 16:56 TP53_NF1_snv_alteration.tsv

# not updated
-rw-r--r--  1 rathik  1768498755  57694 Jan 10 10:47 tp53_score+reported_gender+mol_wo_tp53+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  52425 Jan 10 10:47 tp53_score+reported_gender+mol_subtype_without_tp53+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  58122 Jan 10 10:47 tp53_score+reported_gender+mol_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  50948 Jan 10 10:47 tp53_score+reported_gender+broad_mol+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  30707 Jan 10 10:47 HGG_H3WT_score+reported_gender+molecular_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  23594 Jan 10 10:47 DGM_score+reported_gender+molecular_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  24808 Jan 10 10:47 DGM_score+reported_gender+molecular_subtype*tp53+binned_age.RDS

Similarly, a lot of plots did not get updated as well:

# updated
-rw-r--r--  1 rathik  1768498755   30531 Jun 20 17:05 stranded_TP53_roc.png
-rw-r--r--  1 rathik  1768498755   24199 Jun 20 17:05 exome_capture_TP53_roc.png
-rw-r--r--  1 rathik  1768498755   24199 Jun 20 17:05 polya_stranded_TP53_roc.png

# not updated
-rw-r--r--  1 rathik  1768498755    6286 Jan 10 10:47 KM_reported_gender.pdf
-rw-r--r--  1 rathik  1768498755    7122 Jan 10 10:47 KM_race.pdf
-rw-r--r--  1 rathik  1768498755    7278 Jan 10 10:47 KM_mol_subtype_without_tp53.pdf
-rw-r--r--  1 rathik  1768498755    6783 Jan 10 10:47 KM_broad_mol.pdf
-rw-r--r--  1 rathik  1768498755    6860 Jan 10 10:47 KM_binned_tp53.pdf
-rw-r--r--  1 rathik  1768498755    6633 Jan 10 10:47 KM_binned_age.pdf
-rw-r--r--  1 rathik  1768498755  231538 Jan 10 10:47 HR_molecular_subtype.png
-rw-r--r--  1 rathik  1768498755  201062 Jan 10 10:47 HR_mol_subtype_without_tp53.png
-rw-r--r--  1 rathik  1768498755  176529 Jan 10 10:47 HR_broad_mol_subtype.png
-rw-r--r--  1 rathik  1768498755  159420 Jan 10 10:47 HR_HGG_WT.png
-rw-r--r--  1 rathik  1768498755  188041 Jan 10 10:47 HR_DGM_tp53_interaction.png
-rw-r--r--  1 rathik  1768498755  143568 Jan 10 10:47 HR_DGM.png
-rw-r--r--  1 rathik  1768498755   31620 Sep 11  2023 polya_TP53_roc.png
komalsrathi commented 1 month ago

I had to manually install this library txdbmaker

That's interesting. I think I ran all the modules on Docker and didn't face any issues. Let me check.

komalsrathi commented 1 month ago

I also reran this with the data.table::fread's tmpdir fix, luckily doesn't change any output files.

jharenza commented 1 month ago
Output created: 03-tp53-cnv-loss-domain.nb.html
03 tp53

processing file: 04-tp53-sv-loss.Rmd
  |............                                        |  24% [unnamed-chunk-2]Error in `load_package_gracefully()`:
! Could not load package txdbmaker. Is it installed?

  Note that starting with BioC 3.19, calling makeTxDbFromGFF() requires
  the txdbmaker package. Please install it with:

    BiocManager::install("txdbmaker")
Backtrace:
 1. GenomicFeatures::makeTxDbFromGFF(...)
 2. GenomicFeatures:::call_fun_in_txdbmaker("makeTxDbFromGFF", ...)
 3. GenomicFeatures:::load_package_gracefully(...)

Quitting from lines 42-74 [unnamed-chunk-2] (04-tp53-sv-loss.Rmd)

Execution halted

I did get the same thing as @naqvia - I think we do need to add this new package to the docker image since we did the 4.4 update.

komalsrathi commented 1 month ago

Will update once #118 is merged.

jharenza commented 1 month ago

@komalsrathi this now runs to completion with the new docker image, however I did notice that the sv_overlap_tp53.tsv file is now empty. This should come from the manta sv file, can you check the script pulling SVs?

komalsrathi commented 1 month ago

Are you talking about this file: https://github.com/d3b-center/hope-cohort-analysis/blob/update-nf1-score/analyses/tp53_nf1_score/results/sv_overlap_tp53.tsv ?

jharenza commented 1 month ago

Are you talking about this file: https://github.com/d3b-center/hope-cohort-analysis/blob/update-nf1-score/analyses/tp53_nf1_score/results/sv_overlap_tp53.tsv ?

yes, seems it is empty and affecting some losses

komalsrathi commented 1 month ago

Could be the same data.table issue, checking now.

komalsrathi commented 1 month ago

I deleted my v3 folder -> merged the master branch -> ran download script to get v3 files. As expected, only 1 file fails:

Checking MD5 hashes...
Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds: OK
Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz: OK
Hope-GBM-histologies.tsv: FAILED
Hope-methyl-beta-values.rds: OK
Hope-methyl-m-values.rds: OK
release-notes.md: OK
Hope-sv-manta.tsv.gz: OK
Hope-GBM-histologies-base.tsv: OK
Hope-cnv-controlfreec-tumor-only.rds: OK
Hope-cnv-controlfreec.rds: OK
Hope-fusion-putative-oncogenic.rds: OK
Hope-gene-counts-rsem-expected_count-collapsed.rds: OK
Hope-gene-counts-rsem-expected_count.rds: OK
Hope-gene-expression-rsem-tpm-collapsed.rds: OK
Hope-gene-expression-rsem-tpm.rds: OK
Hope-snv-consensus-plus-hotspots.maf.tsv.gz: OK
Hope-tumor-only-snv-mutect2.maf.tsv.gz: OK
md5sum: WARNING: 1 of 17 computed checksums did NOT match

Next, I manually soft-linked v3 files to data folder. Here is my data folder:

(base) ➜  d3b-hope-cohort-analysis git:(update-nf1-score) ls -lt data 
total 356736
lrwxr-xr-x   1 rathik  1768498755         30 Aug 21 15:52 Hope-methyl-beta-values.rds -> v3/Hope-methyl-beta-values.rds
lrwxr-xr-x   1 rathik  1768498755         28 Aug 21 15:52 Hope-cnv-controlfreec.rds -> v3/Hope-cnv-controlfreec.rds
lrwxr-xr-x   1 rathik  1768498755         36 Aug 21 15:52 Hope-gene-expression-rsem-tpm.rds -> v3/Hope-gene-expression-rsem-tpm.rds
lrwxr-xr-x   1 rathik  1768498755         39 Aug 21 15:52 Hope-cnv-controlfreec-tumor-only.rds -> v3/Hope-cnv-controlfreec-tumor-only.rds
lrwxr-xr-x   1 rathik  1768498755         41 Aug 21 15:52 Hope-tumor-only-snv-mutect2.maf.tsv.gz -> v3/Hope-tumor-only-snv-mutect2.maf.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         46 Aug 21 15:52 Hope-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-gene-expression-rsem-tpm-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         48 Aug 21 15:52 Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz -> v3/Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         23 Aug 21 15:52 Hope-sv-manta.tsv.gz -> v3/Hope-sv-manta.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         43 Aug 21 15:52 Hope-gene-counts-rsem-expected_count.rds -> v3/Hope-gene-counts-rsem-expected_count.rds
lrwxr-xr-x   1 rathik  1768498755         60 Aug 21 15:52 Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         46 Aug 21 15:52 Hope-snv-consensus-plus-hotspots.maf.tsv.gz -> v3/Hope-snv-consensus-plus-hotspots.maf.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         53 Aug 21 15:52 Hope-gene-counts-rsem-expected_count-collapsed.rds -> v3/Hope-gene-counts-rsem-expected_count-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         27 Aug 21 15:52 Hope-GBM-histologies.tsv -> v3/Hope-GBM-histologies.tsv
lrwxr-xr-x   1 rathik  1768498755         27 Aug 21 15:52 Hope-methyl-m-values.rds -> v3/Hope-methyl-m-values.rds
lrwxr-xr-x   1 rathik  1768498755         37 Aug 21 15:52 Hope-fusion-putative-oncogenic.rds -> v3/Hope-fusion-putative-oncogenic.rds
lrwxr-xr-x   1 rathik  1768498755         32 Aug 21 15:52 Hope-GBM-histologies-base.tsv -> v3/Hope-GBM-histologies-base.tsv
drwxr-xr-x  20 rathik  1768498755        640 Aug 21 15:47 v3
-rw-r--r--   1 rathik  1768498755  131420160 Aug 20 19:36 txdb_from_gencode.v39.gtf.db
drwxr-xr-x  21 rathik  1768498755        672 Aug 17 08:40 tmp
drwxr-xr-x  20 rathik  1768498755        640 Aug 16 16:21 v2
-rw-r--r--   1 rathik  1768498755   47572349 Sep 15  2023 gencode.v39.primary_assembly.annotation.gtf.gz
drwxr-xr-x  20 rathik  1768498755        640 Sep 15  2023 v1

Finally, deleted all results and plots from this module -> reran classifier -> a couple files changed but I don't see the specific file you mentioned changing. Also deleted obsolete results and plots.