d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Exploratory analysis: Using GATK CNV in consensus CNV instead of MantaSV #401

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

What analysis module should be updated and why?

We would like to determine what the differences are between using MantaSV and GATK CNV in the CNV consensus module. @sickler-alex has created two stacked PRs:

What changes need to be made? Please provide enough detail for another participant to make the update.

Some ideas for exploring the differences:

  1. Subset v11 consensus_wgs_plus_cnvkit_wxs.tsv.gz with cohorts mentioned above and compare results to those in 207. How much is the same, how much is new using GATK CNV, is there anything now missing?
  2. Rerun the following modules, in staggered PRs, using the new CNV files from 207 + 235
  3. Fork the OpenPBTA repo, subset the files from 207 + 235 to the OpenPBTA v22 PBTA cohort, and rerun oncoprint-landscape
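For idea 1, the "same / new / missing" question could be answered with an outer merge on the call keys. This is only a minimal sketch: it assumes both tables carry `biospecimen_id`, `gene_symbol`, and `status` columns, and the function name `compare_cnv_calls` is hypothetical.

```python
import pandas as pd

def compare_cnv_calls(old: pd.DataFrame, new: pd.DataFrame) -> pd.Series:
    """Count calls that are shared, missing from, or new in the updated table.

    Column names below are assumptions about the consensus file layout.
    """
    keys = ["biospecimen_id", "gene_symbol", "status"]
    old_calls = old[keys].drop_duplicates()
    new_calls = new[keys].drop_duplicates()
    # indicator=True adds a _merge column: both / left_only / right_only
    merged = old_calls.merge(new_calls, on=keys, how="outer", indicator=True)
    labels = {"both": "same", "left_only": "missing", "right_only": "new"}
    return merged["_merge"].astype(str).map(labels).value_counts()
```

Here `old` would be the subset v11 table and `new` the table from the PRs, both loaded with `pd.read_csv(path, sep="\t")`.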

What input data should be used? Which data were used in the version being updated?

From 207 + 235:

consensus_seg_with_status.tsv
consensus_wgs_plus_cnvkit_wxs.tsv.gz
consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz
consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz

Otherwise, use v11 OpenPedCan

When do you expect the revised analysis will be completed?

2 weeks

Who will complete the updated analysis?

@adilahiri

adilahiri commented 2 years ago

Attached in the zip file are the plots from rerunning the oncoprint-landscape module. To generate these plots, consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz and consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz were used in place of consensus_seg_annotated_cn_autosomes.tsv.gz and consensus_seg_annotated_cn_x_and_y.tsv.gz. The percentage of alteration remained the same across all the plots. The module was run on EC2.

rerun-plots-oncoprint-landscape.zip

adilahiri commented 2 years ago

I am unable to run the tp53_nf1_score module. When I run it within the Docker environment, I get the following error message:

/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:145: RRuntimeWarning: Error: package or namespace load failed for ‘stats’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/library/stats/libs/stats.so':
  libRlapack.so: cannot open shared object file: No such file or directory

  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:145: RRuntimeWarning: During startup - 
  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:145: RRuntimeWarning: Warning message:

  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:145: RRuntimeWarning: package ‘stats’ in options("defaultPackages") was not found 

  warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/robjects/pandas2ri.py:190: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
  res = PandasDataFrame.from_items(items)
Traceback (most recent call last):
  File "/home/rstudio/OpenPedCan-analysis/analyses/tp53_nf1_score/01-apply-classifier.py", line 67, in <module>
    exprs_df = pandas2ri.ri2py(exprs_rds)
  File "/usr/lib/python3.5/functools.py", line 745, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.5/dist-packages/rpy2/robjects/pandas2ri.py", line 190, in ri2py_dataframe
    res = PandasDataFrame.from_items(items)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 1883, in from_items
    keys, values = zip(*items)
ValueError: not enough values to unpack (expected 2, got 0)

adilahiri commented 2 years ago

For step 1 of this ticket: I subset the v11 file consensus_wgs_plus_cnvkit_wxs.tsv.gz to the required cohorts using the histologies.tsv file. I further filtered it on pathology_diagnosis for Neuroblastoma and related terms:

"Neuroblastoma"
"Ganglioneuroblastoma, intermixed"
"Ganglioneuroblastoma"
"Ganglioneuroma, maturing subtype OR Ganglioneuroblastoma, well differentiated"

The data was further filtered for the gene MYCN.
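The filtering described above might look roughly like the following. It is a sketch, not the code that was actually run: the column names (`Kids_First_Biospecimen_ID`, `cohort`, `pathology_diagnosis`, `biospecimen_id`, `gene_symbol`) are assumptions about the file layouts, `subset_cnv` is a hypothetical helper, and the cohort list has to be supplied by the caller.

```python
import pandas as pd

# Diagnosis terms quoted in the comment above
NBL_DIAGNOSES = [
    "Neuroblastoma",
    "Ganglioneuroblastoma, intermixed",
    "Ganglioneuroblastoma",
    "Ganglioneuroma, maturing subtype OR Ganglioneuroblastoma, well differentiated",
]

def subset_cnv(cnv, histologies, cohorts, diagnoses=NBL_DIAGNOSES, gene="MYCN"):
    """Keep CNV rows for the given cohorts, diagnoses, and gene (assumed columns)."""
    keep = histologies[
        histologies["cohort"].isin(cohorts)
        & histologies["pathology_diagnosis"].isin(diagnoses)
    ]
    # An inner join keeps only biospecimens that passed the histology filter
    annotated = cnv.merge(
        keep[["Kids_First_Biospecimen_ID", "cohort", "pathology_diagnosis"]],
        left_on="biospecimen_id",
        right_on="Kids_First_Biospecimen_ID",
        how="inner",
    )
    return annotated[annotated["gene_symbol"] == gene].reset_index(drop=True)
```

The same helper could be applied to the PR files so that both tables go through an identical filter.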

The next part was to get the corresponding information from the PR; the files used were cnv_consensus.tsv and consensus_seg_annotated_cn_autosomes.tsv.gz. These PR files were joined with the histologies.tsv file to get the pathology_diagnosis information. The PR data files went through the same filtering process as the v11 files.

The following plots were obtained for biospecimens that were common to the v11 and PR files.

[Plots: copy_number_plot, Ploidy, status_call]
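A comparison restricted to the shared biospecimens could be tabulated along these lines. This is an illustrative sketch that assumes `biospecimen_id`, `gene_symbol`, `status`, and `copy_number` columns in both tables; `status_concordance` is a hypothetical name.

```python
import pandas as pd

def status_concordance(v11, pr, gene="MYCN"):
    """Pair calls for biospecimens present in both tables and cross-tabulate status."""
    cols = ["biospecimen_id", "status", "copy_number"]
    a = v11.loc[v11["gene_symbol"] == gene, cols]
    b = pr.loc[pr["gene_symbol"] == gene, cols]
    # The default inner merge keeps only biospecimens found in both tables
    shared = a.merge(b, on="biospecimen_id", suffixes=("_v11", "_pr"))
    table = pd.crosstab(shared["status_v11"], shared["status_pr"])
    return shared, table
```

Off-diagonal counts in `table` would flag biospecimens whose status call changed between the two inputs.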

jharenza commented 2 years ago

This can be closed following exploration by @kelseykeith and @adilahiri showing that the updated consensus calls using GATK CNV instead of MantaSV do not change CN calls of oncogenes or molecular subtypes for NBL, HGG, LGG, and embryonal tumors.