PANOPLY is a platform for applying state-of-the-art statistical and machine learning algorithms to transform multi-omic data from cancer samples into biologically meaningful and interpretable results. PANOPLY leverages Terra—a cloud-native platform for extreme-scale data analysis, sharing, and collaboration—to host proteogenomic workflows, and is designed to be flexible, automated, reproducible, scalable, and secure. A wide array of algorithms applicable to all cancer types have been implemented, and available in PANOPLY for analysis of cancer proteogenomic data.
Figure 1. Overview of PANOPLY architecture and the various tasks that constitute the complete workflow. Tasks can also be run independently, or combined into custom workflows, including new tasks added by users. Inputs to PANOPLY consists of (i) externally characterized genomics and proteomics data (in gct format); (ii) sample phenotype and annotations (in csv format); and (iii) parameters settings (in yaml format). Panoply modules are grouped into Data Preparation Modules (green box), Data Analysis Modules (blue box) and Report Modules (red box). Data preparation modules perform quality checks on input data followed by optional normalization and filtering for proteomics data. Analysis ready data tables are then used as inputs to the data analysis modules. Results from the data analysis modules are summarized in interactive reports generated by appropriate report modules. Modules under development are listed in grey text. |
PANOPLY consists of the following components:
PANOPLY provides a comprehensive collection of proteogenomic data analysis methods including sample QC (sample quality evaluation using profile plots and tumor purity scores (1), identify sample swaps, etc.), association analysis, RNA and copy number correlation (to proteome), connectivity map analysis (1 ,2), outlier analysis using BlackSheep (3), PTM-SEA (4), GSEA (5) and single-sample GSEA (6), consensus clustering, and multi-omic clustering using non-negative matrix factorization (NMF). Most analysis modules include a report generation task that outputs a HTML interactive report summarizing results from the respective analysis tasks. Mass spectrometry-based proteomics data amenable to analysis by PANOPLY includes isobaric label-based LC-MS/MS approaches like iTRAQ, TMT and TMTPro profiling the proteome and multiple PTM-omes including phospho-, acetyl-and ubiquitylomes.
Users can also customize or create new pipelines and add their own tasks and integrate them into the PANOPLY.
Email proteogenomics@broadinstitute.org with questions, comments or feedback.
Mani, D. R. et al. PANOPLY: a cloud-based platform for automated and reproducible proteogenomic data analysis. Nature Methods 1–3 (2021) doi:10.1038/s41592-021-01176-6.