FertigLab / PanIN_carcinogeneisis_spatial_analysis

GNU General Public License v3.0

PanIN and CAF Transitions in Pancreatic Carcinogenesis Revealed with Spatial Data Integration

Alexander T.F. Bell, Jacob T. Mitchell, Ashley L. Kiemen, Melissa Lyman, Kohei Fujikura, Jae W. Lee, Erin Coyne, Sarah M. Shin, Sushma Nagaraj, Atul Deshpande, Pei-Hsun Wu, Dimitrios N. Sidiropoulos, Rossin Erbe, Jacob Stern, Rena Chan, Stephen Williams, James M. Chell, Lauren Ciotti, Jacquelyn W. Zimmerman, Denis Wirtz, Won Jin Ho, Neeha Zaidi, Elizabeth Thompson, Elizabeth M. Jaffee, Laura D. Wood, Elana J. Fertig, Luciane T. Kagohara

Abstract

This study introduces a novel artificial intelligence method integrating imaging, spatial transcriptomics, and single-cell RNA-sequencing (scRNA-seq) data to characterize neoplastic cell state transitions during tumorigenesis. This pipeline was applied to examine pancreatic intraepithelial neoplasias (PanIN), one of the premalignant lesions that potentially develop into pancreatic adenocarcinoma (PDAC). Previous characterization of PanINs within their microenvironment has been limited by their strict diagnosis on FFPE tissues. To overcome this limitation, we developed a new pipeline for unbiased whole transcriptome FFPE spatial profiling of PanINs that uses machine learning to classify and deconvolve the spatial transcriptomics spots, and to further integrate the spatial data with a PDAC scRNA-seq dataset. Our new integrated computational analysis method finds that cancer-associated fibroblasts (CAFs), including antigen-presenting CAFs, are located in close proximity to PanINs. We observed a transition from CAF-related inflammatory signaling to cellular proliferation during PanIN progression, and confirmed this finding with high-dimensional imaging proteomics and transcriptomics technologies. Altogether, this spatial multi-omics characterization provides a reference for future PanIN studies. The convergence of computational methods and technology development to decipher the spatiotemporal dynamics in this precancer atlas has broad applicability pan-cancer.

Pipeline Recreation

Data Acquisition

All file paths used in the analysis scripts for this project are relative to the parent directory where PanIN_carcinogenesis_spatial_analysis.Rproj is stored. The sub-directories that hold the raw data passed into the pipeline and the files and images the pipeline generates must be created before running the scripts. The scripts are intended to be run in the order in which they are numbered.

Within each of these sub-directories, files are stored in directories named after the script that generated the file.
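
The directory convention described above can be sketched as a small helper. The repository's scripts are written in R; this Python version is only an illustration. The function name script_output_dir, the sub-directory name "figures", and the choice to drop the file extension from the folder name are all assumptions, not taken from the repository.

```python
from pathlib import Path


def script_output_dir(project_root, subdir, script_name):
    """Return (and create) <project_root>/<subdir>/<script name without extension>/.

    Hypothetical helper: the real sub-directory names are defined by the
    repository, not by this sketch.
    """
    # Assumption: "02_analysis.R" writes into a folder named "02_analysis".
    out_dir = Path(project_root) / subdir / Path(script_name).stem
    # parents=True builds missing parent folders; exist_ok=True makes reruns safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling script_output_dir(".", "figures", "02_Paired_cohort_visium_analysis.R") from the project root would create and return figures/02_Paired_cohort_visium_analysis, so every file a script writes ends up in a folder that identifies its origin.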

Scripts

For compatible software and package versions, please see the .html files for each vignette.

Processing and analysis of original (paired) Visium cohort

Execute scripts in the order that they are numbered.

scripts/visium_analysis/original_panin_cohort

00_PanIN_Custom_Functions.R

Contains several custom or modified functions used throughout the analysis, including SpatialDimChoose(), which was used to manually annotate epithelial spots based on histologic grade.

01_Pre_processing_paired_cohort.Rmd

Imports and merges the original PanIN cohort Visium data into a single Seurat object. Contains all of the steps used for pre-processing the Visium data. Imports CODA annotations and integrates them with Visium data.

02_Paired_cohort_visium_analysis.R

Contains the main analyses performed directly on the processed Visium data: comparison of the cellular composition of the Louvain clusters and their marker genes before and after CODA filtration; differential expression analysis between PanIN and normal ductal epithelium and between high- and low-grade PanIN; pathway analysis of the differentially expressed genes (DEGs) between PanIN and normal duct; and module scores for the classical/basal-like signatures, panCAFs, iCAFs, myCAFs, apCAFs, and cancer stem cells (CSCs). Generates figures 2B-2F, 4A-4E, and 5E-5H and supplemental figures S2-S9, S10A-S10D, and S14-S16.

03_Paired_cohort_visium_CoGAPS_and_transfer_to_atlas.R

Uses non-negative matrix factorization (CoGAPS) to learn transcriptional patterns from the CODA-purified epithelial cells in the original seven-tissue PanIN cohort. Compares these patterns between high-grade PanIN, low-grade PanIN, and normal ductal epithelium. Projects these patterns onto a single-cell atlas of PDAC. Generates figures 6C, 6D, and 6F and supplemental figures 19A-19C.
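
As a rough illustration of what projecting learned patterns onto a new dataset involves, the sketch below assumes that projection is an ordinary least-squares fit of one sample's expression vector onto the per-gene pattern weights. This is an assumption made for illustration, not a description of the script, which is written in R and relies on CoGAPS. The function names and the toy numbers are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's inputs are left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pattern weights are collinear; system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def project_patterns(gene_weights, expression):
    """Least-squares pattern weights for one sample.

    gene_weights: one row per gene, each row holding one weight per pattern.
    expression:   expression values for the same genes, in the same order.
    """
    n_patterns = len(gene_weights[0])
    # Normal equations: (A^T A) p = A^T d
    gram = [
        [sum(row[i] * row[j] for row in gene_weights) for j in range(n_patterns)]
        for i in range(n_patterns)
    ]
    rhs = [
        sum(row[i] * value for row, value in zip(gene_weights, expression))
        for i in range(n_patterns)
    ]
    return solve_linear_system(gram, rhs)


# Toy example: three genes, two patterns. The sample is built as
# 2 * pattern 1 + 3 * pattern 2, so the fit should recover roughly [2, 3].
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_sample = [2.0, 3.0, 5.0]
toy_projection = project_patterns(toy_weights, toy_sample)
```

An unconstrained fit like this can return negative weights; whether and how the actual pipeline constrains or rescales the projected values is not stated here.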

04_AtlasToSpatial_TransferLearning.Rmd

Assesses the projection of transcriptional patterns learned from epithelial cells in a single-cell atlas of PDAC onto PanIN lesions identified by histopathological characteristics. Generates figures 6B and 6E and supplemental figures 18A-18D.

05_Limited_Feature_Projection.Rmd

Verifies that transcriptional patterns can be projected reliably when only the limited gene set in the Xenium probe panel is available. Projections between the single-cell and spatial data that use the full gene set are compared for congruence with projections restricted to the genes included in the Xenium panel.
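
One simple way to quantify this kind of congruence is to correlate, spot by spot, the pattern weights obtained with the full gene set against those obtained with the panel-restricted gene set. The sketch below assumes Pearson correlation as the measure; the script itself is written in R and may use a different statistic. The function name and the example values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weights of one pattern across four spots.
full_gene_set = [0.10, 0.50, 0.90, 0.30]
panel_only = [0.12, 0.47, 0.85, 0.33]
congruence = pearson_correlation(full_gene_set, panel_only)
```

A correlation near 1 for each pattern would indicate that restricting the projection to the panel genes preserves the ranking of spots by pattern weight.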

06_cluster_highlight_plots.Rmd

Generates supplemental figure S3A.

Validation of Pattern Projection results in a cohort of resected primary PDAC tumors with concurrent PanIN

scripts/visium_analysis/extened_panin_cohort

Execute scripts in the order that they are numbered. Run lines 1-95 of "01_Pre_processing_paired_cohort.R" at least once before running extended PanIN scripts.

01_Read_Segments_Normalize_and_Scale.R

Read SpaceRanger outputs into R as Seurat objects. Expression values are normalized and scaled using Seurat's SCTransform algorithm. Spatially variable features are identified considering spot location.

02_Scale_Expression_and_Cluster_in_Segment.Rmd

Calculate PCA and UMAP embeddings for the Visium spots. Cluster spots by Leiden clustering to assess if transcriptional clusters follow histologic features of the tissue segments. Comments include the histologic features associated with each cluster.

03_Add_CODA_annotations.Rmd

Annotate spots by the predominant tissue type identified through CODA. The CODA annotations are provided as an Excel spreadsheet.
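
Labeling a spot by its predominant tissue type amounts to taking the tissue with the largest share of that spot. The sketch below assumes the CODA output has already been read into a per-spot mapping of tissue label to fraction; the barcodes, tissue labels, and function names are hypothetical, and the script itself is written in R.

```python
def predominant_tissue(composition):
    """Return the tissue label with the largest fraction for one spot.

    If several tissues tie, the one listed first in the mapping is returned.
    """
    if not composition:
        raise ValueError("spot has no tissue composition")
    return max(composition, key=composition.get)


def annotate_spots(spot_compositions):
    """Map every spot barcode to its predominant tissue label."""
    return {
        barcode: predominant_tissue(composition)
        for barcode, composition in spot_compositions.items()
    }


# Hypothetical compositions for two spots.
example_spots = {
    "spot_1": {"PanIN": 0.6, "stroma": 0.3, "acini": 0.1},
    "spot_2": {"PanIN": 0.1, "stroma": 0.2, "acini": 0.7},
}
example_labels = annotate_spots(example_spots)
```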

04_Pathologist_Annotations.Rmd

After review of the spots with the team pathologist, spots composed of PanIN were graded as low-grade or high-grade, or were revised to the annotation provided by the pathologist. This script also excludes spots representing creased tissue overlaps (selected with the Loupe Browser tool), spots dominated by adipose tissue, and tissue fragments that had broken off from the primary segment laid upon the Visium slide.

05_PanIN_Validation_Cohort.Rmd

Applies the analysis pipeline outlined in the scripts from the 'scripts/visium_analysis' directory to the extended PanIN cohort. Segments are integrated into a single Seurat object, and Harmony is used to correct the embeddings for batch, where each batch is the separate Visium slide used for each subject. The analysis consists of calculating module scores for PDAC subtypes, cancer stem cells, and CAF subtypes; MAST differential expression tests between grades of PanIN lesions; and projection of patterns learned by CoGAPS from a single-cell atlas of PDAC onto the epithelial spots.

Validation of Epithelial Cell States at Single-cell Resolution by Xenium

scripts/xenium_analysis

01_Load_Xenium_Data.Rmd

Loads the five sections of Xenium data into R as a Seurat object. Conducts quality control, normalization, and clustering on the unified Seurat object.

02_Pattern_Projection.Rmd

Projects the transcriptional patterns learned from the PDAC atlas onto the expression data from the Xenium section. Generates figures 6I & 6J.

03_CAF_Typing_by_moduleScore.Rmd

Identifies cancer-associated fibroblasts (CAFs) based on module scores for CAFs and functional subtypes of CAFs (apCAFs, iCAFs, myCAFs). Generates figures 3A-3G and 6H.
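
To make the idea of typing cells by module score concrete, the sketch below uses a deliberately simplified score: the mean expression of a signature's genes minus the mean expression of a set of control genes. It then assigns the subtype with the highest score. Both the simplified score and the highest-score rule are assumptions; the script is written in R, and its exact scoring and thresholds are not described here. Gene names, signatures, and function names are placeholders.

```python
from statistics import mean


def module_score(expression, signature, controls):
    """Mean expression of signature genes minus mean expression of control genes."""
    # Genes missing from the expression profile are skipped.
    signature_values = [expression[g] for g in signature if g in expression]
    control_values = [expression[g] for g in controls if g in expression]
    if not signature_values or not control_values:
        raise ValueError("no signature or control genes found in the expression data")
    return mean(signature_values) - mean(control_values)


def assign_subtype(expression, signatures, controls):
    """Score every signature and return (best_subtype, all_scores)."""
    scores = {
        name: module_score(expression, genes, controls)
        for name, genes in signatures.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder data for one cell: gene names and values are made up.
cell = {"G1": 5.0, "G2": 4.0, "G3": 1.0, "G4": 0.5, "G5": 1.0, "G6": 1.0}
placeholder_signatures = {"iCAF": ["G1", "G2"], "myCAF": ["G3", "G4"]}
control_genes = ["G5", "G6"]
subtype, subtype_scores = assign_subtype(cell, placeholder_signatures, control_genes)
```

In this toy example the iCAF placeholder signature scores 3.5 and the myCAF one scores -0.25, so the cell is labeled iCAF. A practical rule would probably also require the best score to exceed some threshold before a cell is called a CAF at all.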