FredHutch / Galeano-Nino-Bullman-Intratumoral-Microbiota_2022

Analysis code used in Galeano Nino et al., Impact of Intratumoral Microbiota on Spatial and Cellular Heterogeneity in human cancer. 2022
MIT License

Galeano-Nino-Bullman-Intratumoral-Microbiota-2022


Analysis code used in Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022

The code in this repository is organized to mirror the Methods section of Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022.

10X Visium Scans for CRC and OSCC samples

The 10X Visium scans associated with the manuscript submission have been uploaded to AWS and Zenodo.

The TIFF files can be accessed at https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_tiff.tar.gz and https://doi.org/10.5281/zenodo.7419806

Note that for sample CRC_16 the slide ID is V10S15-020 and the area code is D1; for sample OSCC_2 the slide ID is V11A07-022 and the area code is A1.
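Slide ID and area code are the values that Space Ranger takes through its --slide and --area options. The sketch below shows one way to keep them paired with the sample names when assembling a spaceranger count command. It is an illustration only: the function name and all file paths are hypothetical, and Visium_pipeline.sh may invoke Space Ranger differently.

```python
# Slide ID and capture-area code per sample, as stated in this README.
SLIDE_INFO = {
    "CRC_16": {"slide": "V10S15-020", "area": "D1"},
    "OSCC_2": {"slide": "V11A07-022", "area": "A1"},
}


def spaceranger_count_args(sample, transcriptome, fastqs, image):
    """Build an argument list for `spaceranger count` for one sample.

    Hypothetical helper: the paths are supplied by the caller; only the
    slide and area values come from this README.
    """
    info = SLIDE_INFO[sample]
    return [
        "spaceranger", "count",
        f"--id={sample}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
        f"--image={image}",
        f"--slide={info['slide']}",
        f"--area={info['area']}",
    ]


args = spaceranger_count_args(
    "CRC_16",
    transcriptome="/path/to/reference",   # hypothetical path
    fastqs="/path/to/CRC_16_fastq",       # hypothetical path
    image="/path/to/CRC_16.tif",          # hypothetical path
)
# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(args))
```

Returning a list rather than a single string means the arguments can be passed directly to subprocess.run without shell quoting.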

For convenience, the FASTQ files have also been uploaded to AWS: https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_fastq.tar.gz

Environment and Reference Data

Environment

All of the analysis code documented in this repository was run on the shared computing cluster maintained at the Fred Hutchinson Cancer Research Center between May 2020 and August 2022. The software dependencies used by these scripts are provided by the EasyBuild installation maintained by the Fred Hutch Scientific Computing group. Those dependencies are loaded into the environment with the ml command (e.g. ml CellRanger/6.1.1).

Reference Data

Prior to running the analysis scripts, reference databases were downloaded for PathSeq (December 2020) and CellRanger (January 2022). The locations of those reference databases are passed to the analysis scripts through the environment variables pathseqdb and cellrangerdb.
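Because the scripts depend on both variables being set, it can help to check them before launching a long job. The following is a minimal sketch of such a check. The function name reference_paths is hypothetical and is not part of the repository; only the variable names pathseqdb and cellrangerdb come from this README.

```python
import os

REQUIRED_VARS = ("pathseqdb", "cellrangerdb")


def reference_paths(env=None):
    """Return the reference database locations taken from the environment.

    Hypothetical helper: raises RuntimeError if either variable is unset
    or empty, so a misconfigured shell fails before any analysis starts.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("unset environment variable(s): " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping; the paths are placeholders.
paths = reference_paths({
    "pathseqdb": "/path/to/pathseq_reference",
    "cellrangerdb": "/path/to/cellranger_reference",
})
print(paths["pathseqdb"])
print(paths["cellrangerdb"])
```

Calling reference_paths() with no argument reads the real process environment instead of the example mapping.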

Overview of the Computational Pipeline for Bacteria-associated Spots/Cells Annotation

Part 1: 10x Visium spatial transcriptomic data

  1. Identification of microbial reads within 10x Visium spatial transcriptomic data generated by 10x Space Ranger Count (Visium_pipeline.sh)
  2. Bioinformatic analysis of 10x Visium spatial transcriptomic data (Visium.R)
  3. Summarize the numbers of bacterial reads and UMIs in the 10X Visium data (validate_and_count.py)

    The folder used for outputs from the previous steps should be provided as an argument to the Visium_pipeline.sh script.

    Output Data:

    • CRC_16.visium.raw_matrix.genus.csv and OSCC_2.visium.raw_matrix.genus.csv contain the bacterial UMI count matrices, which can be used as metadata when processing the Visium data
    • CRC_16.visium.raw_matrix.validate.csv and OSCC_2.visium.raw_matrix.validate.csv contain validation data that can be used as input to validate_and_count.py
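To illustrate how a genus-level UMI matrix could be turned into per-spot metadata, the sketch below sums UMI counts across genera for each spot. The file layout is an assumption, not something this README specifies: a header row, the spot barcode in the first column, and one column per genus. The barcodes and counts in the toy input are made up, and the function is not the logic of validate_and_count.py.

```python
import csv
import io


def total_umis_per_spot(handle):
    """Sum UMI counts across all genus columns for each spot barcode.

    Assumed layout (hypothetical): header row, barcode in column 1,
    one genus per remaining column.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    totals = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        barcode, counts = row[0], row[1:]
        totals[barcode] = sum(int(float(c)) for c in counts if c != "")
    return totals


# Toy matrix in the assumed layout; values are invented for illustration.
toy = io.StringIO(
    "barcode,Fusobacterium,Bacteroides\n"
    "AAAC-1,3,0\n"
    "AAAG-1,0,0\n"
    "AAAT-1,1,2\n"
)
totals = total_umis_per_spot(toy)

# Spots with at least one bacterial UMI.
positive = [barcode for barcode, n in totals.items() if n > 0]
print(totals)
print(positive)
```

For a real file, replace the StringIO object with open("CRC_16.visium.raw_matrix.genus.csv", newline="") after confirming that the file matches the assumed layout.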

Part 2: 10x Single cell data (For cell culture samples and patient samples)

Input Data:

Processing of single cell data

  1. Seurat data processing, Harmony integration, SingleR annotation and copyKAT prediction (patient_samples_Seurat.r and cell_culture_Seurat.r)
  2. Differential expression analysis and GSEA (DE.r)