Analysis code used in Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022
The code in this repository is organized to reflect the description in the Methods section of Galeano Nino et al., Effect of the intratumoral microbiota on spatial and cellular heterogeneity in cancer. 2022.
The 10x Visium scans associated with the manuscript submission are available on AWS and Zenodo.
TIFF files can be accessed via: https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_tiff.tar.gz and https://doi.org/10.5281/zenodo.7419806
Please note that for sample CRC_16, the slide id is V10S15-020 and the area code is D1; for sample OSCC_2, the slide id is V11A07-022 and the area code is A1.
We also uploaded fastq files to AWS for your convenience: https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_fastq.tar.gz
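The archives above are ordinary .tar.gz files, so fetching and unpacking one can be sketched with the Python standard library alone. This is a minimal illustration, not part of the repository: the helper name fetch_and_extract and the destination folder are hypothetical, and the archives may be large, so make sure enough disk space is available first.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a .tar.gz archive from `url` and unpack it under `dest_dir`.

    Hypothetical helper for illustration; it is not a script from this repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)
    # Unpack next to the downloaded archive.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


# Example call (the destination folder name is an assumption):
# fetch_and_extract(
#     "https://fh-pi-bullman-s-eco-public.s3.us-west-2.amazonaws.com/"
#     "DataTransfer/Galeano_Nino_et_al_visium_scans/CRC_OSCC_visium_fastq.tar.gz",
#     "visium_fastq",
# )
```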
All of the analysis code documented in this repository was run on the shared computing cluster
maintained at the Fred Hutchinson Cancer Research Center between May 2020 and August 2022.
The software dependencies used by these scripts are provided using the EasyBuild installation
maintained by the Fred Hutch Scientific Computing group.
Those software dependencies are loaded into the environment with the ml command (e.g. ml CellRanger/6.1.1).
Prior to running the analysis scripts, reference databases were downloaded for PathSeq (December 2020)
and CellRanger (January 2022).
The location of those reference databases is provided to the analysis scripts using the environment variables pathseqdb and cellrangerdb.
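Because the scripts depend on pathseqdb and cellrangerdb being set, a quick check before launching anything can save a failed cluster job. The sketch below is an assumption-laden illustration rather than code from this repository: the function name resolve_reference_dbs is hypothetical, and it only confirms that the variables are set, not that the databases are complete.

```python
import os
from pathlib import Path


def resolve_reference_dbs(env=None, names=("pathseqdb", "cellrangerdb")):
    """Return {variable name: Path}, or raise if any variable is unset or empty.

    Hypothetical helper; the variable names come from the README, the rest is a sketch.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Unset environment variable(s): " + ", ".join(missing))
    return {name: Path(env[name]) for name in names}


# Typical use before running the pipeline scripts:
# dbs = resolve_reference_dbs()
# print(dbs["pathseqdb"], dbs["cellrangerdb"])
```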
- Visium_pipeline.sh
- Visium.R
- validate_and_count.py. The folder used as outputs from the previous steps should be provided as an argument to the Visium_pipeline.sh script.
CRC_16.visium.raw_matrix.genus.csv and OSCC_2.visium.raw_matrix.genus.csv contain the bacterial UMI count matrices, which can be used as metadata in Visium data processing.
CRC_16.visium.raw_matrix.validate.csv and OSCC_2.visium.raw_matrix.validate.csv contain validation data that can be used as input to validate_and_count.py.
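To inspect one of the genus-level count matrices before attaching it as metadata, it can be read with the standard csv module. The layout assumed here (first column holds the spot barcode, every remaining column holds UMI counts for one genus) is a guess for illustration only and is not confirmed by this README, so check the actual header first. The function name load_count_matrix is hypothetical.

```python
import csv


def load_count_matrix(path):
    """Read a CSV count matrix into {barcode: {genus: count}}.

    Assumed layout (not confirmed by the repository): a header row, the
    barcode in the first column, and one count column per genus.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        genera = next(reader)[1:]  # skip the barcode column name
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            barcode, values = row[0], row[1:]
            # Empty cells are treated as zero counts.
            counts[barcode] = {
                genus: int(float(value or 0))
                for genus, value in zip(genera, values)
            }
    return counts


# Example call, using a file name from this repository:
# matrix = load_count_matrix("CRC_16.visium.raw_matrix.genus.csv")
```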
- mkfastq command
- raw_data_folder
- patient_samples_GEX_pipeline.sh and cell_culture_samples_GEX_pipeline.sh
- patient_samples_16s_pipeline.sh and cell_culture_16s_pipeline.sh. The variable gex_bam_path should be set to the output folder from the patient_samples_GEX_pipeline.sh and cell_culture_samples_GEX_pipeline.sh scripts.
- merge_metadata.py and metadata_dedup.py. The folder used as outputs from the previous steps should be provided as an argument to the merge_metadata.py script.
headneck_gex_16s_mix_dedup.csv, HT_29_gex_16s_mix_dedup.csv, and HCT_116_csv_gex_16s_mix_dedup.csv contain the bacterial UMI count matrices, which can be used as Seurat object metadata in single-cell processing.
- patient_samples_Seurat.r and cell_culture_Seurat.r
- DE.r
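As noted above, merge_metadata.py expects the output folder from the previous steps as an argument. A small wrapper that checks the folder exists before building the command might look like the sketch below. It assumes the folder is passed as a single positional argument and that the script runs with the current Python interpreter; neither detail is confirmed here, and build_merge_command is a hypothetical name.

```python
import sys
from pathlib import Path


def build_merge_command(output_folder, script="merge_metadata.py"):
    """Return the argument list for running `script` on `output_folder`.

    Assumes a single positional folder argument (not confirmed by the README).
    """
    folder = Path(output_folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"Output folder not found: {folder}")
    return [sys.executable, script, str(folder)]


# To actually run it (the folder name is an assumption):
# import subprocess
# subprocess.run(build_merge_command("pipeline_output"), check=True)
```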