Probe-based spatial transcriptomics technologies offer great flexibility in terms of the transcripts that can be profiled. While technologies such as VisiumHD and VisiumFFPE enable the capture of most protein-coding genes, there are many applications that might require the design of custom probes. A few examples include:
Such probes often need to fulfill specific requirements (GC content, specific nucleotides in specific positions, avoidance of polymorphisms and repeats) and need to be specific for their targets. gene2probe aims to help the user design such probes. The default parameters are tailored towards the current recommendations by 10x Genomics for VisiumHD and VisiumFFPE, but in principle gene2probe can be used to design probes of any length and nucleotide requirement.
We strongly recommend also BLASTing the selected probes manually before ordering them.
We also note that gene2probe is currently tailored towards designing probes for the same species as those covered by the core probe set (and the provided input files specifically cover the human genome). While the design of custom probes against bacterial or viral genomes is an exciting application, it poses many additional considerations that gene2probe doesn't currently take into account.
gene2probe builds on many awesome bioinformatic tools that need to be preinstalled. Luckily, they should all be easy to install via conda/pip.
We recommend setting up a dedicated conda environment.
conda create -n gene2probe_env python=3.11
Activate the environment and install bedtools and blast
conda activate gene2probe_env
conda install bioconda::blast
conda install bioconda::bedtools
Then install gene2probe via pip
pip install git+https://github.com/Teichlab/gene2probe.git
Depending on the assay you are planning to use, probe requirements will differ in terms of length, required GC content, whether the probes are split or not, and required nucleotides in specific positions.
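To make these kinds of requirements concrete, here is a minimal sketch of a per-probe check for length, GC content and required nucleotides at fixed positions. It is not gene2probe's own implementation: the function names (`gc_fraction`, `passes_basic_checks`) are hypothetical, and the default thresholds are placeholders for illustration, not the 10x Genomics recommendations.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_checks(seq, length=50, gc_min=0.40, gc_max=0.70, required=None):
    """Check one candidate probe against simple sequence requirements.

    All defaults are placeholder values (assumptions for this sketch).
    `required` maps a 0-based position to the nucleotide expected there,
    e.g. {24: "T"} to demand a T at the 25th base.
    """
    seq = seq.upper()
    required = required or {}
    if len(seq) != length:
        return False
    if set(seq) - set("ACGT"):  # reject N or other ambiguous bases
        return False
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    return all(seq[pos] == nt for pos, nt in required.items())


candidate = "ACGT" * 12 + "AT"  # toy 50-nt sequence
print(passes_basic_checks(candidate, required={24: "A"}))
```

Swapping in the length, GC range and positional constraints of your assay is then a matter of changing the arguments rather than the logic.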
Regardless of the specific requirements, you typically don't want to design probes against repeats/low complexity regions or parts of the genome with common polymorphisms in the species of interest. Additionally, you want to make sure that your probe specifically targets your gene of interest and will not lead to the detection of off-targets.
Our default parameters provided in the tutorials are tailored around the current recommendations by 10x Genomics for VisiumHD. These are the following:
Due to its modular nature, gene2probe allows the user to fine-tune these criteria. This gives flexibility in terms of using gene2probe to design probes for other assays, but also allows the user to be stricter or more lenient depending on the number of probes that are available for a given gene.
For example, you can start by removing all probes overlapping a common polymorphism in any position - if that leads to too few probes, you can relax the requirement to +/- 5 nts from the ligation junction.
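The strict-then-relaxed strategy described above can be sketched as follows. This is an illustration only, not gene2probe's API: `overlaps_snp` is a hypothetical helper, coordinates are assumed to be 0-based and half-open as in BED files, and the probe is assumed to be split into two equal halves so that the ligation junction sits at `junction_offset` bases from the probe start.

```python
def overlaps_snp(probe_start, probe_end, snp_positions,
                 junction_offset=25, window=None):
    """Return True if any SNP falls in the checked part of the probe.

    probe_start, probe_end: 0-based, half-open genomic interval of the probe.
    snp_positions: 0-based SNP positions on the same chromosome.
    window=None checks the whole probe; window=5 checks only the 5 bases
    on either side of the ligation junction.
    """
    if window is None:
        lo, hi = probe_start, probe_end
    else:
        # Index of the first base after the junction. With two equal halves
        # this is the same for plus- and minus-strand probes.
        junction = probe_start + junction_offset
        lo = max(probe_start, junction - window)
        hi = min(probe_end, junction + window)
    return any(lo <= pos < hi for pos in snp_positions)


probe = (100, 150)   # toy 50-nt probe
snps = [110]         # one common SNP, far from the junction at 125

print(overlaps_snp(*probe, snps))            # strict: any position
print(overlaps_snp(*probe, snps, window=5))  # relaxed: junction +/- 5 nt
```

In this toy case the probe is rejected under the strict rule but kept under the relaxed one, which is the trade-off you accept when too few probes survive the strict filter.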
We rely on several publicly available resources, most of which can be directly obtained via the UCSC table browser.
The following resources are required:
SNPs and small indels in BED format. As long as you stick to hg38, you can use the ones provided here. Additional options can be downloaded from the UCSC table browser (we used Variation/Common SNPs(151)/snp151Common but you might also want to consider other options).
In all cases, please make sure that your annotations match your genome assembly!
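A quick sanity check can catch the most obvious mismatches, such as `1` versus `chr1` naming or intervals that run past the end of a chromosome. The sketch below compares BED intervals against chromosome lengths taken from a FASTA index (`.fai`, as written by `samtools faidx`). The helper names are hypothetical, and passing this check does not prove that the assemblies match; it only flags clear inconsistencies.

```python
def read_fai_lengths(fai_lines):
    """Parse .fai lines (name<TAB>length<TAB>...) into {chromosome: length}."""
    lengths = {}
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def bed_mismatches(bed_lines, chrom_lengths):
    """Return (line_number, message) for BED rows inconsistent with the genome."""
    problems = []
    for n, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        start, end = int(start), int(end)
        if chrom not in chrom_lengths:
            problems.append((n, f"unknown chromosome {chrom}"))
        # start == end is allowed: insertions are zero-length in BED
        elif not 0 <= start <= end <= chrom_lengths[chrom]:
            problems.append((n, f"interval {start}-{end} outside {chrom}"))
    return problems


# Toy in-memory inputs; with real files, pass open file handles instead.
fai = ["chr1\t1000\t6\t60\t61\n"]
bed = ["chr1\t10\t20\n", "1\t10\t20\n", "chr1\t990\t1010\n"]
for line_number, message in bed_mismatches(bed, read_fai_lengths(fai)):
    print(line_number, message)
```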
Additionally, you will need one or more blast databases to check for off-target effects. We provide instructions to construct your database in notebooks/001_make_blast_database.ipynb.
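For orientation, building a nucleotide database and querying it with short sequences generally comes down to two BLAST+ commands, `makeblastdb` and `blastn`. The sketch below assembles those commands from Python. It is not necessarily what the notebook does: the file names are hypothetical, and the choice of the `blastn-short` task (a blastn mode intended for short queries) is an assumption that you may want to revisit for your probes.

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta, db_name):
    """Command to build a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta, db_name, out_tsv):
    """Command to search probes against the database, tabular output (format 6)."""
    return ["blastn", "-task", "blastn-short", "-query", query_fasta,
            "-db", db_name, "-outfmt", "6", "-out", out_tsv]


def run(cmd):
    """Run a command, failing early if the BLAST+ executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found; is the conda environment active?")
    subprocess.run(cmd, check=True)


# Hypothetical file names, shown without executing anything:
print(" ".join(makeblastdb_cmd("transcripts.fa", "transcripts_db")))
print(" ".join(blastn_cmd("probes.fa", "transcripts_db", "probe_hits.tsv")))
```

Calling `run(...)` on either command executes it inside the activated conda environment; printing the commands first makes it easy to inspect them before anything is written to disk.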
Please refer to our demo notebooks for examples of how to design your own probes. We also provide examples of how to make your own blast databases.
Remember that we strongly recommend also BLASTing the selected probes manually before ordering them.
If you are using gene2probe, please consider citing our paper: Polański et al. Bin2cell reconstructs cells from high resolution Visium HD data, Bioinformatics, 2024, btae546, https://doi.org/10.1093/bioinformatics/btae546.
We are thankful to 10x Genomics, Cecilia Kyanya, Sam Dougan and members of the Wellcome Sanger PAM informatics team for useful discussion. We also acknowledge being aided by ChatGPT4 when writing and documenting this pipeline.