Authors: Tanya J Major and Riku Takei
This package allows the user to create regional Manhattan plots from p-values, log(p-values), or log(Bayes Factors), with points coloured according to LD and genes annotated beneath. The LD input can be generated from the user's own data (e.g. for a non-reference population). The package comes with a number of reference files for gene annotation, but is not limited to the use of these files.
This package was created for use with human SNP data, but can be used to plot non-human data.
This script creates an R function to create LocusZoom-like plots. Three example input files are included for test purposes, along with an example .jpg output.
This script has one package dependency: scales
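Because scales is the only dependency, it can be checked for before the script is sourced. The snippet below is a minimal sketch using base R's requireNamespace(); the install step is left as a commented suggestion rather than run automatically.

```r
# Check whether the 'scales' package is available before sourcing the script.
has_scales <- requireNamespace("scales", quietly = TRUE)

if (!has_scales) {
  # Installing from CRAN is one option; uncomment the next line to run it.
  # install.packages("scales")
  message("The 'scales' package is not installed; try install.packages(\"scales\").")
}
```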
The {Gencode,UCSC}_GRCh37_Genes_UniqueList{2017,2021}.txt files can be used as the gene annotation file.
# load necessary files into R
Example.assoc.linear <- read.delim("Example.assoc.linear", stringsAsFactors = FALSE, header = TRUE)
Example.ld <- read.table("Example.ld", stringsAsFactors = FALSE, header = TRUE)
Unique.genes <- read.delim("Gencode_GRCh37_Genes_UniqueList2021.txt", stringsAsFactors = FALSE, header = TRUE)
# load the locuszoom function into R
source("functions/locus_zoom.R")
# create a LocusZoom-like plot
locus.zoom(data = Example.assoc.linear, # a data.frame (or a list of data.frames) with the columns CHR, BP, SNP, and P
region = c(16, 53340000, 54550000), # the chromosome region to be included in the plot
offset_bp = 0, # how many basepairs around the SNP / gene / region of interest to plot
ld.file = Example.ld, # a file with LD values relevant to the SNP specified above
genes.data = Unique.genes, # a file of all the genes in the region / genome
plot.title = "Association of FTO with BMI in Europeans", # the plot title
file.name = "Example.jpg", # the name of the file to save the plot to
secondary.snp = c("rs1121980", "rs8060235"), # a list of SNPs to label on the plot
secondary.label = TRUE) # TRUE/FALSE whether to add rsIDs of secondary SNPs to plot
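Before calling locus.zoom it can help to confirm that each input carries the columns the function expects (CHR, BP, SNP, and P for the results; SNP_B and R2 for the LD file). The sketch below assumes nothing about the package internals: missing_columns is a hypothetical helper, not part of this repo, and toy_results is made-up data used only to show the check.

```r
# Hypothetical helper (not part of this package): return the names of any
# required columns that are absent from a data.frame.
missing_columns <- function(df, required) {
  setdiff(required, colnames(df))
}

# Made-up data.frame standing in for association results.
toy_results <- data.frame(
  CHR = c(1, 1),
  BP  = c(1000, 2000),
  SNP = c("snp1", "snp2"),
  P   = c(0.01, 0.5),
  stringsAsFactors = FALSE
)

# All four results columns are present, so nothing is reported.
missing_data <- missing_columns(toy_results, c("CHR", "BP", "SNP", "P"))

# The toy results are not an LD file, so both LD columns are reported.
missing_ld <- missing_columns(toy_results, c("SNP_B", "R2"))

print(missing_data)
print(missing_ld)
```

The same helper could be pointed at Example.assoc.linear, Example.ld, and Unique.genes once they have been read in.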
One of snp, gene, or region must be specified to create the plot:
snp: specify the SNP to be annotated (you must also include ignore.lead = TRUE if choosing this option)
gene: specify the gene to make the plot around
region: specify the chromosome region you want to plot (must be specified as c(chr, start, end))
As well as each of the following:
data: specify the data.frame (or a list of data.frames) to be used in the plot (requires the columns "CHR", "BP", "SNP", and either "P" or "logBF")
genes.data: specify a data.frame with gene locations to plot beneath the graph (requires the columns "Gene", "Chrom", "Start", "End", and "Coding") - the Gencode or UCSC {Gencode,UCSC}_GRCh37_Genes_UniqueList{2017,2021}.txt files in this repo can be used for this
plot.title: specify a title to go above your plot
file.name: specify a filename for your plot to be saved to
ld.file: specify a data.frame with LD values relevant to the SNP specified by snp (requires the columns "SNP_B" and "R2")
offset_bp: specify how far either side of the snp, gene, or region you want the plot to extend (defaults to 200000)
psuedogenes: when using one of the three gene lists in this repo you can specify whether you want to plot the pseudogenes (defaults to FALSE)
RNAs: when using one of the two gene lists created in 2021 in this repo you can specify whether you want to plot lncRNA and ncRNA genes (defaults to FALSE)
plot.type: specify the file format of the plot (defaults to "jpg", options are "jpg", "svg", or "view_only", which will not save the plot but output it to the RStudio Viewer instead)
nominal: specify the nominal significance level to draw on the plot (in -log10(P), default is 6 or P = 1e-6)
significant: specify the significance level to draw on the plot (in -log10(P), default is 7.3 or P = 5e-8)
secondary.snp: provide the list of secondary SNP IDs (must match IDs in results file) to be highlighted on the plot
secondary.label: specify whether to label the secondary SNPs on the plot (defaults to FALSE)
secondary.circle: specify whether to add a red circle around the secondary SNPs on the plot (defaults to TRUE)
genes.pvalue: specify a data.frame of p-values (e.g. MAGMA results) associated with each gene (requires the columns "Gene" and "P")
colour.genes: specify whether to colour genes based on a p-value provided in genes.pvalue (defaults to FALSE)
population: specify the 1000 genomes population to use when calculating LD if ld.file = NULL (defaults to "EUR", options are "AFR", "AMR", "EAS", "EUR", "SAS", "TAMA", and "ALL")
sig.type: specify whether the y-axis should be labelled as -log10(P) or log10(BF) (defaults to "P", options are "P", "logP", or "logBF"). For the "P" option an additional -log10 conversion of the input "P" column will be performed.
nplots: specify whether multiple results plots will be saved into your jpeg file (e.g. plot two GWAS results one above another; defaults to FALSE)
ignore.lead: specify whether to ignore the SNP with the smallest P and use the SNP specified by snp to centre the plot (defaults to FALSE)
rsid.check: specify whether to check if the SNPs are labelled with rsIDs - should only matter if the script is calculating LD for you (defaults to TRUE)
nonhuman: specify whether the data to plot has come from a non-human sample-set (defaults to FALSE) - if the data going in is from a non-human species, make sure the chromosome column contains only numbers (e.g. 1 instead of chr1, 23 instead of X)
The following example is not reproducible from the example data.
locus.zoom(data = EUR_meta_full1_clean_rsid.nfiltered_chr7,
gene = "MLXIPL",
offset_bp = 500000,
genes.data = Gencode_GRCh37_Genes_UniqueList2021,
plot.title = "Association of MLXIPL with gout in Europeans",
file.name = "alternateExample.jpg",
genes.pvalue = MAGMA_EUR_meta_full_Gencode2021,
colour.genes = TRUE,
psuedogenes = FALSE,
RNAs = TRUE)
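For the nonhuman option described above, the chromosome column needs to hold plain numbers (1 instead of chr1, 23 instead of X). One way to prepare such a column is sketched below. recode_chr is a hypothetical helper, not part of this repo, and its default mapping of X to 23 only mirrors the example given above; it should be adjusted for the species being plotted.

```r
# Hypothetical helper (not part of this package): turn chromosome labels such
# as "chr1" or "X" into plain numbers.
# 'special' maps non-numeric chromosome names to numbers; the default is only
# an illustration and will differ between species.
recode_chr <- function(chr, special = c(X = 23)) {
  # Drop a leading "chr" prefix, if there is one.
  chr <- sub("^chr", "", as.character(chr), ignore.case = TRUE)
  # Swap named chromosomes for their numeric codes.
  is_special <- chr %in% names(special)
  chr[is_special] <- special[chr[is_special]]
  # Any label that is still non-numeric becomes NA (with a warning).
  as.numeric(chr)
}

recoded <- recode_chr(c("chr1", "chr2", "X", "7"))
print(recoded)
```

The result can then be written back to the CHR column of the data.frame passed as data.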