epigen / RnBeads

Git working Repo synced to the Bioconductor: http://bioconductor.org/packages/devel/bioc/html/RnBeads.html
https://rnbeads.org/
7 stars 8 forks source link

Availability of MMBC functional annotation #34

Closed kimsjune closed 1 year ago

kimsjune commented 1 year ago

Hi,

I am hoping to incorporate MMBC functional annotation data into my analysis as demonstrated by Schonung et al. (https://www.biorxiv.org/content/10.1101/2022.06.02.493896v1.full.pdf). I've been searching around and the annotation bed file seems to be available here (https://github.com/stephenkraemer/bead_chip_array_annotations/blob/main/gene-annos_primary_one-row_v2.bed.gz). I've tried to import this bed file with rnb.set.annotation(), but the manual says that columns after the sixth one are dropped. The original code from the Schonung et al manuscript found here (https://github.com/MaxSchoenung/MMBC/blob/master/code/03_corr_twgbs_lsk_2021-09-14.R) seems to suggest there is custom script to allow the annotation bed file to be imported. I cannot follow how the imported bed file is loaded as the annotation file for RnBeads.

Would it be possible to host the functional annotation data (promoter, intron/exon, UTR, intergenic, etc.) on https://rnbeads.org/regions.html such that it can be imported using rnb.load.annotation.from.db()?

Ultimately, I would like to visualize a clustered heatmap of beta values, where probes are annotated with genomic features (promoter, UTR, etc.) instead of default probe types (I vs II as depicted in the attached png). Clearly RnBeads is already capable of doing this, only a matter of inputting the right annotation file.

Thank you for creating RnBeads and I'd appreciate any feedback!

SK heatmapLoc_1_correlation_average_1_1_1000_high_resolution

kimsjune commented 1 year ago

I came up with a solution.

1) import functional annotation bed file. The original bed file was trimmed to contain four columns, where the last column describes the functional annotation (promoter, exon, etc.)

tab.anno <- read.delim("./functional_annotation.bed",
                       header=T,
                       stringsAsFactors=F)
colnames(tab.anno) <- c("Chromosome","Start","End","name")

2) set options

rnb.set.annotation("mm_anno", tab.anno, 
                   description="mm_anno",assembly="mm10")
rnb.options(region.types=c(rnb.getOption("region.types"),"mm_anno"))

3) Then run rnb.run.import() and rnb.run.preprocessing()

4) differential analysis

cmp.cols <- "Sample_Type"
reg.types <- "mm_anno"
diffmeth <- rnb.execute.computeDiffMeth(rnb.set, cmp.cols, region.types=reg.types)
comparison <- get.comparisons(diffmeth)[1] 
tab.sites <- get.table(diffmeth,comparison, region.type="mm_anno",return.data.frame=T)

5) as covered in the manual section 6.2

aa <- annotation(rnb.set,type="mm_anno")
annotated.tab.sites <- data.frame(tab.sites,aa, row.names=NULL)
annotated.tab.sites.order <- annotated.tab.sites[order(annotated.tab.sites[,"combinedRank"]),]

6) for heatmap, get beta values for top differentially methylated region.types ("mm_anno"). This works because meth(), rnb.execute.computeDiffMeth() and annotation() all return coordinate sorted rows as long as region.types or type is set to "mm_anno"

topDiffBeta<- meth(rnb.set, type="mm_anno")[as.integer(rownames(annotated.tab.sites.order)),]

7) matching annotation names/descriptions

annotationList <- annotated.tab.sites.order$name