gevaertlab / MethylMix

Identification of differentially methylated genes in biomedical data

How do you derive ProbeAnnotation in MethylMix packages? #7

Open Yunuuuu opened 3 years ago

Yunuuuu commented 3 years ago

Hi, I came across the MethylMix algorithm while trying to summarize the methylation values of multiple probes that map to one gene into a single gene-level methylation value. MethylMix clusters the probes of each gene using a probe annotation dataset. I have read the MethylMix citation articles, 1 and 2, but could not find where this annotation data comes from, so I compared it with the Illumina 450k annotation using the code below:

library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(MethylMix)
library(tidyverse)

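# Probes whose UCSC_RefGene_Name lists two or more distinct gene symbols
probe_with_one_more_gene <- Other$UCSC_RefGene_Name %>%
  purrr::map( ~unique(str_split(., pattern = ";")[[1]]) ) %>%
  purrr::map_lgl(
    ~length(.) >= 2
  ) %>%
  {rownames(Other)[.]}

# Join MethylMix's ProbeAnnotation to the Illumina annotation by probe ID,
# then keep only the multi-gene probes found above
anno_diff <- inner_join(
  ProbeAnnotation,
  as_tibble(Other, rownames = "ILMNID") %>%
    dplyr::select(ILMNID, UCSC_RefGene_Name, UCSC_RefGene_Group),
  by = "ILMNID"
) %>%
  dplyr::filter(
    ILMNID %in% probe_with_one_more_gene
  )

# GENESYMBOL comes from ProbeAnnotation; the UCSC_* columns come from Illumina
head(anno_diff)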

Here is the output:

      ILMNID   GENESYMBOL   UCSC_RefGene_Name UCSC_RefGene_Group
1 cg00050873        TSPY4      TSPY4;FAM197Y2       Body;TSS1500
2 cg00061679         DAZ1      DAZ1;DAZ4;DAZ4     Body;Body;Body
3 cg00311963 LOC100101121 LOC100101121;TTTY23    TSS1500;TSS1500
4 cg00335297       RBMY1F      RBMY1F;RBMY2FP    TSS1500;TSS1500
5 cg00576139 LOC100101115 LOC100101115;TTTY21          Body;Body
6 cg00903245        TSPY4      TSPY4;FAM197Y2          Body;Body

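Could you explain how ProbeAnnotation handles a single probe ID that maps to multiple gene names? In the rows above, GENESYMBOL always matches the first gene listed in UCSC_RefGene_Name, so it seems ProbeAnnotation just takes the first one. Is that correct?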
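The "first one" guess can be checked across all rows rather than by eye. The following is a minimal sketch, not a statement of how MethylMix builds ProbeAnnotation: `anno_toy`, `first_gene`, and `matches_first` are hypothetical names, and the toy rows are copied from the output above so the snippet runs on its own. For the real check, `anno_diff` would replace `anno_toy`.

```r
# Toy rows copied from the head(anno_diff) output above (assumption: the real
# check would run on the full `anno_diff` instead of this small data frame).
anno_toy <- data.frame(
  ILMNID = c("cg00050873", "cg00061679", "cg00311963"),
  GENESYMBOL = c("TSPY4", "DAZ1", "LOC100101121"),
  UCSC_RefGene_Name = c("TSPY4;FAM197Y2", "DAZ1;DAZ4;DAZ4", "LOC100101121;TTTY23"),
  stringsAsFactors = FALSE
)

# First gene listed for each probe: drop everything from the first ";" onward.
first_gene <- sub(";.*$", "", anno_toy$UCSC_RefGene_Name)

# TRUE where the ProbeAnnotation symbol equals the first listed gene.
matches_first <- anno_toy$GENESYMBOL == first_gene

# Tabulate agreement; any FALSE would contradict the "first gene" guess.
table(matches_first)
```

If every row of the full table agrees, that would be consistent with the first-gene guess, though it would not prove how the annotation was actually derived.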

danphillips28 commented 2 years ago

Thanks for asking this. I would also like to know. I am trying to integrate RNA-Seq and methylation data, and probes that map to multiple genes are a major problem. As an initial attempt I simply took the first listed gene, but this is not rigorous. Thanks, Dan
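One alternative to taking the first listed gene is to keep every distinct probe-gene pair, so a probe contributes to each gene it is annotated to. This is only a sketch of that idea in base R, and not something MethylMix is known to do: `probe_toy`, `genes_per_probe`, and `probe_gene_long` are hypothetical names, and the two rows are copied from the output earlier in this thread.

```r
# Two multi-gene probes copied from the output earlier in this thread.
probe_toy <- data.frame(
  ILMNID = c("cg00050873", "cg00061679"),
  UCSC_RefGene_Name = c("TSPY4;FAM197Y2", "DAZ1;DAZ4;DAZ4"),
  stringsAsFactors = FALSE
)

# Split each annotation on ";" and drop repeated symbols within a probe.
genes_per_probe <- lapply(
  strsplit(probe_toy$UCSC_RefGene_Name, ";", fixed = TRUE),
  unique
)

# One row per (probe, gene) pair. Probes with an empty annotation yield
# zero genes and are therefore dropped from the long table.
probe_gene_long <- data.frame(
  ILMNID = rep(probe_toy$ILMNID, lengths(genes_per_probe)),
  GENESYMBOL = unlist(genes_per_probe),
  stringsAsFactors = FALSE
)

print(probe_gene_long)
```

Whether a probe should count toward every gene it overlaps, or only toward one chosen by a rule such as UCSC_RefGene_Group, is a modeling decision that this sketch leaves open.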