TransBioInfoLab / coMethDMR

Detect Regions of Concurrent Differential Methylation
https://transbioinfolab.github.io/coMethDMR/

Probes kept when `rdrop == 0` #7

Closed gabrielodom closed 2 years ago

gabrielodom commented 2 years ago

From @XiaoweiHu-Stat: for some probes, we calculate the rdrop statistic to be exactly equal to 0, but the keep indicator is still 1. This is an error.

We do not have access to the original data (due to data privacy concerns), so we cannot directly replicate this error. We will first try to create some synthetic data and force the rdrop values to be 0.

gabrielodom commented 2 years ago

We found the culprit: when there are missing values in the methylation data, the cor() function in CreateRdrop() returns all NA. Then, these NA values infect the MarkComethylatedCpGs() function, meaning that we end up checking if probe IDs are elements of a vector of missing values:

dropCpGs_char <- CpGs_char[clusterRdrop_df$r_drop < rDropThresh_num]
keep = ifelse(CpGs_char %in% dropCpGs_char, 0, 1), ##(drop=0, keep=1)

I still don't know how the r_drop values are being changed from NA to 0, but this at least tells us why they aren't being removed.
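
The behaviour described above can be reproduced in base R without the package. This is only a sketch: the toy matrix, the CpG IDs, and the simplified r_drop calculation (each CpG correlated with the mean of the remaining CpGs) are assumptions made for illustration and are not the code inside CreateRdrop(). Only the last two statements mirror the lines quoted above.

```r
# Toy beta matrix: 6 samples (rows) x 4 CpGs (columns); the IDs are made up
set.seed(1)
beta_mat <- matrix(
  runif(24), nrow = 6,
  dimnames = list(NULL, c("cg01", "cg02", "cg03", "cg04"))
)
beta_mat[2, ] <- NA_real_  # one sample with no measurements at all

CpGs_char <- colnames(beta_mat)

# Simplified stand-in for r_drop (assumption): correlate each CpG with the
# mean of the remaining CpGs, using cor()'s default use = "everything"
r_drop <- vapply(seq_along(CpGs_char), function(j) {
  cor(beta_mat[, j], rowMeans(beta_mat[, -j, drop = FALSE]))
}, numeric(1))

rDropThresh_num <- 0.4

# NA < threshold is NA, and subsetting with a logical NA returns NA entries
dropCpGs_char <- CpGs_char[r_drop < rDropThresh_num]

# No real probe ID matches NA, so every probe ends up with keep = 1
keep <- ifelse(CpGs_char %in% dropCpGs_char, 0, 1)
```

Here r_drop is NA for every CpG, dropCpGs_char is a vector of NA values, and keep is 1 for all four probes, which matches the symptom reported in this issue.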

gabrielodom commented 2 years ago

Possible solutions:

  1. Add checks for missing values
  2. Let CreateRdrop() pass use = "pairwise.complete.obs" (or a value chosen by the user) to cor().
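
A rough base R illustration of both options follows. The helper check_complete() and the toy vectors are hypothetical and not part of coMethDMR; the only library call relied on is stats::cor() with its documented use argument.

```r
set.seed(2)
x <- runif(10)
y <- x + rnorm(10, sd = 0.1)
x[3] <- NA_real_  # a single missing methylation value

# Option 1 (hypothetical helper): stop early if any value is missing
check_complete <- function(values) {
  if (anyNA(values)) {
    stop("Missing values found; handle them before computing correlations.")
  }
  invisible(TRUE)
}
check_result <- tryCatch(check_complete(x), error = function(e) "error")

# Default behaviour: a single NA makes the whole correlation NA
r_everything <- cor(x, y)

# Option 2: use only the pairs where both values are observed
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# For two vectors this equals dropping the incomplete pair by hand
r_manual <- cor(x[-3], y[-3])
```

Option 1 fails loudly, which makes the problem visible to the user; option 2 silently computes the correlation from fewer samples, so it may be worth reporting how many samples were actually used.
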

gabrielodom commented 2 years ago

@XiaoweiHu-Stat, can you try again now?

XiaoweiHu-Stat commented 2 years ago

Hi Gabriel,

Thanks for the update. I will try it this week and let you know if there are any more questions.

Xiaowei


On Sunday, October 10, 2021 at 1:20 PM, Gabriel J. Odom wrote:

> @XiaoweiHu-Stat, can you try again now?


gabrielodom commented 2 years ago

@fveitz, can you test the following two things:

  1. Take the example data set in the package and add a sample with all NA values. Try to replicate the error above (you may need to set use = "everything" in CreateRdrop() to replicate it).
  2. Same as above, but set half of the samples to NA and keep actual values for the other half (beta values can be drawn from a Uniform distribution).
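
The two test cases could be set up along these lines. This is a sketch under assumptions: a small simulated matrix stands in for the package's example data set, CpGs are placed in rows and samples in columns, and all object names are made up for the illustration.

```r
set.seed(3)
n_cpgs <- 5
n_samples <- 8

# Stand-in for the example data: beta values drawn from Uniform(0, 1)
betas_mat <- matrix(
  runif(n_cpgs * n_samples), nrow = n_cpgs,
  dimnames = list(
    paste0("cg", seq_len(n_cpgs)),
    paste0("sample", seq_len(n_samples))
  )
)

# Case 1: append one sample whose values are all NA
case1_mat <- cbind(betas_mat, sample_allNA = NA_real_)

# Case 2: set half of the samples to NA, keep the other half as drawn
case2_mat <- betas_mat
case2_mat[, seq_len(n_samples / 2)] <- NA_real_

# With use = "everything" (the default), a single all-NA sample is enough
# to turn the correlation between two CpGs into NA
r_case1 <- cor(case1_mat["cg1", ], case1_mat["cg2", ])

# With pairwise deletion, case 2 is computed from the 4 remaining samples
r_case2 <- cor(
  case2_mat["cg1", ], case2_mat["cg2", ],
  use = "pairwise.complete.obs"
)
```
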

gabrielodom commented 2 years ago

We have added preliminary data checking to the workflow (as shown in vignette 2) via the MarkMissing() function.
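
For readers who want a sense of what such a pre-check can look like, here is a minimal base R sketch. It is not MarkMissing() and does not reproduce its interface: flag_missing(), its max_prop_na argument, and the 0.25 cutoff are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: flag CpGs (rows) and samples (columns) whose
# proportion of missing values exceeds a cutoff
flag_missing <- function(betas_mat, max_prop_na = 0.25) {
  prop_na_cpg <- rowMeans(is.na(betas_mat))
  prop_na_sample <- colMeans(is.na(betas_mat))
  list(
    drop_cpgs = rownames(betas_mat)[prop_na_cpg > max_prop_na],
    drop_samples = colnames(betas_mat)[prop_na_sample > max_prop_na]
  )
}

set.seed(4)
betas_mat <- matrix(
  runif(20), nrow = 4,
  dimnames = list(paste0("cg", 1:4), paste0("s", 1:5))
)
betas_mat["cg2", ] <- NA_real_  # a probe with no data
betas_mat[, "s5"] <- NA_real_   # a sample with no data

flags <- flag_missing(betas_mat)
```

In this toy example flags$drop_cpgs is "cg2" and flags$drop_samples is "s5"; the remaining probes and samples sit at or below the 25% cutoff and are left alone.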