al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
210 stars 96 forks source link

Unite function doesn't work with destrand = TRUE #282

Closed PatrickMaclean closed 1 year ago

PatrickMaclean commented 1 year ago

Hello - thank you for all your work on this package.

I'm having trouble with the destranding option for unite(). Everything seems to work normally with destranding = FALSE across a range of library sizes and when varying parameters including min.per.group:

Destrand = FALSE - works as intended

methylKit::unite(methylList_filtered.nobuffy, destrand=FALSE, min.per.group = 5L)
uniting...
methylBase object with 3008400 rows

Destrand = TRUE - confusing error message

methylKit::unite(methylList_unfiltered.nobuffy, destrand=TRUE, min.per.group = 5L)
destranding...
uniting...
Error in methylKit::unite(methylList_unfiltered.nobuffy, destrand = TRUE,  : 
  no base were united. try adjusting 'min.per.group'.)

Regardless of the value of min.per.group I am still getting this error when destrand = TRUE. Visually inspecting the data I can see plenty of occurrences of succesfully called cytosines on opposing strands, so I'm not sure what to try next.

Thanks again

alexg9010 commented 1 year ago

Hi @PatrickMaclean,

Thanks for bringing this to our attention. Could you please create a reproducible example with a subset of your data and send me data and code? This will allow me to better understand the issue and provide a better diagnostics.

Right now, I cannot give you further advice apart from what the error message already stated.

Best, Alex

PatrickMaclean commented 1 year ago

Sure - I've attached here what should be a reproducuble example. There is a zip with the first 10k lines from my bismark .cov files, and an R script to combine them into a methylList object, then the expected and unexpected results of unite() with destrand=FALSE or TRUE.

methylkit_bugreport.zip

Apologies if I've missed anything obvious, this is my first time working with methylation data. For some background, the data is from a target-capture EM-seq experiment. There are some called CpGs directly adjacent to eachother (eg at hg38 chr1 10469 - 10472 there are two in a row) - I wonder if there could be a problem with destranding in that context?

alexg9010 commented 1 year ago

Thanks for providing code and data.

I quickly ran the script and the first thing that I noticed was that the methylRaw objects were unstranded (strand = *). For destranding we need the strand information to be able to shift and collapse forward and reverse strand, so destranding will not work here. I saw that you used bismark coverage reports with methRead, unfortunately this format does not contain strand information as we outlined in the details section of the methRead help. If you have access to other bismark output files like the Cytosine Report, I would suggest to try these. You may also extract methylation calls from the BAM files using methylKit::processBismarkAln or MethylDackel.

Anyways, it's great you provided the example data, so we can capture this bug.

PatrickMaclean commented 1 year ago

D'oh - sorry, that's pretty obvious - thanks for having a look so quickly. I'll rerun the alignment with the correct Bismark parameters.